UI Console & Example Service Flow
How the UI Console and Example Service work together to browse scenarios, compile execution requests, bootstrap agents, and stream live coordination results
Status: Non-normative (explanatory).
Complements: MACP End-to-End Flow — for protocol-level details on session creation, mode dispatch, policy evaluation, and replay.
Imagine you are an operations analyst at a fintech company. A $2,400 purchase just came in from a brand-new device — the device trust score is a worrying 0.18 out of 1.0. The account is only two weeks old, but the customer has VIP status. Oh, and there is one prior chargeback on file. Should the transaction go through? Should it be blocked outright? Or should the system escalate to step-up verification?
This is not a question any single system can answer well on its own. It requires fraud expertise, growth strategy, compliance checks, and risk coordination — all working together, in real time, under governance rules that ensure no single voice dominates the outcome.
This document walks you through exactly how MACP makes that happen, from the moment an operator browses a scenario catalog in the UI Console, through agent bootstrapping and execution in the Example Service, all the way to live-streamed coordination results appearing in the browser. We will use the Fraud: High-Value New Device scenario as our protagonist throughout — a single concrete story that illuminates every layer of the system.
The Three Services (and Why No Single One Does Everything)
The journey from "I want to run a scenario" to "here's what the agents decided" crosses three services, each with a distinct job. No single service tries to do everything — that is by design.
The UI Console is the storefront where operators browse and launch. The Example Service is the factory floor where scenarios become executable coordination requests and agents get spun up. And the Control Plane is the nervous system that connects everything to the runtime and streams results back in real time.
Here is how they fit together:
flowchart TB
subgraph Browser["UI Console — Next.js"]
Pages["Pages\nScenarios · Launch · Live Run · Replay"]
SSE["SSE Client\nuseLiveRun hook"]
Proxy["API Proxy\n/api/proxy/{service}/{path}"]
end
subgraph ES["Example Service — NestJS"]
Catalog["Scenario Catalog\nfile-based YAML registry"]
Compiler["Compiler\nscenario + template → ExecutionRequest"]
Hosting["Agent Hosting\nframework adapters · process supervisor"]
Agents["Spawned Agents\nLangGraph · LangChain · CrewAI · Custom"]
end
subgraph CP["Control Plane — NestJS"]
RunAPI["Run API\nPOST /runs · GET /runs/:id/stream"]
Executor["Run Executor\nsession lifecycle"]
Projections["Projection Engine\nevent → read model"]
end
subgraph RT["MACP Runtime — Rust"]
Kernel["Coordination Kernel"]
end
Pages --> Proxy
SSE --> Proxy
Proxy -->|"HTTP"| Catalog
Proxy -->|"HTTP"| Compiler
Proxy -->|"HTTP"| RunAPI
Proxy -->|"SSE"| RunAPI
Hosting --> Agents
Agents -->|"gRPC"| Kernel
Executor -->|"gRPC bidirectional"| Kernel
Compiler --> Hosting
style Browser fill:#1a1a2e,stroke:#4a9eff
style ES fill:#1a1a2e,stroke:#10b981
style CP fill:#1a1a2e,stroke:#f59e0b
style RT fill:#1a1a2e,stroke:#9f7aea

Notice how every call from the browser goes through a single API proxy route (/api/proxy/{service}/{path}). This is not just architectural tidiness — it means the UI Console never talks directly to backend services, which keeps authentication centralized and makes the whole thing deployable behind a single domain.
Here is what each service actually owns:
| Service | Responsibility |
|---|---|
| UI Console | Scenario browsing, run configuration, live visualization, replay, export. All API calls go through a Next.js proxy route (/api/proxy/{service}/{path}) that adds authentication headers. |
| Example Service | Scenario catalog (file-based YAML), input validation, template compilation into ExecutionRequest, agent resolution and process hosting. Does NOT embed the runtime. |
| Control Plane | Run lifecycle (queued → running → completed), runtime session management, event normalization, projection building, SSE streaming to UI. See E2E Flow § 4 for details. |
| MACP Runtime | Protocol enforcement, session state, mode dispatch, policy evaluation. See E2E Flow § 5–8 for details. |
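To make the proxy's job concrete, here is a minimal sketch of the URL mapping a catch-all route like /api/proxy/{service}/{path} would perform. The service-to-base-URL map, port numbers, and helper name are assumptions for illustration, not the actual implementation:

```typescript
// Hypothetical upstream map; a real deployment would read these from environment variables.
const SERVICE_BASES: Record<string, string> = {
  "example-service": "http://localhost:4001",
  "control-plane": "http://localhost:4000",
};

// Build the upstream URL for a request to /api/proxy/{service}/{...path}.
function buildUpstreamUrl(service: string, path: string[], search = ""): string {
  const base = SERVICE_BASES[service];
  if (!base) throw new Error(`Unknown proxy target: ${service}`);
  return `${base}/${path.map(encodeURIComponent).join("/")}${search}`;
}
```

A route handler would call something like this, attach authentication headers, and forward the request with fetch, which is why the browser never needs direct network access to the backends.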
The Scenario Catalog: YAML Files All the Way Down
The scenario catalog is deliberately low-tech — it is just YAML files on disk, organized in a hierarchy that mirrors how domain teams think. At the top level, packs group scenarios by business domain: fraud, lending, claims. Inside each pack, individual scenarios describe specific coordination situations. Each scenario can have multiple versions (because requirements evolve) and multiple templates (because the same scenario might run under different governance policies).
Why YAML files instead of a database? Because scenarios are authored by domain experts alongside their agent code, version-controlled in git, and reviewed in pull requests. The file system is the source of truth. The Example Service simply reads it on startup.
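Because the hierarchy is a fixed convention, resolving a scenario's files from its coordinates is pure path arithmetic. A sketch, with a hypothetical helper name (the layout mirrors the packs/ tree):

```typescript
// Hypothetical resolver mirroring the packs/ directory convention.
// Omitting the template slug resolves the scenario definition itself.
function scenarioFile(pack: string, scenario: string, version: string, template?: string): string {
  const root = `packs/${pack}/scenarios/${scenario}/${version}`;
  return template ? `${root}/templates/${template}.yaml` : `${root}/scenario.yaml`;
}
```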
flowchart TB
Packs["packs/"] --> FraudPack["fraud/\npack.yaml"]
Packs --> LendingPack["lending/\npack.yaml"]
Packs --> ClaimsPack["claims/\npack.yaml"]
FraudPack --> FraudScenarios["scenarios/"]
FraudScenarios --> HVND["high-value-new-device/"]
HVND --> V1["1.0.0/"]
V1 --> ScenarioYAML["scenario.yaml"]
V1 --> Templates["templates/"]
Templates --> Default["default.yaml"]
Templates --> MajorityVeto["majority-veto.yaml"]
Templates --> Unanimous["unanimous.yaml"]
Templates --> StrictRisk["strict-risk.yaml"]What a pack looks like
A pack is just a thin wrapper — a slug, a human name, a description, and some tags for filtering. The fraud pack, for example:
# packs/fraud/pack.yaml
apiVersion: scenarios.macp.dev/v1
kind: ScenarioPack
metadata:
slug: fraud
name: Fraud
description: Fraud and risk decisioning demos
tags: [fraud, risk, growth, demo]

The scenario version: where things get interesting
Each scenario version is a complete, self-contained definition. This is where the real design thinking lives. Let's look at our fraud scenario — the one we will trace through the entire system:
# packs/fraud/scenarios/high-value-new-device/1.0.0/scenario.yaml
apiVersion: scenarios.macp.dev/v1
kind: ScenarioVersion
metadata:
pack: fraud
scenario: high-value-new-device
version: 1.0.0
name: High Value Purchase From New Device
summary: >
Fraud, Growth, Compliance, and Risk agents discuss
a transaction and produce a decision.
tags: [fraud, growth, compliance, risk, demo]
spec:
runtime:
kind: rust
version: v1
inputs:
schema:
type: object
properties:
transactionAmount:
type: number
default: 2400
minimum: 1
deviceTrustScore:
type: number
default: 0.18
minimum: 0
maximum: 1
# ... accountAgeDays, isVipCustomer, priorChargebacks
required:
- transactionAmount
- deviceTrustScore
- accountAgeDays
- isVipCustomer
- priorChargebacks
launch:
modeName: macp.mode.decision.v1
modeVersion: 1.0.0
configurationVersion: config.default
policyVersion: policy.default
ttlMs: 300000
initiatorParticipantId: risk-agent
participants:
- id: fraud-agent
role: fraud
agentRef: fraud-agent
- id: growth-agent
role: growth
agentRef: growth-agent
- id: compliance-agent
role: compliance
agentRef: compliance-agent
- id: risk-agent
role: risk
agentRef: risk-agent
contextTemplate:
customerId: "{{ inputs.customerId }}"
transactionAmount: "{{ inputs.transactionAmount }}"
deviceTrustScore: "{{ inputs.deviceTrustScore }}"
# ... remaining fields
kickoffTemplate:
- from: risk-agent
to: [fraud-agent, growth-agent, compliance-agent]
kind: proposal
messageType: Proposal
payloadEnvelope:
encoding: proto
proto:
typeName: macp.modes.decision.v1.ProposalPayload
value:
proposal_id: "{{ inputs.customerId }}-initial-review"
option: evaluate_transaction
rationale: >
Decide whether to approve, step_up,
or decline the transaction.
outputs:
expectedDecisionKinds: [approve, step_up, decline]
expectedSignals:
- suspicious_device
- chargeback_history
- vip_customer

There is a lot packed in here, so let's unpack the key design decisions. The inputs.schema section is a standard JSON Schema — it tells the UI exactly what form fields to render and how to validate them. The launch section defines the coordination structure: which mode to use (decision mode v1), who participates, who initiates, and what the kickoff message looks like. And here is the clever part: the contextTemplate and kickoffTemplate use double-brace template variables ({{ inputs.transactionAmount }}) that get substituted at compile time with the user's actual values. The scenario author defines the shape of the coordination; the operator fills in the specifics.
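A minimal sketch of that compile-time substitution, assuming a simple recursive walk (the helper is hypothetical, not the service's actual code). Note the special case: when a string is exactly one placeholder, the input's native type is preserved, which is how "{{ inputs.transactionAmount }}" becomes the number 2400 rather than a string:

```typescript
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Hypothetical helper: replace {{ inputs.* }} placeholders with concrete values.
function substitute(node: Json, inputs: Record<string, Json>): Json {
  if (typeof node === "string") {
    // A whole-string placeholder keeps the input value's type (number, boolean, ...).
    const whole = node.match(/^\{\{\s*inputs\.(\w+)\s*\}\}$/);
    if (whole) return inputs[whole[1]];
    // Embedded placeholders are stringified in place.
    return node.replace(/\{\{\s*inputs\.(\w+)\s*\}\}/g, (_, k) => String(inputs[k]));
  }
  if (Array.isArray(node)) return node.map((n) => substitute(n, inputs));
  if (node && typeof node === "object") {
    return Object.fromEntries(
      Object.entries(node).map(([k, v]) => [k, substitute(v, inputs)]),
    );
  }
  return node;
}
```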
Templates: same scenario, different rules
Here is where MACP's separation of concerns really shines. The same fraud scenario can run under completely different governance policies just by swapping templates. The majority-veto template, for instance, changes the policy to require a simple majority vote with a blocking-objection veto:
# templates/majority-veto.yaml
apiVersion: scenarios.macp.dev/v1
kind: ScenarioTemplate
metadata:
scenarioVersion: fraud/high-value-new-device@1.0.0
slug: majority-veto
name: Majority Vote with Veto
spec:
defaults:
transactionAmount: 2400
deviceTrustScore: 0.18
accountAgeDays: 14
isVipCustomer: true
priorChargebacks: 1
overrides:
launch:
policyVersion: policy.fraud.majority-veto
policyHints:
type: majority
description: >
Simple majority vote with blocking-objection
veto and 2-voter quorum
threshold: 0.5
vetoEnabled: true
vetoThreshold: 1

The template provides sensible defaults for the inputs (so the operator does not have to fill in every field) and overrides the policy version. Same agents, same scenario structure, but the governance changes completely. You could also have a unanimous.yaml template that requires all agents to agree, or a strict-risk.yaml that gives the risk agent unilateral authority. The coordination logic stays the same; only the rules change.
How the UI discovers all of this
The UI Console fetches the catalog through two simple endpoints:
| Endpoint | Returns |
|---|---|
| GET /packs | All pack summaries (slug, name, description, tags, scenario count) |
| GET /packs/{packSlug}/scenarios | All scenario summaries in a pack (slug, version, name, summary, tags, template list) |
This gives the UI everything it needs to render a browsable catalog — pack cards with scenario counts, drill-down into individual scenarios, and template pickers. No complex query language, no GraphQL schema to maintain. Just two endpoints that return the file tree as structured data.
Configuring a Launch: From Browsing to "Run This"
So the operator has found the fraud scenario and picked the majority-veto template. What happens next? The UI needs to show a configuration form — but it cannot just hardcode one. Different scenarios have different inputs, different defaults, different participants. The form has to be generated dynamically.
This is where the launch schema comes in. When the operator selects a scenario and template, the UI fetches a complete description of everything that can be configured:
sequenceDiagram
participant UI as UI Console
participant Proxy as API Proxy
participant ES as Example Service
UI->>Proxy: GET /packs/fraud/scenarios/high-value-new-device/<br/>versions/1.0.0/launch-schema?template=majority-veto
Proxy->>ES: Forward request
ES->>ES: Load scenario.yaml + majority-veto.yaml
ES->>ES: Merge defaults and overrides
ES-->>Proxy: LaunchSchemaResponse
Proxy-->>UI: LaunchSchemaResponse
Note over UI: Render dynamic form<br/>from formSchema + defaults

The LaunchSchemaResponse: everything the UI needs in one shot
The response is designed so the UI can render a complete, functional configuration page without any additional API calls:
interface LaunchSchemaResponse {
scenarioRef: string; // "fraud/high-value-new-device@1.0.0"
templateId?: string; // "majority-veto"
formSchema: Record<string, unknown>; // JSON Schema for input form
defaults: Record<string, unknown>; // Pre-filled values
participants: Array<{
id: string; role: string; agentRef: string;
}>;
agents: ExampleAgentSummary[]; // Framework, role, capabilities
runtime: { kind: string; version?: string };
launchSummary: {
modeName: string; // "macp.mode.decision.v1"
modeVersion: string; // "1.0.0"
configurationVersion: string; // "config.default"
policyVersion?: string; // "policy.fraud.majority-veto"
policyHints?: { type?, threshold?, vetoEnabled?, ... };
ttlMs: number; // 300000
initiatorParticipantId?: string; // "risk-agent"
};
expectedDecisionKinds?: string[]; // ["approve", "step_up", "decline"]
}

Notice how the formSchema is a standard JSON Schema. The UI does not need to know anything about fraud scenarios or transaction amounts — it just feeds the schema into a dynamic form renderer. The defaults come from the template merge (scenario defaults overlaid with template defaults), so the form comes pre-filled with realistic values. For our fraud scenario, the operator sees $2,400 already in the transaction amount field, 0.18 in the device trust score, and so on.
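To make the validation side of that concrete, here is a toy check covering only the subset of JSON Schema the fraud scenario's numeric inputs use (required, type, minimum, maximum). It is a sketch for intuition; the services use a full JSON Schema validator:

```typescript
interface NumberField { type: "number"; minimum?: number; maximum?: number; default?: number }
interface ObjectSchema { type: "object"; properties: Record<string, NumberField>; required?: string[] }

// Toy validator: returns a list of human-readable errors (empty means valid).
function validateInputs(schema: ObjectSchema, inputs: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in inputs)) errors.push(`${key} is required`);
  }
  for (const [key, field] of Object.entries(schema.properties)) {
    const value = inputs[key];
    if (value === undefined) continue; // missing keys handled by the required check
    if (typeof value !== "number") { errors.push(`${key} must be a number`); continue; }
    if (field.minimum !== undefined && value < field.minimum) errors.push(`${key} below minimum`);
    if (field.maximum !== undefined && value > field.maximum) errors.push(`${key} above maximum`);
  }
  return errors;
}
```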
What the operator actually sees
The Launch page (/runs/new) renders a rich configuration experience from this single response:
- Scenario selector — Pack and scenario dropdowns, template picker
- Input form — Dynamically generated from formSchema, pre-filled with defaults
- Execution mode — live (real runtime) or sandbox
- Tags and metadata — Optional labels, actor ID, run label
- Participant summary — Read-only list showing agents and their roles
- Launch summary — Mode, policy, TTL at a glance
- Input mode toggle — Switch between form view and raw JSON editor
That last one is a nice touch. Domain experts use the form; power users who want to paste in a modified payload can flip to the raw JSON editor. Same data, different interfaces.
Compilation: Turning Intent into Execution
The operator clicks "Launch." Now the real magic starts.
The user's intent — "run the fraud scenario with majority-veto rules and these specific inputs" — needs to become a fully resolved ExecutionRequest that the Control Plane can execute. This is the compiler's job, and it is more involved than you might expect. It has to load the scenario definition, merge in the template overrides, layer on the user's inputs, validate everything against the schema, substitute template variables, and assemble the final request.
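The layering order itself is simple enough to sketch in a few lines. The helper names here are assumptions; the real compiler also validates and substitutes, as the sequence that follows shows:

```typescript
// Later layers win: scenario defaults <- template defaults <- user inputs.
function mergeInputs(
  scenarioDefaults: Record<string, unknown>,
  templateDefaults: Record<string, unknown>,
  userInputs: Record<string, unknown>,
): Record<string, unknown> {
  return { ...scenarioDefaults, ...templateDefaults, ...userInputs };
}

// Template overrides then replace launch-level settings such as policyVersion.
function applyOverrides<T extends object>(launch: T, overrides: Partial<T>): T {
  return { ...launch, ...overrides };
}
```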
sequenceDiagram
participant UI as UI Console
participant ES as Example Service
participant AJV as JSON Schema Validator
UI->>ES: POST /launch/compile<br/>{ scenarioRef, templateId, inputs, mode }
ES->>ES: Parse scenarioRef<br/>"fraud/high-value-new-device@1.0.0"
ES->>ES: Load scenario.yaml from registry
ES->>ES: Load template (if templateId provided)
ES->>ES: Merge: scenario defaults ← template defaults ← user inputs
ES->>AJV: Validate merged inputs against schema
AJV-->>ES: Valid
ES->>ES: Apply template overrides (policy, runtime)
ES->>ES: Substitute template variables<br/>"{{ inputs.transactionAmount }}" → 2400
ES->>ES: Build ExecutionRequest
ES-->>UI: CompileLaunchResult

Variable substitution: from templates to concrete values
The double-brace template variables we saw earlier get replaced with the operator's actual input values. This is straightforward string substitution, but it is what transforms a reusable scenario template into a specific, executable coordination request:
# Before substitution
proposal_id: "{{ inputs.customerId }}-initial-review"
transactionAmount: "{{ inputs.transactionAmount }}"
# After substitution (inputs.customerId = "CUST-1001", transactionAmount = 2400)
proposal_id: "CUST-1001-initial-review"
transactionAmount: 2400

The compiled ExecutionRequest: the complete picture
Here is what the compiler produces for our fraud scenario with the majority-veto template. This is the artifact that gets handed to the Control Plane — it contains everything needed to create a runtime session, bootstrap agents, and execute the coordination:
{
"mode": "live",
"runtime": { "kind": "rust", "version": "v1" },
"session": {
"modeName": "macp.mode.decision.v1",
"modeVersion": "1.0.0",
"configurationVersion": "config.default",
"policyVersion": "policy.fraud.majority-veto",
"policyHints": {
"type": "majority",
"threshold": 0.5,
"vetoEnabled": true,
"vetoThreshold": 1
},
"ttlMs": 300000,
"initiatorParticipantId": "risk-agent",
"participants": [
{ "id": "fraud-agent", "role": "fraud" },
{ "id": "growth-agent", "role": "growth" },
{ "id": "compliance-agent", "role": "compliance" },
{ "id": "risk-agent", "role": "risk" }
],
"context": {
"customerId": "CUST-1001",
"transactionAmount": 2400,
"deviceTrustScore": 0.18,
"accountAgeDays": 14,
"isVipCustomer": true,
"priorChargebacks": 1
},
"metadata": {
"source": "example-service",
"scenarioRef": "fraud/high-value-new-device@1.0.0",
"templateId": "majority-veto",
"demoType": "fraud-decision",
"decisionOwner": "risk-agent",
"specialists": ["fraud-agent", "growth-agent", "compliance-agent"]
}
},
"kickoff": [
{
"from": "risk-agent",
"to": ["fraud-agent", "growth-agent", "compliance-agent"],
"kind": "proposal",
"messageType": "Proposal",
"payloadEnvelope": {
"encoding": "proto",
"proto": {
"typeName": "macp.modes.decision.v1.ProposalPayload",
"value": {
"proposal_id": "CUST-1001-initial-review",
"option": "evaluate_transaction",
"rationale": "Decide whether to approve, step_up, or decline."
}
}
}
}
],
"execution": {
"tags": ["example", "fraud", "high-value-new-device", "demo"],
"requester": { "actorId": "example-service", "actorType": "service" }
}
}

Take a moment to appreciate what just happened. A YAML scenario definition, a YAML template, and a handful of user inputs got merged, validated, substituted, and assembled into a fully self-contained execution request. The operator filled in a form; the compiler did the rest. Every template variable has been replaced with a concrete value, the policy is set to majority-veto, and the kickoff message is ready to go. The Control Plane can take this and run with it — literally.
Agent Bootstrapping: Bringing the Participants to Life
Here is where things get physical. The ExecutionRequest describes what should happen, but someone needs to actually spawn the agent processes that will participate in the coordination. When bootstrapAgents: true (which is the default for example runs), the Example Service takes on this responsibility.
For each participant in the scenario, the Example Service looks up the agent definition in its catalog, finds the right framework adapter, builds a bootstrap payload, writes it to a temporary file, and spawns a child process. The spawned agent reads the bootstrap file, discovers where the Control Plane lives, and connects.
sequenceDiagram
participant ES as Example Service
participant Cat as Agent Catalog
participant Reg as Adapter Registry
participant Sup as Launch Supervisor
participant Agent as Spawned Agent
loop For each participant
ES->>Cat: Lookup agentRef (e.g., "fraud-agent")
Cat-->>ES: ExampleAgentDefinition<br/>(framework, entrypoint, manifest)
ES->>Reg: Get adapter for framework<br/>(e.g., LangGraphHostAdapter)
Reg-->>ES: Adapter instance
ES->>ES: Validate manifest
ES->>ES: Build BootstrapPayload
ES->>ES: Write bootstrap JSON to /tmp/macp-bootstrap/
ES->>Sup: launch(command, args, env)
Sup->>Agent: spawn child process
Note over Agent: Reads MACP_BOOTSTRAP_FILE<br/>Connects to Control Plane
end

Framework adapters: one interface, many runtimes
Here is a design decision worth calling out. MACP does not mandate a single agent framework. The fraud agent might be built with LangGraph, the growth agent with LangChain, the compliance agent with CrewAI, and the risk agent with custom Node.js code. They all participate in the same coordination session, speaking the same protocol, but their internal implementation is completely different.
Each framework has a dedicated adapter that knows how to prepare and launch agents for that framework:
| Framework | Adapter | Launch Command | Notes |
|---|---|---|---|
| LangGraph | LangGraphHostAdapter | python3 -m agents.langgraph_worker.main | Requires graphFactory, inputMapper, outputMapper |
| LangChain | LangChainHostAdapter | python3 -m agents.langchain_worker.main | Chain-based execution |
| CrewAI | CrewAIHostAdapter | python3 -m agents.crewai_worker.main | Crew-based multi-agent |
| Custom | CustomHostAdapter | node dist/example-agents/runtime/worker.js | Node.js custom logic |
The BootstrapPayload: everything an agent needs to join the party
Each spawned agent receives a JSON file that contains everything it needs — who it is, what run it belongs to, where to connect, what the scenario context looks like, and how its framework should be configured:
interface BootstrapPayload {
run: {
runId: string;
sessionId?: string;
traceId?: string;
};
participant: {
participantId: string; // "fraud-agent"
agentId: string;
displayName: string;
role: string; // "fraud"
};
runtime: {
baseUrl: string; // Control Plane URL
messageEndpoint: string; // "/runs/{runId}/messages"
eventsEndpoint: string; // "/runs/{runId}/events"
apiKey?: string;
timeoutMs: number;
};
execution: {
scenarioRef: string;
modeName: string; // "macp.mode.decision.v1"
modeVersion: string;
policyVersion?: string;
policyHints?: { ... };
ttlMs: number;
};
session: {
context: Record<string, unknown>; // Transaction details
participants: string[]; // All participant IDs
};
kickoff?: {
messageType: string;
payload: Record<string, unknown>;
};
agent: {
manifest: Record<string, unknown>;
framework: string; // "langgraph", "langchain", etc.
frameworkConfig?: Record<string, unknown>;
};
}

The file is written to /tmp/macp-bootstrap/{runId}_{participantId}_{timestamp}.json and the path is passed via the MACP_BOOTSTRAP_FILE environment variable. This file-based handoff is intentional — it avoids passing large payloads through command-line arguments (which have OS-level length limits) and makes debugging easy (you can just cat the file to see what the agent received).
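On the agent side, consuming that handoff can be as small as the sketch below. The field names follow the BootstrapPayload interface; the helper itself and the trimmed-down type are hypothetical:

```typescript
import { readFileSync } from "node:fs";

// Minimal shape a worker needs at startup; see BootstrapPayload for the full contract.
interface Bootstrap {
  participant: { participantId: string; role: string };
  runtime: { baseUrl: string; messageEndpoint: string };
}

// Read the file-based handoff; the supervisor passes the path via MACP_BOOTSTRAP_FILE.
function loadBootstrap(path = process.env.MACP_BOOTSTRAP_FILE): Bootstrap {
  if (!path) throw new Error("MACP_BOOTSTRAP_FILE is not set");
  return JSON.parse(readFileSync(path, "utf8")) as Bootstrap;
}
```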
The Fraud Decision: A Complete Story
Now we get to the good part. Everything we have built up to — the catalog, the compiler, the bootstrap system — comes together in a single, dramatic execution. Let's trace the complete lifecycle of our fraud scenario, moment by moment.
Setting the scene
Remember our transaction: $2,400 from a new device (trust score 0.18) on a 14-day-old VIP account with one prior chargeback. Four specialist agents are about to debate what to do about it.
The cast of characters
Each agent brings a different perspective, uses a different framework, and has a fundamentally different set of priorities. That tension is the whole point — coordination is only interesting when the participants disagree.
| Agent | Role | Framework | Responsibility |
|---|---|---|---|
| risk-agent | Risk (Initiator) | Custom/Node.js | Coordinates the decision — sends the initial proposal, collects specialist input, issues the final Commitment |
| fraud-agent | Fraud | LangGraph | Evaluates device trust, chargeback history, identity-risk signals |
| growth-agent | Growth | LangChain | Assesses customer lifetime value, revenue impact, VIP status |
| compliance-agent | Compliance | CrewAI | Applies KYC/AML checks, policy rules, regulatory requirements |
The full execution: six phases of a coordinated decision
Buckle up. This is the sequence that plays out in real time, with events streaming to the operator's browser as they happen:
sequenceDiagram
participant UI as UI Console
participant ES as Example Service
participant CP as Control Plane
participant RT as Runtime
participant Risk as risk-agent
participant Fraud as fraud-agent
participant Growth as growth-agent
participant Comp as compliance-agent
Note over UI,Comp: Phase 1 — Launch
UI->>ES: POST /examples/run<br/>{ scenarioRef, templateId: "majority-veto",<br/>inputs: { amount: 2400, device: 0.18, ... } }
ES->>ES: Compile → ExecutionRequest
ES->>ES: Bootstrap 4 agents (spawn processes)
ES->>CP: POST /runs/validate (ExecutionRequest)
CP-->>ES: 204 Valid
ES->>CP: POST /runs (ExecutionRequest)
CP-->>ES: { runId: "run-abc", status: "queued" }
ES-->>UI: RunExampleResult { runId, hostedAgents }
UI->>UI: Navigate to /runs/live/run-abc
Note over UI,Comp: Phase 2 — Session Creation
CP->>RT: SessionStart (gRPC stream)
RT->>RT: Create session OPEN<br/>mode: decision.v1, policy: majority-veto
RT-->>CP: Ack (session bound)
CP->>RT: Kickoff: Proposal from risk-agent
RT-->>CP: Ack
Note over UI,Comp: Phase 3 — Specialist Evaluation
Fraud->>RT: Evaluation<br/>recommendation: BLOCK<br/>confidence: 0.85<br/>"Low device trust + prior chargeback"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Growth->>RT: Evaluation<br/>recommendation: APPROVE<br/>confidence: 0.72<br/>"VIP customer, high lifetime value"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Comp->>RT: Evaluation<br/>recommendation: REVIEW<br/>confidence: 0.68<br/>"KYC complete, AML flag pending"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Note over UI,Comp: Phase 4 — Objection
Fraud->>RT: Objection<br/>severity: critical<br/>"Device trust 0.18 below threshold"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Note over UI,Comp: Phase 5 — Voting
Fraud->>RT: Vote REJECT on "approve"
Growth->>RT: Vote APPROVE on "approve"
Comp->>RT: Vote ABSTAIN
RT-->>CP: Accepted envelopes
CP-->>UI: SSE canonical_events
Note over UI,Comp: Phase 6 — Commitment
Risk->>RT: Commitment<br/>action: "step_up"<br/>reason: "Majority did not approve.<br/>Critical objection from fraud.<br/>Escalate to step-up verification."<br/>outcome_positive: true
RT->>RT: Policy evaluation (majority-veto)<br/>Veto check: critical objection present → veto triggers
RT->>RT: Session → RESOLVED
RT-->>CP: Ack (session_state=RESOLVED)
CP-->>UI: SSE session.state.changed + run.completed
Note over UI: Decision panel shows:<br/>"step_up" with confidence breakdown

Let's walk through what just happened, because each phase tells a story.
Phase 1 — Launch. The operator clicks "Run." The Example Service compiles the scenario, spawns four agent processes, validates the execution request with the Control Plane, and submits it. The UI navigates to the live workbench. Total wall time: a couple of seconds.
Phase 2 — Session Creation. The Control Plane opens a bidirectional gRPC stream to the Rust runtime, which creates a new session in OPEN state with the majority-veto policy loaded. The kickoff message — the risk agent's proposal asking the others to evaluate the transaction — gets delivered.
Phase 3 — Specialist Evaluation. This is where it gets interesting. The three specialist agents analyze the transaction independently and arrive at different conclusions. The fraud agent sees the low device trust score and the prior chargeback and recommends BLOCK with 85% confidence. The growth agent sees the VIP status and high lifetime value and recommends APPROVE with 72% confidence. The compliance agent finishes its KYC checks but notes a pending AML flag and recommends REVIEW with 68% confidence. Three agents, three different answers — exactly the kind of disagreement that coordination protocols are designed to resolve.
Phase 4 — Objection. The fraud agent is not done. It files a formal objection with critical severity, citing the device trust score of 0.18 as below threshold. This is not just another data point — under the majority-veto policy, a critical objection can trigger a veto. The fraud agent is essentially saying: "I feel strongly enough about this that I'm willing to block the entire decision."
Phase 5 — Voting. The agents cast their votes on the "approve" option. Fraud votes REJECT. Growth votes APPROVE. Compliance abstains. The tally: 1 approve, 1 reject, 1 abstain. No majority either way.
Phase 6 — Commitment. The risk agent, as the initiator, reads the room. No majority approved. There is a critical objection from fraud. The risk agent commits to step_up — escalating to additional verification. The runtime evaluates this commitment against the majority-veto policy.
How the policy evaluation works at commitment
This is the moment of truth. The majority-veto policy evaluates three things:
- Majority check — Did >50% vote APPROVE? (1 approve, 1 reject, 1 abstain — no majority)
- Veto check — Any critical-severity objection? (Yes, from fraud-agent — veto triggers)
- Result — Policy allows the step_up commitment since the initiator accounts for the veto
The expected decision kinds for this scenario are approve, step_up, and decline. The outcome step_up means the transaction requires additional verification before proceeding — a sensible middle ground when the agents cannot agree and there is a credible fraud signal.
The session moves to RESOLVED. The Control Plane streams the final events to the UI. The operator sees the decision panel update with "step_up" and a full breakdown of why: which agents voted how, what objections were raised, and how the policy evaluated the commitment.
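A toy model of that tally helps fix the logic in mind. This is illustrative only: the real policy engine lives in the Rust runtime, and this sketch ignores details such as the 2-voter quorum from the policyHints:

```typescript
type Vote = "APPROVE" | "REJECT" | "ABSTAIN";
interface Objection { severity: "low" | "medium" | "high" | "critical" }
interface PolicyResult { majorityApproved: boolean; vetoTriggered: boolean }

// Simplified majority-veto check: >threshold of non-abstaining votes must approve,
// and any critical-severity objection triggers the veto.
function evaluateMajorityVeto(votes: Vote[], objections: Objection[], threshold = 0.5): PolicyResult {
  const cast = votes.filter((v) => v !== "ABSTAIN");
  const approvals = cast.filter((v) => v === "APPROVE").length;
  const majorityApproved = cast.length > 0 && approvals / cast.length > threshold;
  const vetoTriggered = objections.some((o) => o.severity === "critical");
  return { majorityApproved, vetoTriggered };
}
```

Run on the fraud scenario's tally (1 approve, 1 reject, 1 abstain, one critical objection), this yields no majority and an active veto, which is exactly the situation the risk agent resolved with step_up.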
Submitting to the Control Plane
After compilation and agent bootstrapping, the Example Service hands off to the Control Plane for actual execution. We will keep this section focused on the handoff mechanics — for the full protocol-level details of what happens inside the runtime, see E2E Flow § 4–8.
sequenceDiagram
participant ES as Example Service
participant CP as Control Plane
participant RT as Runtime
ES->>CP: POST /runs/validate (ExecutionRequest)
CP->>CP: Check mode supported, runtime reachable
CP-->>ES: 204 No Content
ES->>CP: POST /runs (ExecutionRequest)
CP->>CP: Create run (queued → starting)
CP->>RT: Initialize RPC
CP->>RT: StreamSession (bidirectional gRPC)
CP->>RT: SessionStart envelope
RT-->>CP: Ack (session OPEN)
CP->>CP: Bind session → run (binding_session)
CP->>RT: Kickoff messages
CP->>CP: Mark running
loop Stream events
RT-->>CP: Accepted envelopes
CP->>CP: Normalize → persist → project
end
RT-->>CP: Session RESOLVED
CP->>CP: Mark completedNotice the two-step submission: validate first, then create. The validation step (POST /runs/validate) is a dry run — it checks that the requested mode is supported, the runtime is reachable, and the execution request is well-formed. Only after validation passes does the Example Service submit the actual run. This prevents wasted agent bootstrapping when the Control Plane is not ready.
The control plane returns 202 Accepted with:
{
"runId": "a1b2c3d4-...",
"status": "queued",
"traceId": "trace-xyz-123"
}

The UI Console uses the runId to navigate to the live workbench and open an SSE stream. From this point on, the operator is watching the coordination unfold in real time.
Live Streaming: Real-Time Visibility into Coordination
The operator is now sitting at the live workbench, watching events roll in. How does that actually work?
The UI Console maintains a persistent Server-Sent Events (SSE) connection to the Control Plane. SSE was chosen over WebSockets deliberately — it is simpler, works through more proxies and load balancers, and the data flow is one-directional (server to client), which is exactly what we need for streaming coordination events.
sequenceDiagram
participant UI as UI Console
participant Proxy as API Proxy
participant CP as Control Plane
UI->>Proxy: EventSource<br/>/api/proxy/control-plane/runs/{runId}/stream<br/>?includeSnapshot=true&afterSeq=0
Proxy->>CP: Forward SSE connection
CP-->>UI: event: snapshot<br/>data: RunStateProjection (full current state)
loop Real-time updates
CP-->>UI: event: canonical_event<br/>data: { id, seq, type, subject, data }
end
CP-->>UI: event: heartbeat<br/>(every ~30s to keep connection alive)
Note over UI: On disconnect:<br/>exponential backoff (1s → 30s)<br/>resume from afterSeq={lastSeq}Three event types, and that is all you need
The SSE protocol is refreshingly simple — just three event types:
| Event | Payload | When |
|---|---|---|
| snapshot | Full RunStateProjection | On initial connect and reconnection |
| canonical_event | Single CanonicalEvent | Each time a new event is processed |
| heartbeat | Empty | Periodic keepalive (~30s) |
The snapshot event is the key to resilient streaming. When you first connect (or reconnect after a network blip), you get the complete current state of the run — not just the events you missed, but the fully projected state. This means the UI can render correctly immediately, without replaying the event history client-side.
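A client-side reducer for these three event types can be sketched as follows. The types are simplified stand-ins (the real `RunStateProjection` and `CanonicalEvent` are richer), but the handling mirrors the protocol: a snapshot replaces local state wholesale, a canonical event is appended and advances the sequence high-water mark, and a heartbeat stores nothing.

```typescript
interface CanonicalEventLite {
  seq: number;
  type: string;
  data?: unknown;
}

interface ClientState {
  projection: unknown | null;       // last full snapshot
  events: CanonicalEventLite[];     // recent canonical events
  lastSeq: number;                  // high-water mark used for resumption
}

function applySseEvent(
  state: ClientState,
  eventName: "snapshot" | "canonical_event" | "heartbeat",
  payload: unknown,
): ClientState {
  switch (eventName) {
    case "snapshot":
      // Full projection replaces local state -- no client-side replay needed.
      return { ...state, projection: payload };
    case "canonical_event": {
      const ev = payload as CanonicalEventLite;
      return {
        ...state,
        events: [...state.events, ev],
        lastSeq: Math.max(state.lastSeq, ev.seq),
      };
    }
    case "heartbeat":
      // Keepalive only; nothing to store.
      return state;
  }
}
```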
Connection management: handling the real world
Networks are unreliable. Connections drop. Proxies time out. The useLiveRun hook handles all of this gracefully:
- Buffer limit — Keeps the last 500 events in memory; older events are trimmed
- Heartbeat timeout — If no heartbeat received within 45 seconds, treats as connection failure
- Reconnection — Exponential backoff: 1s, 2s, 4s, 8s, ... up to 30s max, with 8 max attempts
- Resumption — On reconnect, passes `afterSeq={lastSeq}` to avoid re-receiving processed events
- Status tracking — `idle` → `connecting` → `live` → `reconnecting` → `ended` or `error`
That afterSeq parameter is worth highlighting. Each canonical event carries a monotonically increasing sequence number. When the connection drops and the client reconnects, it tells the server "I have already seen events up to sequence N — start from N+1." No duplicate events, no gaps, no complex reconciliation logic.
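The reconnection math is simple enough to write down. Here is a sketch of the backoff schedule and resume URL described above, assuming the same proxy path shown in the sequence diagram; the function names are illustrative, not the actual `useLiveRun` internals.

```typescript
// Exponential backoff: attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at 30s.
// (The hook additionally stops after 8 failed attempts.)
function backoffDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}

// Resume URL: ask the server for events strictly after the last sequence
// number this client has already processed.
function resumeStreamUrl(runId: string, lastSeq: number): string {
  return `/api/proxy/control-plane/runs/${runId}/stream?afterSeq=${lastSeq}`;
}
```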
The full canonical event vocabulary
As the coordination plays out, these are the event types the UI receives and renders:
| Category | Event Types |
|---|---|
| Run lifecycle | `run.created`, `run.started`, `run.completed`, `run.failed` |
| Session | `session.bound`, `session.state.changed` |
| Participants | `participant.seen` |
| Messages | `message.sent`, `message.received` |
| Signals | `signal.emitted` |
| Coordination | `proposal.created`, `decision.proposed`, `decision.finalized` |
| Policy | `policy.resolved`, `policy.commitment.evaluated` |
The Live Workbench: Making Coordination Visible
Events are streaming in. Now the UI needs to turn that stream of structured data into something an operator can actually understand. The live workbench (`/runs/live/{runId}`) is a multi-panel view that provides real-time visibility into every aspect of the coordination.
The execution graph: coordination as a visual story
The centerpiece is an interactive directed graph built with React Flow. It transforms the abstract concept of "four agents coordinating under a policy" into something you can see and interact with:
- Node types — Start (flag), Context (database), Agent (bot), Decision (workflow), Output (check)
- Edges — Animated by kind: kickoff, message, proposal
- Layout — Auto-positioned columns: start → context → agents → decision → output
- Live updates — Nodes update status, progress bars, and signal badges as SSE events arrive
In our fraud scenario, you would see the risk agent node pulse as it sends the kickoff proposal, then the three specialist nodes light up as they submit their evaluations, then a critical objection badge appears on the fraud agent, and finally the decision node resolves to "step_up" with the full confidence breakdown.
The supporting panels
The graph tells the high-level story. The panels provide the details:
| Panel | Data Source | What It Shows |
|---|---|---|
| Live Event Feed | events[] from SSE | Reverse-chronological list of canonical events — type, seq, timestamp, source, subject, truncated payload |
| Decision Panel | state.decision | Current decision action, confidence percentage, finalized status, composition of reasons |
| Policy Panel | state.policy | Policy version, description, commitment evaluations (allow/deny with reasons) |
| Signal Rail | state.signals | Side-channel signals from agents — name, timestamp, source, confidence, severity (color-coded) |
| Node Inspector | Selected node | Per-participant deep dive — overview, payloads, signals, traces/artifacts. Filters events by node ID |
| Session Interaction | Manual input | Send messages or signals into the live session (from dropdown, to recipients, message type, JSON payload) |
That last panel — Session Interaction — is particularly powerful. It means the operator is not just a passive observer. They can inject messages into a live coordination session, which is invaluable for debugging and testing edge cases.
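The shape of an injected message can be sketched from the panel's inputs. This is an assumption about structure, not the actual wire format: the field names (`from`, `to`, `messageType`, `payload`) mirror the panel's dropdown, recipient list, message type, and JSON payload fields.

```typescript
// Hypothetical shape of a message injected via the Session Interaction panel.
interface InjectedMessage {
  from: string;        // sending participant (from dropdown)
  to: string[];        // recipients
  messageType: string;
  payload: unknown;    // free-form JSON payload
}

function buildInjectedMessage(
  from: string,
  to: string[],
  messageType: string,
  payload: unknown,
): InjectedMessage {
  // A message with no recipients has nowhere to go; reject it up front.
  if (to.length === 0) {
    throw new Error("At least one recipient is required");
  }
  return { from, to, messageType, payload };
}
```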
The single source of truth: RunStateProjection
All of these panels render from a single projection that the Control Plane builds incrementally from the event stream. This is a critical architectural choice: instead of each panel querying different APIs, they all consume the same data structure:
```typescript
interface RunStateProjection {
  run: RunSummaryProjection;
  participants: ParticipantProjection[];
  graph: GraphProjection;          // Nodes and edges for React Flow
  decision: DecisionProjection;    // Action, confidence, reasons
  signals: SignalProjection;       // Emitted signals
  progress: ProgressProjection;    // Per-participant progress
  timeline: TimelineProjection;    // Chronological event sequence
  trace: TraceSummary;             // Distributed trace info
  outboundMessages: OutboundMessageSummary;
  policy: PolicyProjection;        // Policy status and evaluations
}
```

One projection, many views. When a new canonical event arrives, the projection updates, and every panel that cares about that change re-renders. The UI stays consistent because there is only one truth to be consistent with.
After the Decision: Replay, Compare, and Learn
The coordination is complete. Our fraud scenario resolved to `step_up`. But the story does not end when the session closes — in many ways, that is where the most valuable work begins. Understanding why agents reached a particular decision, comparing outcomes across different policy configurations, and debugging unexpected behavior all happen post-execution.
Replay: time travel for coordination
The Control Plane supports three replay modes, each designed for a different use case:
| Mode | Behavior |
|---|---|
| `timed` | Events replayed with proportional inter-event timing; speed multiplier supported (0.5x, 1x, 2x, 4x) |
| `step` | Events emitted one at a time on request |
| `instant` | All events emitted immediately |
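The pacing rule for `timed` mode can be written down in one line: the delay between two replayed events is the original inter-event gap divided by the speed multiplier. A sketch, not the Control Plane's actual scheduler:

```typescript
// Delay (ms) to wait before emitting the next replayed event, given the
// original timestamps of the previous and next events and a speed multiplier.
function replayDelayMs(
  prevTimestampMs: number,
  nextTimestampMs: number,
  speed: number, // 0.5, 1, 2, or 4 per the supported multipliers
): number {
  // Clamp at zero so out-of-order timestamps never produce a negative wait.
  const gap = Math.max(0, nextTimestampMs - prevTimestampMs);
  return gap / speed;
}
```

So a 1-second gap in the original run becomes a 500 ms wait at 2x and a 2-second wait at 0.5x.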
The Timeline Scrubber component renders a range slider with discrete frame markers. Scrubbing loads the `RunStateProjection` at a specific sequence number via `GET /runs/{runId}/replay/state?seq=N`, allowing users to rewind to any point in the coordination.
Think of it like a DVR for coordination. You can watch the fraud scenario play out at 2x speed, pause at the moment the fraud agent filed its critical objection, inspect the state of every participant at that instant, then step forward one event at a time to see how the objection changed the outcome. This is extraordinarily useful for understanding policy behavior — "what would have happened if the fraud agent had filed a warning instead of a critical objection?"
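The scrubber's state lookup reduces to building the replay-state URL quoted above for a given sequence number. A minimal sketch with basic input validation (the guard is an illustration, not the component's actual code):

```typescript
// Build the URL the scrubber fetches when the slider lands on sequence `seq`.
function replayStateUrl(runId: string, seq: number): string {
  if (!Number.isInteger(seq) || seq < 0) {
    throw new Error("seq must be a non-negative integer");
  }
  return `/runs/${runId}/replay/state?seq=${seq}`;
}
```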
Run comparison: side-by-side analysis
The comparison view at `/runs/{leftId}/compare/{rightId}` puts two runs next to each other and highlights the differences:
- Decision delta — What each run decided and why
- Payload diff — JSON diff of execution requests and outcomes
- Timeline alignment — Events mapped by type across both runs
- Participant comparison — Per-agent activity and signal differences
This is how you answer questions like: "We ran the same fraud scenario with the majority-veto template and the unanimous template — how did the outcomes differ?" Or: "We increased the device trust score from 0.18 to 0.65 — at what point do the agents stop objecting?"
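The decision-delta part of that comparison can be sketched as a pure function over two simplified decision summaries. The shapes are illustrative assumptions; the real comparison view works from full `RunStateProjection` data.

```typescript
interface DecisionSummary {
  action: string;      // e.g. "step_up", "block", "allow"
  confidence: number;  // 0..1
}

// Report whether two runs diverged on the decided action, and by how much
// the confidence shifted (right minus left, rounded to 4 decimal places).
function decisionDelta(
  left: DecisionSummary,
  right: DecisionSummary,
): { sameAction: boolean; confidenceDiff: number } {
  return {
    sameAction: left.action === right.action,
    confidenceDiff: Number((right.confidence - left.confidence).toFixed(4)),
  };
}
```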
Export and clone: building on what you have learned
- Export bundle — Download the complete run as a JSON archive (events, projection, metrics, traces)
- Clone run — Re-launch with the same `ExecutionRequest` but optional overrides (tags, context, policy)
- Archive — Mark run as archived for cleanup
The clone feature is particularly useful for iterative testing. You run a fraud scenario, see the outcome, tweak one parameter (say, raise the device trust score to 0.5), clone the run with that override, and compare the two results. Rinse and repeat until the agents and policies behave the way you want.
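The clone-with-overrides workflow amounts to a shallow merge: keep the original request, let the override fields win. A sketch under the assumption that overrides replace whole top-level fields (the real API's merge semantics may differ):

```typescript
interface CloneOverrides {
  tags?: string[];
  context?: Record<string, unknown>;
  policy?: string;
}

// Reuse the original request; any provided override replaces the
// corresponding field. The original object is left untouched.
function cloneRequest<T extends Record<string, unknown>>(
  original: T,
  overrides: CloneOverrides,
): T & CloneOverrides {
  return { ...original, ...overrides };
}
```

For the iterative-testing loop described above: clone with `context: { deviceTrustScore: 0.5 }`, run, then compare against the original 0.18 run.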
When Things Go Wrong: Error Handling Across the Stack
Production systems fail. Networks partition. Agents crash. Inputs get malformed. A system that only works on the happy path is not a system — it is a demo. MACP handles errors at every layer with structured codes and clear feedback that flows all the way back to the operator.
Example Service errors
These are the errors you hit before the run even starts — bad scenario references, invalid inputs, unreachable dependencies:
| Code | HTTP | When |
|---|---|---|
| `PACK_NOT_FOUND` | 404 | Pack slug doesn't match any `pack.yaml` |
| `SCENARIO_NOT_FOUND` | 404 | Scenario slug not found in pack |
| `VERSION_NOT_FOUND` | 404 | Requested version doesn't exist |
| `TEMPLATE_NOT_FOUND` | 404 | Template slug not found for version |
| `AGENT_NOT_FOUND` | 404 | `agentRef` doesn't match catalog |
| `INVALID_SCENARIO_REF` | 400 | Ref format invalid (expected `pack/scenario@version`) |
| `VALIDATION_ERROR` | 400 | User inputs fail JSON schema validation |
| `COMPILATION_ERROR` | 400 | Template variable substitution or merge failure |
| `CONTROL_PLANE_UNAVAILABLE` | 502 | Cannot reach control plane for validate/create |
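The check behind `INVALID_SCENARIO_REF` can be sketched as a parser for the `pack/scenario@version` format. The regex below is an assumption about the exact grammar (the service may allow a different character set), but it shows how a malformed ref turns into a structured 400 rather than a vague failure.

```typescript
interface ScenarioRef {
  pack: string;
  scenario: string;
  version: string;
}

function parseScenarioRef(ref: string): ScenarioRef {
  // Three non-empty segments: pack "/" scenario "@" version.
  const match = /^([^/@\s]+)\/([^/@\s]+)@([^/@\s]+)$/.exec(ref);
  if (!match) {
    // In the Example Service this would surface as a 400 INVALID_SCENARIO_REF.
    throw new Error(
      `INVALID_SCENARIO_REF: expected pack/scenario@version, got "${ref}"`,
    );
  }
  return { pack: match[1], scenario: match[2], version: match[3] };
}
```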
Control Plane errors
These happen during execution — runtime connectivity issues, protocol violations, policy enforcement:
See E2E Flow § 13 for the full error taxonomy. Key ones surfaced to the UI:
| Code | When |
|---|---|
| `RUN_NOT_FOUND` | Run ID doesn't exist |
| `RUNTIME_UNAVAILABLE` | Cannot connect to MACP runtime |
| `KICKOFF_FAILED` | Initial message rejected by runtime |
| `SESSION_EXPIRED` | Session TTL exceeded during run |
| `POLICY_DENIED` | Commitment rejected by governance rules |
UI Console error handling
The UI Console handles errors at multiple levels, because failures can happen anywhere in the stack:
- API errors — The `fetcher` throws `ApiError` with `status`, `statusText`, `service`, `path`. Components check `isNotFound` for 404-specific handling.
- React Error Boundaries — Catch render errors with fallback UI and "Try again" button
- Query errors — TanStack React Query retries once, then renders `ErrorPanel` with action links
- SSE failures — Connection status badge shows `reconnecting` with attempt counter; after 8 failed attempts, shows `error` status with manual retry option
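An `ApiError` along the lines described above might look like this. The field names come from the list; the constructor shape, message format, and `isNotFound` being a getter are assumptions for illustration.

```typescript
class ApiError extends Error {
  constructor(
    public status: number,
    public statusText: string,
    public service: string,  // which backend failed, e.g. "example-service"
    public path: string,     // which endpoint was called
  ) {
    // One readable line carrying all the context a component needs.
    super(`${service} ${path}: ${status} ${statusText}`);
    this.name = "ApiError";
  }

  // Convenience check so components can branch on 404s without magic numbers.
  get isNotFound(): boolean {
    return this.status === 404;
  }
}
```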
Agent process errors
Even agent crashes are handled gracefully:
- The `LaunchSupervisor` captures stdout/stderr from spawned agents with prefix: `[{framework}:{participantId}:{runId}]`
- Agent crash is detected via process exit handler; the `HostedExampleAgent` status reflects the failure
- Bootstrap file cleanup happens automatically on process exit
The key principle throughout: errors are structured, codes are specific, and the operator always has enough context to understand what went wrong and what to try next. A `POLICY_DENIED` error does not just say "something failed" — it tells you which policy rule rejected which commitment and why. That is the difference between an error message that helps and one that wastes your time.