UI Console & Example Service Flow
How the UI Console and Example Service work together to browse scenarios, compile execution requests, bootstrap agents, and stream live coordination results
Status: Non-normative (explanatory).
Complements: MACP End-to-End Flow — for protocol-level details on session creation, mode dispatch, policy evaluation, and replay.
Imagine you are an operations analyst at a fintech company. A $2,400 purchase just came in from a brand-new device — the device trust score is a worrying 0.18 out of 1.0. The account is only two weeks old, but the customer has VIP status. Oh, and there is one prior chargeback on file. Should the transaction go through? Should it be blocked outright? Or should the system escalate to step-up verification?
This is not a question any single system can answer well on its own. It requires fraud expertise, growth strategy, compliance checks, and risk coordination — all working together, in real time, under governance rules that ensure no single voice dominates the outcome.
This document walks you through exactly how MACP makes that happen, from the moment an operator browses a scenario catalog in the UI Console, through agent bootstrapping and execution in the Example Service, all the way to live-streamed coordination results appearing in the browser. We will use the Fraud: High-Value New Device scenario as our protagonist throughout — a single concrete story that illuminates every layer of the system.
The Three Services (and Why No Single One Does Everything)
The journey from "I want to run a scenario" to "here's what the agents decided" crosses three services, each with a distinct job. No single service tries to do everything — that is by design.
The UI Console is the storefront where operators browse and launch. The Example Service is the factory floor where scenarios become executable coordination requests and agents get spun up. And the Control Plane is the nervous system that connects everything to the runtime and streams results back in real time.
Here is how they fit together:
flowchart TB
subgraph Browser["UI Console — Next.js"]
Pages["Pages\nScenarios · Launch · Live Run · Replay"]
SSE["SSE Client\nuseLiveRun hook"]
Proxy["API Proxy\n/api/proxy/{service}/{path}"]
end
subgraph ES["Example Service — NestJS"]
Catalog["Scenario Catalog\nfile-based YAML registry"]
Compiler["Compiler\nscenario + template → ExecutionRequest"]
Hosting["Agent Hosting\nframework adapters · process supervisor"]
Agents["Spawned Agents\nLangGraph · LangChain · CrewAI · Custom"]
end
subgraph CP["Control Plane — NestJS"]
RunAPI["Run API\nPOST /runs · GET /runs/:id/stream"]
Executor["Run Executor\nsession lifecycle"]
Projections["Projection Engine\nevent → read model"]
end
subgraph RT["MACP Runtime — Rust"]
Kernel["Coordination Kernel"]
end
Pages --> Proxy
SSE --> Proxy
Proxy -->|"HTTP"| Catalog
Proxy -->|"HTTP"| Compiler
Proxy -->|"HTTP"| RunAPI
Proxy -->|"SSE"| RunAPI
Hosting --> Agents
Agents -->|"gRPC"| Kernel
Executor -->|"gRPC bidirectional"| Kernel
Compiler --> Hosting
style Browser fill:#1a1a2e,stroke:#4a9eff
style ES fill:#1a1a2e,stroke:#10b981
style CP fill:#1a1a2e,stroke:#f59e0b
style RT fill:#1a1a2e,stroke:#9f7aea

Notice how every call from the browser goes through a single API proxy route (/api/proxy/{service}/{path}). This is not just architectural tidiness — it means the UI Console never talks directly to backend services, which keeps authentication centralized and makes the whole thing deployable behind a single domain.
Here is what each service actually owns:
| Service | Responsibility |
|---|---|
| UI Console | Scenario browsing, run configuration, live visualization, replay, export. All API calls go through a Next.js proxy route (/api/proxy/{service}/{path}) that adds authentication headers. |
| Example Service | Scenario catalog (file-based YAML), input validation, template compilation into ExecutionRequest, agent resolution and process hosting. Does NOT embed the runtime. |
| Control Plane | Run lifecycle (queued → running → completed), runtime session management, event normalization, projection building, SSE streaming to UI. See E2E Flow § 4 for details. |
| MACP Runtime | Protocol enforcement, session state, mode dispatch, policy evaluation. See E2E Flow § 5–8 for details. |
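To make the proxy's job concrete, here is a minimal sketch of the URL mapping a catch-all route like /api/proxy/{service}/{path} would perform. The service-to-base-URL map, port numbers, and helper name are assumptions for illustration, not the actual implementation:

```typescript
// Hypothetical upstream map; a real deployment would read these from environment variables.
const SERVICE_BASES: Record<string, string> = {
  "example-service": "http://localhost:4001",
  "control-plane": "http://localhost:4000",
};

// Build the upstream URL for a request to /api/proxy/{service}/{...path}.
function buildUpstreamUrl(service: string, path: string[], search = ""): string {
  const base = SERVICE_BASES[service];
  if (!base) throw new Error(`Unknown proxy target: ${service}`);
  return `${base}/${path.map(encodeURIComponent).join("/")}${search}`;
}
```

A route handler would call something like this, attach authentication headers, and forward the request with fetch, which is why the browser never needs direct network access to the backends.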
The Scenario Catalog: YAML Files All the Way Down
The scenario catalog is deliberately low-tech — it is just YAML files on disk, organized in a hierarchy that mirrors how domain teams think. At the top level, packs group scenarios by business domain: fraud, lending, claims. Inside each pack, individual scenarios describe specific coordination situations. Each scenario can have multiple versions (because requirements evolve) and multiple templates (because the same scenario might run under different governance policies).
Why YAML files instead of a database? Because scenarios are authored by domain experts alongside their agent code, version-controlled in git, and reviewed in pull requests. The file system is the source of truth. The Example Service simply reads it on startup.
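Because the hierarchy is a fixed convention, resolving a scenario's files from its coordinates is pure path arithmetic. A sketch, with a hypothetical helper name (the layout mirrors the packs/ tree):

```typescript
// Hypothetical resolver mirroring the packs/ directory convention.
// Omitting the template slug resolves the scenario definition itself.
function scenarioFile(pack: string, scenario: string, version: string, template?: string): string {
  const root = `packs/${pack}/scenarios/${scenario}/${version}`;
  return template ? `${root}/templates/${template}.yaml` : `${root}/scenario.yaml`;
}
```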
flowchart TB
Packs["packs/"] --> FraudPack["fraud/\npack.yaml"]
Packs --> LendingPack["lending/\npack.yaml"]
Packs --> ClaimsPack["claims/\npack.yaml"]
FraudPack --> FraudScenarios["scenarios/"]
FraudScenarios --> HVND["high-value-new-device/"]
HVND --> V1["1.0.0/"]
V1 --> ScenarioYAML["scenario.yaml"]
V1 --> Templates["templates/"]
Templates --> Default["default.yaml"]
Templates --> MajorityVeto["majority-veto.yaml"]
Templates --> Unanimous["unanimous.yaml"]
Templates --> StrictRisk["strict-risk.yaml"]What a pack looks like
A pack is just a thin wrapper — a slug, a human name, a description, and some tags for filtering. The fraud pack, for example:
# packs/fraud/pack.yaml
apiVersion: scenarios.macp.dev/v1
kind: ScenarioPack
metadata:
slug: fraud
name: Fraud
description: Fraud and risk decisioning demos
tags: [fraud, risk, growth, demo]

The scenario version: where things get interesting
Each scenario version is a complete, self-contained definition. This is where the real design thinking lives. Let's look at our fraud scenario — the one we will trace through the entire system:
# packs/fraud/scenarios/high-value-new-device/1.0.0/scenario.yaml
apiVersion: scenarios.macp.dev/v1
kind: ScenarioVersion
metadata:
pack: fraud
scenario: high-value-new-device
version: 1.0.0
name: High Value Purchase From New Device
summary: >
Fraud, Growth, Compliance, and Risk agents discuss
a transaction and produce a decision.
tags: [fraud, growth, compliance, risk, demo]
spec:
runtime:
kind: rust
version: v1
inputs:
schema:
type: object
properties:
transactionAmount:
type: number
default: 2400
minimum: 1
deviceTrustScore:
type: number
default: 0.18
minimum: 0
maximum: 1
# ... accountAgeDays, isVipCustomer, priorChargebacks
required:
- transactionAmount
- deviceTrustScore
- accountAgeDays
- isVipCustomer
- priorChargebacks
launch:
modeName: macp.mode.decision.v1
modeVersion: 1.0.0
configurationVersion: config.default
policyVersion: policy.default
ttlMs: 300000
initiatorParticipantId: risk-agent
participants:
- id: fraud-agent
role: fraud
agentRef: fraud-agent
- id: growth-agent
role: growth
agentRef: growth-agent
- id: compliance-agent
role: compliance
agentRef: compliance-agent
- id: risk-agent
role: risk
agentRef: risk-agent
contextTemplate:
customerId: "{{ inputs.customerId }}"
transactionAmount: "{{ inputs.transactionAmount }}"
deviceTrustScore: "{{ inputs.deviceTrustScore }}"
# ... remaining fields
kickoffTemplate:
- from: risk-agent
to: [fraud-agent, growth-agent, compliance-agent]
kind: proposal
messageType: Proposal
payloadEnvelope:
encoding: proto
proto:
typeName: macp.modes.decision.v1.ProposalPayload
value:
proposal_id: "{{ inputs.customerId }}-initial-review"
option: evaluate_transaction
rationale: >
Decide whether to approve, step_up,
or decline the transaction.
outputs:
expectedDecisionKinds: [approve, step_up, decline]
expectedSignals:
- suspicious_device
- chargeback_history
- vip_customer

There is a lot packed in here, so let's unpack the key design decisions. The inputs.schema section is a standard JSON Schema — it tells the UI exactly what form fields to render and how to validate them. The launch section defines the coordination structure: which mode to use (decision mode v1), who participates, who initiates, and what the kickoff message looks like. And here is the clever part: the contextTemplate and kickoffTemplate use double-brace template variables ({{ inputs.transactionAmount }}) that get substituted at compile time with the user's actual values. The scenario author defines the shape of the coordination; the operator fills in the specifics.
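A minimal sketch of that compile-time substitution, assuming a simple recursive walk (the helper is hypothetical, not the service's actual code). Note the special case: when a string is exactly one placeholder, the input's native type is preserved, which is how "{{ inputs.transactionAmount }}" becomes the number 2400 rather than a string:

```typescript
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Hypothetical helper: replace {{ inputs.* }} placeholders with concrete values.
function substitute(node: Json, inputs: Record<string, Json>): Json {
  if (typeof node === "string") {
    // A whole-string placeholder keeps the input value's type (number, boolean, ...).
    const whole = node.match(/^\{\{\s*inputs\.(\w+)\s*\}\}$/);
    if (whole) return inputs[whole[1]];
    // Embedded placeholders are stringified in place.
    return node.replace(/\{\{\s*inputs\.(\w+)\s*\}\}/g, (_, k) => String(inputs[k]));
  }
  if (Array.isArray(node)) return node.map((n) => substitute(n, inputs));
  if (node && typeof node === "object") {
    return Object.fromEntries(
      Object.entries(node).map(([k, v]) => [k, substitute(v, inputs)]),
    );
  }
  return node;
}
```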
Templates: same scenario, different rules
Here is where MACP's separation of concerns really shines. The same fraud scenario can run under completely different governance policies just by swapping templates. The majority-veto template, for instance, changes the policy to require a simple majority vote with a blocking-objection veto:
# templates/majority-veto.yaml
apiVersion: scenarios.macp.dev/v1
kind: ScenarioTemplate
metadata:
scenarioVersion: fraud/high-value-new-device@1.0.0
slug: majority-veto
name: Majority Vote with Veto
spec:
defaults:
transactionAmount: 2400
deviceTrustScore: 0.18
accountAgeDays: 14
isVipCustomer: true
priorChargebacks: 1
overrides:
launch:
policyVersion: policy.fraud.majority-veto
policyHints:
type: majority
description: >
Simple majority vote with blocking-objection
veto and 2-voter quorum
threshold: 0.5
vetoEnabled: true
vetoThreshold: 1

The template provides sensible defaults for the inputs (so the operator does not have to fill in every field) and overrides the policy version. Same agents, same scenario structure, but the governance changes completely. You could also have a unanimous.yaml template that requires all agents to agree, or a strict-risk.yaml that gives the risk agent unilateral authority. The coordination logic stays the same; only the rules change.
How the UI discovers all of this
The UI Console fetches the catalog through two simple endpoints:
| Endpoint | Returns |
|---|---|
| GET /packs | All pack summaries (slug, name, description, tags, scenario count) |
| GET /packs/{packSlug}/scenarios | All scenario summaries in a pack (slug, version, name, summary, tags, template list) |
This gives the UI everything it needs to render a browsable catalog — pack cards with scenario counts, drill-down into individual scenarios, and template pickers. No complex query language, no GraphQL schema to maintain. Just two endpoints that return the file tree as structured data.
Configuring a Launch: From Browsing to "Run This"
So the operator has found the fraud scenario and picked the majority-veto template. What happens next? The UI needs to show a configuration form — but it cannot just hardcode one. Different scenarios have different inputs, different defaults, different participants. The form has to be generated dynamically.
This is where the launch schema comes in. When the operator selects a scenario and template, the UI fetches a complete description of everything that can be configured:
sequenceDiagram
participant UI as UI Console
participant Proxy as API Proxy
participant ES as Example Service
UI->>Proxy: GET /packs/fraud/scenarios/high-value-new-device/<br/>versions/1.0.0/launch-schema?template=majority-veto
Proxy->>ES: Forward request
ES->>ES: Load scenario.yaml + majority-veto.yaml
ES->>ES: Merge defaults and overrides
ES-->>Proxy: LaunchSchemaResponse
Proxy-->>UI: LaunchSchemaResponse
Note over UI: Render dynamic form<br/>from formSchema + defaults

The LaunchSchemaResponse: everything the UI needs in one shot
The response is designed so the UI can render a complete, functional configuration page without any additional API calls:
interface LaunchSchemaResponse {
scenarioRef: string; // "fraud/high-value-new-device@1.0.0"
templateId?: string; // "majority-veto"
formSchema: Record<string, unknown>; // JSON Schema for input form
defaults: Record<string, unknown>; // Pre-filled values
participants: Array<{
id: string; role: string; agentRef: string;
}>;
agents: ExampleAgentSummary[]; // Framework, role, capabilities
runtime: { kind: string; version?: string };
launchSummary: {
modeName: string; // "macp.mode.decision.v1"
modeVersion: string; // "1.0.0"
configurationVersion: string; // "config.default"
policyVersion?: string; // "policy.fraud.majority-veto"
policyHints?: { type?, threshold?, vetoEnabled?, ... };
ttlMs: number; // 300000
initiatorParticipantId?: string; // "risk-agent"
};
expectedDecisionKinds?: string[]; // ["approve", "step_up", "decline"]
}

Notice how the formSchema is a standard JSON Schema. The UI does not need to know anything about fraud scenarios or transaction amounts — it just feeds the schema into a dynamic form renderer. The defaults come from the template merge (scenario defaults overlaid with template defaults), so the form comes pre-filled with realistic values. For our fraud scenario, the operator sees $2,400 already in the transaction amount field, 0.18 in the device trust score, and so on.
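To make the validation side of that concrete, here is a toy check covering only the subset of JSON Schema the fraud scenario's numeric inputs use (required, type, minimum, maximum). It is a sketch for intuition; the services use a full JSON Schema validator:

```typescript
interface NumberField { type: "number"; minimum?: number; maximum?: number; default?: number }
interface ObjectSchema { type: "object"; properties: Record<string, NumberField>; required?: string[] }

// Toy validator: returns a list of human-readable errors (empty means valid).
function validateInputs(schema: ObjectSchema, inputs: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in inputs)) errors.push(`${key} is required`);
  }
  for (const [key, field] of Object.entries(schema.properties)) {
    const value = inputs[key];
    if (value === undefined) continue; // missing keys handled by the required check
    if (typeof value !== "number") { errors.push(`${key} must be a number`); continue; }
    if (field.minimum !== undefined && value < field.minimum) errors.push(`${key} below minimum`);
    if (field.maximum !== undefined && value > field.maximum) errors.push(`${key} above maximum`);
  }
  return errors;
}
```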
What the operator actually sees
The Launch page (/runs/new) renders a rich configuration experience from this single response:
- Scenario selector — Pack and scenario dropdowns, template picker
- Input form — Dynamically generated from formSchema, pre-filled with defaults
- Execution mode — live (real runtime) or sandbox
- Tags and metadata — Optional labels, actor ID, run label
- Participant summary — Read-only list showing agents and their roles
- Launch summary — Mode, policy, TTL at a glance
- Input mode toggle — Switch between form view and raw JSON editor
That last one is a nice touch. Domain experts use the form; power users who want to paste in a modified payload can flip to the raw JSON editor. Same data, different interfaces.
Compilation: Turning Intent into Execution
The operator clicks "Launch." Now the real magic starts.
The user's intent — "run the fraud scenario with majority-veto rules and these specific inputs" — needs to become a fully resolved ExecutionRequest that the Control Plane can execute. This is the compiler's job, and it is more involved than you might expect. It has to load the scenario definition, merge in the template overrides, layer on the user's inputs, validate everything against the schema, substitute template variables, and assemble the final request.
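The layering order itself is simple enough to sketch in a few lines. The helper names here are assumptions; the real compiler also validates and substitutes, as the sequence that follows shows:

```typescript
// Later layers win: scenario defaults <- template defaults <- user inputs.
function mergeInputs(
  scenarioDefaults: Record<string, unknown>,
  templateDefaults: Record<string, unknown>,
  userInputs: Record<string, unknown>,
): Record<string, unknown> {
  return { ...scenarioDefaults, ...templateDefaults, ...userInputs };
}

// Template overrides then replace launch-level settings such as policyVersion.
function applyOverrides<T extends object>(launch: T, overrides: Partial<T>): T {
  return { ...launch, ...overrides };
}
```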
sequenceDiagram
participant UI as UI Console
participant ES as Example Service
participant AJV as JSON Schema Validator
UI->>ES: POST /launch/compile<br/>{ scenarioRef, templateId, inputs, mode }
ES->>ES: Parse scenarioRef<br/>"fraud/high-value-new-device@1.0.0"
ES->>ES: Load scenario.yaml from registry
ES->>ES: Load template (if templateId provided)
ES->>ES: Merge: scenario defaults ← template defaults ← user inputs
ES->>AJV: Validate merged inputs against schema
AJV-->>ES: Valid
ES->>ES: Apply template overrides (policy, runtime)
ES->>ES: Substitute template variables<br/>"{{ inputs.transactionAmount }}" → 2400
ES->>ES: Build ExecutionRequest
ES-->>UI: CompileLaunchResult

Variable substitution: from templates to concrete values
The double-brace template variables we saw earlier get replaced with the operator's actual input values. This is straightforward string substitution, but it is what transforms a reusable scenario template into a specific, executable coordination request:
# Before substitution
proposal_id: "{{ inputs.customerId }}-initial-review"
transactionAmount: "{{ inputs.transactionAmount }}"
# After substitution (inputs.customerId = "CUST-1001", transactionAmount = 2400)
proposal_id: "CUST-1001-initial-review"
transactionAmount: 2400

The compiled ExecutionRequest: the complete picture
Here is what the compiler produces for our fraud scenario with the majority-veto template. This is the artifact that gets handed to the Control Plane — it contains everything needed to create a runtime session, bootstrap agents, and execute the coordination:
{
"mode": "live",
"runtime": { "kind": "rust", "version": "v1" },
"session": {
"modeName": "macp.mode.decision.v1",
"modeVersion": "1.0.0",
"configurationVersion": "config.default",
"policyVersion": "policy.fraud.majority-veto",
"policyHints": {
"type": "majority",
"threshold": 0.5,
"vetoEnabled": true,
"vetoThreshold": 1
},
"ttlMs": 300000,
"initiatorParticipantId": "risk-agent",
"participants": [
{ "id": "fraud-agent", "role": "fraud" },
{ "id": "growth-agent", "role": "growth" },
{ "id": "compliance-agent", "role": "compliance" },
{ "id": "risk-agent", "role": "risk" }
],
"context": {
"customerId": "CUST-1001",
"transactionAmount": 2400,
"deviceTrustScore": 0.18,
"accountAgeDays": 14,
"isVipCustomer": true,
"priorChargebacks": 1
},
"metadata": {
"source": "example-service",
"scenarioRef": "fraud/high-value-new-device@1.0.0",
"templateId": "majority-veto",
"demoType": "fraud-decision",
"decisionOwner": "risk-agent",
"specialists": ["fraud-agent", "growth-agent", "compliance-agent"]
}
},
"kickoff": [
{
"from": "risk-agent",
"to": ["fraud-agent", "growth-agent", "compliance-agent"],
"kind": "proposal",
"messageType": "Proposal",
"payloadEnvelope": {
"encoding": "proto",
"proto": {
"typeName": "macp.modes.decision.v1.ProposalPayload",
"value": {
"proposal_id": "CUST-1001-initial-review",
"option": "evaluate_transaction",
"rationale": "Decide whether to approve, step_up, or decline."
}
}
}
}
],
"execution": {
"tags": ["example", "fraud", "high-value-new-device", "demo"],
"requester": { "actorId": "example-service", "actorType": "service" }
}
}

Take a moment to appreciate what just happened. A YAML scenario definition, a YAML template, and a handful of user inputs got merged, validated, substituted, and assembled into a fully self-contained execution request. The operator filled in a form; the compiler did the rest. Every template variable has been replaced with a concrete value, the policy is set to majority-veto, and the kickoff message is ready to go. The Control Plane can take this and run with it — literally.
Agent Bootstrapping: Bringing the Participants to Life
Here is where things get physical. The ExecutionRequest describes what should happen, but someone needs to actually spawn the agent processes that will participate in the coordination. When bootstrapAgents: true (which is the default for example runs), the Example Service takes on this responsibility.
For each participant in the scenario, the Example Service looks up the agent definition in its catalog, finds the right framework adapter, builds a bootstrap payload, writes it to a temporary file, and spawns a child process. The spawned agent reads the bootstrap file, discovers where the Control Plane lives, and connects.
sequenceDiagram
participant ES as Example Service
participant Cat as Agent Catalog
participant Reg as Adapter Registry
participant Sup as Launch Supervisor
participant Agent as Spawned Agent
loop For each participant
ES->>Cat: Lookup agentRef (e.g., "fraud-agent")
Cat-->>ES: ExampleAgentDefinition<br/>(framework, entrypoint, manifest)
ES->>Reg: Get adapter for framework<br/>(e.g., LangGraphHostAdapter)
Reg-->>ES: Adapter instance
ES->>ES: Validate manifest
ES->>ES: Build BootstrapPayload
ES->>ES: Write bootstrap JSON to /tmp/macp-bootstrap/
ES->>Sup: launch(command, args, env)
Sup->>Agent: spawn child process
Note over Agent: Reads MACP_BOOTSTRAP_FILE<br/>Connects to Control Plane
end

Framework adapters: one interface, many runtimes
Here is a design decision worth calling out. MACP does not mandate a single agent framework. The fraud agent might be built with LangGraph, the growth agent with LangChain, the compliance agent with CrewAI, and the risk agent with custom Node.js code. They all participate in the same coordination session, speaking the same protocol, but their internal implementation is completely different.
Each framework has a dedicated adapter that knows how to prepare and launch agents for that framework:
| Framework | Adapter | Launch Command | Notes |
|---|---|---|---|
| LangGraph | LangGraphHostAdapter | python3 -m agents.langgraph_worker.main | Requires graphFactory, inputMapper, outputMapper |
| LangChain | LangChainHostAdapter | python3 -m agents.langchain_worker.main | Chain-based execution |
| CrewAI | CrewAIHostAdapter | python3 -m agents.crewai_worker.main | Crew-based multi-agent |
| Custom | CustomHostAdapter | node dist/example-agents/runtime/worker.js | Node.js custom logic |
The BootstrapPayload: everything an agent needs to join the party
Each spawned agent receives a JSON file that contains everything it needs — who it is, what run it belongs to, where to connect, what the scenario context looks like, and how its framework should be configured:
interface BootstrapPayload {
run: {
runId: string;
sessionId?: string;
traceId?: string;
};
participant: {
participantId: string; // "fraud-agent"
agentId: string;
displayName: string;
role: string; // "fraud"
};
runtime: {
baseUrl: string; // Control Plane URL
messageEndpoint: string; // "/runs/{runId}/messages"
eventsEndpoint: string; // "/runs/{runId}/events"
apiKey?: string;
timeoutMs: number;
};
execution: {
scenarioRef: string;
modeName: string; // "macp.mode.decision.v1"
modeVersion: string;
policyVersion?: string;
policyHints?: { ... };
ttlMs: number;
};
session: {
context: Record<string, unknown>; // Transaction details
participants: string[]; // All participant IDs
};
kickoff?: {
messageType: string;
payload: Record<string, unknown>;
};
agent: {
manifest: Record<string, unknown>;
framework: string; // "langgraph", "langchain", etc.
frameworkConfig?: Record<string, unknown>;
};
}

The file is written to /tmp/macp-bootstrap/{runId}_{participantId}_{timestamp}.json and the path is passed via the MACP_BOOTSTRAP_FILE environment variable. This file-based handoff is intentional — it avoids passing large payloads through command-line arguments (which have OS-level length limits) and makes debugging easy (you can just cat the file to see what the agent received).
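On the agent side, consuming that handoff can be as small as the sketch below. The field names follow the BootstrapPayload interface; the helper itself and the trimmed-down type are hypothetical:

```typescript
import { readFileSync } from "node:fs";

// Minimal shape a worker needs at startup; see BootstrapPayload for the full contract.
interface Bootstrap {
  participant: { participantId: string; role: string };
  runtime: { baseUrl: string; messageEndpoint: string };
}

// Read the file-based handoff; the supervisor passes the path via MACP_BOOTSTRAP_FILE.
function loadBootstrap(path = process.env.MACP_BOOTSTRAP_FILE): Bootstrap {
  if (!path) throw new Error("MACP_BOOTSTRAP_FILE is not set");
  return JSON.parse(readFileSync(path, "utf8")) as Bootstrap;
}
```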
The Fraud Decision: A Complete Story
Now we get to the good part. Everything we have built up to — the catalog, the compiler, the bootstrap system — comes together in a single, dramatic execution. Let's trace the complete lifecycle of our fraud scenario, moment by moment.
Setting the scene
Remember our transaction: $2,400 from a new device (trust score 0.18) on a 14-day-old VIP account with one prior chargeback. Four specialist agents are about to debate what to do about it.
The cast of characters
Each agent brings a different perspective, uses a different framework, and has a fundamentally different set of priorities. That tension is the whole point — coordination is only interesting when the participants disagree.
| Agent | Role | Framework | Responsibility |
|---|---|---|---|
| risk-agent | Risk (Initiator) | Custom/Node.js | Coordinates the decision — sends the initial proposal, collects specialist input, issues the final Commitment |
| fraud-agent | Fraud | LangGraph | Evaluates device trust, chargeback history, identity-risk signals |
| growth-agent | Growth | LangChain | Assesses customer lifetime value, revenue impact, VIP status |
| compliance-agent | Compliance | CrewAI | Applies KYC/AML checks, policy rules, regulatory requirements |
The full execution: six phases of a coordinated decision
Buckle up. This is the sequence that plays out in real time, with events streaming to the operator's browser as they happen:
sequenceDiagram
participant UI as UI Console
participant ES as Example Service
participant CP as Control Plane
participant RT as Runtime
participant Risk as risk-agent
participant Fraud as fraud-agent
participant Growth as growth-agent
participant Comp as compliance-agent
Note over UI,Comp: Phase 1 — Launch
UI->>ES: POST /examples/run<br/>{ scenarioRef, templateId: "majority-veto",<br/>inputs: { amount: 2400, device: 0.18, ... } }
ES->>ES: Compile → ExecutionRequest
ES->>ES: Bootstrap 4 agents (spawn processes)
ES->>CP: POST /runs/validate (ExecutionRequest)
CP-->>ES: 204 Valid
ES->>CP: POST /runs (ExecutionRequest)
CP-->>ES: { runId: "run-abc", status: "queued" }
ES-->>UI: RunExampleResult { runId, hostedAgents }
UI->>UI: Navigate to /runs/live/run-abc
Note over UI,Comp: Phase 2 — Session Creation
CP->>RT: SessionStart (gRPC stream)
RT->>RT: Create session OPEN<br/>mode: decision.v1, policy: majority-veto
RT-->>CP: Ack (session bound)
CP->>RT: Kickoff: Proposal from risk-agent
RT-->>CP: Ack
Note over UI,Comp: Phase 3 — Specialist Evaluation
Fraud->>RT: Evaluation<br/>recommendation: BLOCK<br/>confidence: 0.85<br/>"Low device trust + prior chargeback"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Growth->>RT: Evaluation<br/>recommendation: APPROVE<br/>confidence: 0.72<br/>"VIP customer, high lifetime value"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Comp->>RT: Evaluation<br/>recommendation: REVIEW<br/>confidence: 0.68<br/>"KYC complete, AML flag pending"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Note over UI,Comp: Phase 4 — Objection
Fraud->>RT: Objection<br/>severity: critical<br/>"Device trust 0.18 below threshold"
RT-->>CP: Accepted envelope
CP-->>UI: SSE canonical_event
Note over UI,Comp: Phase 5 — Voting
Fraud->>RT: Vote REJECT on "approve"
Growth->>RT: Vote APPROVE on "approve"
Comp->>RT: Vote ABSTAIN
RT-->>CP: Accepted envelopes
CP-->>UI: SSE canonical_events
Note over UI,Comp: Phase 6 — Commitment
Risk->>RT: Commitment<br/>action: "step_up"<br/>reason: "Majority did not approve.<br/>Critical objection from fraud.<br/>Escalate to step-up verification."<br/>outcome_positive: true
RT->>RT: Policy evaluation (majority-veto)<br/>Veto check: critical objection present → veto triggers
RT->>RT: Session → RESOLVED
RT-->>CP: Ack (session_state=RESOLVED)
CP-->>UI: SSE session.state.changed + run.completed
Note over UI: Decision panel shows:<br/>"step_up" with confidence breakdown

Let's walk through what just happened, because each phase tells a story.
Phase 1 — Launch. The operator clicks "Run." The Example Service compiles the scenario, spawns four agent processes, validates the execution request with the Control Plane, and submits it. The UI navigates to the live workbench. Total wall time: a couple of seconds.
Phase 2 — Session Creation. The Control Plane opens a bidirectional gRPC stream to the Rust runtime, which creates a new session in OPEN state with the majority-veto policy loaded. The kickoff message — the risk agent's proposal asking the others to evaluate the transaction — gets delivered.
Phase 3 — Specialist Evaluation. This is where it gets interesting. The three specialist agents analyze the transaction independently and arrive at different conclusions. The fraud agent sees the low device trust score and the prior chargeback and recommends BLOCK with 85% confidence. The growth agent sees the VIP status and high lifetime value and recommends APPROVE with 72% confidence. The compliance agent finishes its KYC checks but notes a pending AML flag and recommends REVIEW with 68% confidence. Three agents, three different answers — exactly the kind of disagreement that coordination protocols are designed to resolve.
Phase 4 — Objection. The fraud agent is not done. It files a formal objection with critical severity, citing the device trust score of 0.18 as below threshold. This is not just another data point — under the majority-veto policy, a critical objection can trigger a veto. The fraud agent is essentially saying: "I feel strongly enough about this that I'm willing to block the entire decision."
Phase 5 — Voting. The agents cast their votes on the "approve" option. Fraud votes REJECT. Growth votes APPROVE. Compliance abstains. The tally: 1 approve, 1 reject, 1 abstain. No majority either way.
Phase 6 — Commitment. The risk agent, as the initiator, reads the room. No majority approved. There is a critical objection from fraud. The risk agent commits to step_up — escalating to additional verification. The runtime evaluates this commitment against the majority-veto policy.
How the policy evaluation works at commitment
This is the moment of truth. The majority-veto policy evaluates three things:
- Majority check — Did >50% vote APPROVE? (1 approve, 1 reject, 1 abstain — no majority)
- Veto check — Any critical-severity objection? (Yes, from fraud-agent — veto triggers)
- Result — Policy allows the step_up commitment since the initiator accounts for the veto
The expected decision kinds for this scenario are approve, step_up, and decline. The outcome step_up means the transaction requires additional verification before proceeding — a sensible middle ground when the agents cannot agree and there is a credible fraud signal.
The session moves to RESOLVED. The Control Plane streams the final events to the UI. The operator sees the decision panel update with "step_up" and a full breakdown of why: which agents voted how, what objections were raised, and how the policy evaluated the commitment.
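A toy model of that tally helps fix the logic in mind. This is illustrative only: the real policy engine lives in the Rust runtime, and this sketch ignores details such as the 2-voter quorum from the policyHints:

```typescript
type Vote = "APPROVE" | "REJECT" | "ABSTAIN";
interface Objection { severity: "low" | "medium" | "high" | "critical" }
interface PolicyResult { majorityApproved: boolean; vetoTriggered: boolean }

// Simplified majority-veto check: >threshold of non-abstaining votes must approve,
// and any critical-severity objection triggers the veto.
function evaluateMajorityVeto(votes: Vote[], objections: Objection[], threshold = 0.5): PolicyResult {
  const cast = votes.filter((v) => v !== "ABSTAIN");
  const approvals = cast.filter((v) => v === "APPROVE").length;
  const majorityApproved = cast.length > 0 && approvals / cast.length > threshold;
  const vetoTriggered = objections.some((o) => o.severity === "critical");
  return { majorityApproved, vetoTriggered };
}
```

Run on the fraud scenario's tally (1 approve, 1 reject, 1 abstain, one critical objection), this yields no majority and an active veto, which is exactly the situation the risk agent resolved with step_up.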
Submitting to the Control Plane
After compilation and agent bootstrapping, the Example Service hands off to the Control Plane for actual execution. We will keep this section focused on the handoff mechanics — for the full protocol-level details of what happens inside the runtime, see E2E Flow § 4–8.
sequenceDiagram
participant ES as Example Service
participant CP as Control Plane
participant RT as Runtime
ES->>CP: POST /runs/validate (ExecutionRequest)
CP->>CP: Check mode supported, runtime reachable
CP-->>ES: 204 No Content
ES->>CP: POST /runs (ExecutionRequest)
CP->>CP: Create run (queued → starting)
CP->>RT: Initialize RPC
CP->>RT: StreamSession (bidirectional gRPC)
CP->>RT: SessionStart envelope
RT-->>CP: Ack (session OPEN)
CP->>CP: Bind session → run (binding_session)
CP->>RT: Kickoff messages
CP->>CP: Mark running
loop Stream events
RT-->>CP: Accepted envelopes
CP->>CP: Normalize → persist → project
end
RT-->>CP: Session RESOLVED
CP->>CP: Mark completedNotice the two-step submission: validate first, then create. The validation step (POST /runs/validate) is a dry run — it checks that the requested mode is supported, the runtime is reachable, and the execution request is well-formed. Only after validation passes does the Example Service submit the actual run. This prevents wasted agent bootstrapping when the Control Plane is not ready.
The control plane returns 202 Accepted with:
{
"runId": "a1b2c3d4-...",
"status": "queued",
"traceId": "trace-xyz-123"
}

The UI Console uses the runId to navigate to the live workbench and open an SSE stream. From this point on, the operator is watching the coordination unfold in real time.
Live Streaming: Real-Time Visibility into Coordination
The operator is now sitting at the live workbench, watching events roll in. How does that actually work?
The UI Console maintains a persistent Server-Sent Events (SSE) connection to the Control Plane. SSE was chosen over WebSockets deliberately — it is simpler, works through more proxies and load balancers, and the data flow is one-directional (server to client), which is exactly what we need for streaming coordination events.
sequenceDiagram
participant UI as UI Console
participant Proxy as API Proxy
participant CP as Control Plane
UI->>Proxy: EventSource<br/>/api/proxy/control-plane/runs/{runId}/stream<br/>?includeSnapshot=true&afterSeq=0
Proxy->>CP: Forward SSE connection
CP-->>UI: event: snapshot<br/>data: RunStateProjection (full current state)
loop Real-time updates
CP-->>UI: event: canonical_event<br/>data: { id, seq, type, subject, data }
end
CP-->>UI: event: heartbeat<br/>(every ~30s to keep connection alive)
Note over UI: On disconnect:<br/>exponential backoff (1s → 30s)<br/>resume from afterSeq={lastSeq}Three event types, and that is all you need
The SSE protocol is refreshingly simple — just three event types:
| Event | Payload | When |
|---|---|---|
| snapshot | Full RunStateProjection | On initial connect and reconnection |
| canonical_event | Single CanonicalEvent | Each time a new event is processed |
| heartbeat | Empty | Periodic keepalive (~30s) |
The snapshot event is the key to resilient streaming. When you first connect (or reconnect after a network blip), you get the complete current state of the run — not just the events you missed, but the fully projected state. This means the UI can render correctly immediately, without replaying the event history client-side.
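A client-side reducer for these three event types can be sketched as follows. The types are simplified stand-ins (the real `RunStateProjection` and `CanonicalEvent` are richer), but the handling mirrors the protocol: a snapshot replaces local state wholesale, a canonical event is appended and advances the sequence high-water mark, and a heartbeat stores nothing.

```typescript
interface CanonicalEventLite {
  seq: number;
  type: string;
  data?: unknown;
}

interface ClientState {
  projection: unknown | null;       // last full snapshot
  events: CanonicalEventLite[];     // recent canonical events
  lastSeq: number;                  // high-water mark used for resumption
}

function applySseEvent(
  state: ClientState,
  eventName: "snapshot" | "canonical_event" | "heartbeat",
  payload: unknown,
): ClientState {
  switch (eventName) {
    case "snapshot":
      // Full projection replaces local state -- no client-side replay needed.
      return { ...state, projection: payload };
    case "canonical_event": {
      const ev = payload as CanonicalEventLite;
      return {
        ...state,
        events: [...state.events, ev],
        lastSeq: Math.max(state.lastSeq, ev.seq),
      };
    }
    case "heartbeat":
      // Keepalive only; nothing to store.
      return state;
  }
}
```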
Connection management: handling the real world
Networks are unreliable. Connections drop. Proxies time out. The useLiveRun hook handles all of this gracefully:
- Buffer limit — Keeps the last 500 events in memory; older events are trimmed
- Heartbeat timeout — If no heartbeat received within 45 seconds, treats as connection failure
- Reconnection — Exponential backoff: 1s, 2s, 4s, 8s, ... up to 30s max, with 8 max attempts
- Resumption — On reconnect, passes `afterSeq={lastSeq}` to avoid re-receiving processed events
- Status tracking — `idle` → `connecting` → `live` → `reconnecting` → `ended` or `error`
That afterSeq parameter is worth highlighting. Each canonical event carries a monotonically increasing sequence number. When the connection drops and the client reconnects, it tells the server "I have already seen events up to sequence N — start from N+1." No duplicate events, no gaps, no complex reconciliation logic.
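The reconnection math is simple enough to write down. Here is a sketch of the backoff schedule and resume URL described above, assuming the same proxy path shown in the sequence diagram; the function names are illustrative, not the actual `useLiveRun` internals.

```typescript
// Exponential backoff: attempt 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at 30s.
// (The hook additionally stops after 8 failed attempts.)
function backoffDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}

// Resume URL: ask the server for events strictly after the last sequence
// number this client has already processed.
function resumeStreamUrl(runId: string, lastSeq: number): string {
  return `/api/proxy/control-plane/runs/${runId}/stream?afterSeq=${lastSeq}`;
}
```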
The full canonical event vocabulary
As the coordination plays out, these are the event types the UI receives and renders:
| Category | Event Types |
|---|---|
| Run lifecycle | `run.created`, `run.started`, `run.completed`, `run.failed` |
| Session | `session.bound`, `session.state.changed` |
| Participants | `participant.seen` |
| Messages | `message.sent`, `message.received` |
| Signals | `signal.emitted` |
| Coordination | `proposal.created`, `decision.proposed`, `decision.finalized` |
| Policy | `policy.resolved`, `policy.commitment.evaluated` |
The Live Workbench: Making Coordination Visible
Events are streaming in. Now the UI needs to turn that stream of structured data into something an operator can actually understand. The live workbench (`/runs/live/{runId}`) is a multi-panel view that provides real-time visibility into every aspect of the coordination.
The execution graph: coordination as a visual story
The centerpiece is an interactive directed graph built with React Flow. It transforms the abstract concept of "four agents coordinating under a policy" into something you can see and interact with:
- Node types — Start (flag), Context (database), Agent (bot), Decision (workflow), Output (check)
- Edges — Animated by kind: kickoff, message, proposal
- Layout — Auto-positioned columns: start → context → agents → decision → output
- Live updates — Nodes update status, progress bars, and signal badges as SSE events arrive
In our fraud scenario, you would see the risk agent node pulse as it sends the kickoff proposal, then the three specialist nodes light up as they submit their evaluations, then a critical objection badge appears on the fraud agent, and finally the decision node resolves to "step_up" with the full confidence breakdown.
The supporting panels
The graph tells the high-level story. The panels provide the details:
| Panel | Data Source | What It Shows |
|---|---|---|
| Live Event Feed | events[] from SSE | Reverse-chronological list of canonical events — type, seq, timestamp, source, subject, truncated payload |
| Decision Panel | state.decision | Current decision action, confidence percentage, finalized status, composition of reasons |
| Policy Panel | state.policy | Policy version, description, commitment evaluations (allow/deny with reasons) |
| Signal Rail | state.signals | Side-channel signals from agents — name, timestamp, source, confidence, severity (color-coded) |
| Node Inspector | Selected node | Per-participant deep dive — overview, payloads, signals, traces/artifacts. Filters events by node ID |
| Session Interaction | Manual input | Send messages or signals into the live session (from dropdown, to recipients, message type, JSON payload) |
That last panel — Session Interaction — is particularly powerful. It means the operator is not just a passive observer. They can inject messages into a live coordination session, which is invaluable for debugging and testing edge cases.
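The shape of an injected message can be sketched from the panel's inputs. This is an assumption about structure, not the actual wire format: the field names (`from`, `to`, `messageType`, `payload`) mirror the panel's dropdown, recipient list, message type, and JSON payload fields.

```typescript
// Hypothetical shape of a message injected via the Session Interaction panel.
interface InjectedMessage {
  from: string;        // sending participant (from dropdown)
  to: string[];        // recipients
  messageType: string;
  payload: unknown;    // free-form JSON payload
}

function buildInjectedMessage(
  from: string,
  to: string[],
  messageType: string,
  payload: unknown,
): InjectedMessage {
  // A message with no recipients has nowhere to go; reject it up front.
  if (to.length === 0) {
    throw new Error("At least one recipient is required");
  }
  return { from, to, messageType, payload };
}
```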
The single source of truth: RunStateProjection
All of these panels render from a single projection that the Control Plane builds incrementally from the event stream. This is a critical architectural choice: instead of each panel querying different APIs, they all consume the same data structure:
```typescript
interface RunStateProjection {
  run: RunSummaryProjection;
  participants: ParticipantProjection[];
  graph: GraphProjection;          // Nodes and edges for React Flow
  decision: DecisionProjection;    // Action, confidence, reasons
  signals: SignalProjection;       // Emitted signals
  progress: ProgressProjection;    // Per-participant progress
  timeline: TimelineProjection;    // Chronological event sequence
  trace: TraceSummary;             // Distributed trace info
  outboundMessages: OutboundMessageSummary;
  policy: PolicyProjection;        // Policy status and evaluations
}
```

One projection, many views. When a new canonical event arrives, the projection updates, and every panel that cares about that change re-renders. The UI stays consistent because there is only one truth to be consistent with.
After the Decision: Replay, Compare, and Learn
The coordination is complete. Our fraud scenario resolved to `step_up`. But the story does not end when the session closes — in many ways, that is where the most valuable work begins. Understanding why agents reached a particular decision, comparing outcomes across different policy configurations, and debugging unexpected behavior all happen post-execution.
Replay: time travel for coordination
The Control Plane supports three replay modes, each designed for a different use case:
| Mode | Behavior |
|---|---|
| `timed` | Events replayed with proportional inter-event timing; speed multiplier supported (0.5x, 1x, 2x, 4x) |
| `step` | Events emitted one at a time on request |
| `instant` | All events emitted immediately |
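The pacing rule for `timed` mode can be written down in one line: the delay between two replayed events is the original inter-event gap divided by the speed multiplier. A sketch, not the Control Plane's actual scheduler:

```typescript
// Delay (ms) to wait before emitting the next replayed event, given the
// original timestamps of the previous and next events and a speed multiplier.
function replayDelayMs(
  prevTimestampMs: number,
  nextTimestampMs: number,
  speed: number, // 0.5, 1, 2, or 4 per the supported multipliers
): number {
  // Clamp at zero so out-of-order timestamps never produce a negative wait.
  const gap = Math.max(0, nextTimestampMs - prevTimestampMs);
  return gap / speed;
}
```

So a 1-second gap in the original run becomes a 500 ms wait at 2x and a 2-second wait at 0.5x.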
The Timeline Scrubber component renders a range slider with discrete frame markers. Scrubbing loads the `RunStateProjection` at a specific sequence number via `GET /runs/{runId}/replay/state?seq=N`, allowing users to rewind to any point in the coordination.
Think of it like a DVR for coordination. You can watch the fraud scenario play out at 2x speed, pause at the moment the fraud agent filed its critical objection, inspect the state of every participant at that instant, then step forward one event at a time to see how the objection changed the outcome. This is extraordinarily useful for understanding policy behavior — "what would have happened if the fraud agent had filed a warning instead of a critical objection?"
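The scrubber's state lookup reduces to building the replay-state URL quoted above for a given sequence number. A minimal sketch with basic input validation (the guard is an illustration, not the component's actual code):

```typescript
// Build the URL the scrubber fetches when the slider lands on sequence `seq`.
function replayStateUrl(runId: string, seq: number): string {
  if (!Number.isInteger(seq) || seq < 0) {
    throw new Error("seq must be a non-negative integer");
  }
  return `/runs/${runId}/replay/state?seq=${seq}`;
}
```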
Run comparison: side-by-side analysis
The comparison view at `/runs/{leftId}/compare/{rightId}` puts two runs next to each other and highlights the differences:
- Decision delta — What each run decided and why
- Payload diff — JSON diff of execution requests and outcomes
- Timeline alignment — Events mapped by type across both runs
- Participant comparison — Per-agent activity and signal differences
This is how you answer questions like: "We ran the same fraud scenario with the majority-veto template and the unanimous template — how did the outcomes differ?" Or: "We increased the device trust score from 0.18 to 0.65 — at what point do the agents stop objecting?"
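The decision-delta part of that comparison can be sketched as a pure function over two simplified decision summaries. The shapes are illustrative assumptions; the real comparison view works from full `RunStateProjection` data.

```typescript
interface DecisionSummary {
  action: string;      // e.g. "step_up", "block", "allow"
  confidence: number;  // 0..1
}

// Report whether two runs diverged on the decided action, and by how much
// the confidence shifted (right minus left, rounded to 4 decimal places).
function decisionDelta(
  left: DecisionSummary,
  right: DecisionSummary,
): { sameAction: boolean; confidenceDiff: number } {
  return {
    sameAction: left.action === right.action,
    confidenceDiff: Number((right.confidence - left.confidence).toFixed(4)),
  };
}
```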
Export and clone: building on what you have learned
- Export bundle — Download the complete run as a JSON archive (events, projection, metrics, traces)
- Clone run — Re-launch with the same `ExecutionRequest` but optional overrides (tags, context, policy)
- Archive — Mark run as archived for cleanup
The clone feature is particularly useful for iterative testing. You run a fraud scenario, see the outcome, tweak one parameter (say, raise the device trust score to 0.5), clone the run with that override, and compare the two results. Rinse and repeat until the agents and policies behave the way you want.
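The clone-with-overrides workflow amounts to a shallow merge: keep the original request, let the override fields win. A sketch under the assumption that overrides replace whole top-level fields (the real API's merge semantics may differ):

```typescript
interface CloneOverrides {
  tags?: string[];
  context?: Record<string, unknown>;
  policy?: string;
}

// Reuse the original request; any provided override replaces the
// corresponding field. The original object is left untouched.
function cloneRequest<T extends Record<string, unknown>>(
  original: T,
  overrides: CloneOverrides,
): T & CloneOverrides {
  return { ...original, ...overrides };
}
```

For the iterative-testing loop described above: clone with `context: { deviceTrustScore: 0.5 }`, run, then compare against the original 0.18 run.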
When Things Go Wrong: Error Handling Across the Stack
Production systems fail. Networks partition. Agents crash. Inputs get malformed. A system that only works on the happy path is not a system — it is a demo. MACP handles errors at every layer with structured codes and clear feedback that flows all the way back to the operator.
Example Service errors
These are the errors you hit before the run even starts — bad scenario references, invalid inputs, unreachable dependencies:
| Code | HTTP | When |
|---|---|---|
| `PACK_NOT_FOUND` | 404 | Pack slug doesn't match any `pack.yaml` |
| `SCENARIO_NOT_FOUND` | 404 | Scenario slug not found in pack |
| `VERSION_NOT_FOUND` | 404 | Requested version doesn't exist |
| `TEMPLATE_NOT_FOUND` | 404 | Template slug not found for version |
| `AGENT_NOT_FOUND` | 404 | `agentRef` doesn't match catalog |
| `INVALID_SCENARIO_REF` | 400 | Ref format invalid (expected `pack/scenario@version`) |
| `VALIDATION_ERROR` | 400 | User inputs fail JSON schema validation |
| `COMPILATION_ERROR` | 400 | Template variable substitution or merge failure |
| `CONTROL_PLANE_UNAVAILABLE` | 502 | Cannot reach control plane for validate/create |
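The check behind `INVALID_SCENARIO_REF` can be sketched as a parser for the `pack/scenario@version` format. The regex below is an assumption about the exact grammar (the service may allow a different character set), but it shows how a malformed ref turns into a structured 400 rather than a vague failure.

```typescript
interface ScenarioRef {
  pack: string;
  scenario: string;
  version: string;
}

function parseScenarioRef(ref: string): ScenarioRef {
  // Three non-empty segments: pack "/" scenario "@" version.
  const match = /^([^/@\s]+)\/([^/@\s]+)@([^/@\s]+)$/.exec(ref);
  if (!match) {
    // In the Example Service this would surface as a 400 INVALID_SCENARIO_REF.
    throw new Error(
      `INVALID_SCENARIO_REF: expected pack/scenario@version, got "${ref}"`,
    );
  }
  return { pack: match[1], scenario: match[2], version: match[3] };
}
```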
Control Plane errors
These happen during execution — runtime connectivity issues, protocol violations, policy enforcement:
See E2E Flow § 13 for the full error taxonomy. Key ones surfaced to the UI:
| Code | When |
|---|---|
| `RUN_NOT_FOUND` | Run ID doesn't exist |
| `RUNTIME_UNAVAILABLE` | Cannot connect to MACP runtime |
| `KICKOFF_FAILED` | Initial message rejected by runtime |
| `SESSION_EXPIRED` | Session TTL exceeded during run |
| `POLICY_DENIED` | Commitment rejected by governance rules |
UI Console error handling
The UI Console handles errors at multiple levels, because failures can happen anywhere in the stack:
- API errors — The `fetcher` throws `ApiError` with `status`, `statusText`, `service`, `path`. Components check `isNotFound` for 404-specific handling.
- React Error Boundaries — Catch render errors with fallback UI and "Try again" button
- Query errors — TanStack React Query retries once, then renders `ErrorPanel` with action links
- SSE failures — Connection status badge shows `reconnecting` with attempt counter; after 8 failed attempts, shows `error` status with manual retry option
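An `ApiError` along the lines described above might look like this. The field names come from the list; the constructor shape, message format, and `isNotFound` being a getter are assumptions for illustration.

```typescript
class ApiError extends Error {
  constructor(
    public status: number,
    public statusText: string,
    public service: string,  // which backend failed, e.g. "example-service"
    public path: string,     // which endpoint was called
  ) {
    // One readable line carrying all the context a component needs.
    super(`${service} ${path}: ${status} ${statusText}`);
    this.name = "ApiError";
  }

  // Convenience check so components can branch on 404s without magic numbers.
  get isNotFound(): boolean {
    return this.status === 404;
  }
}
```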
Agent process errors
Even agent crashes are handled gracefully:
- The `LaunchSupervisor` captures stdout/stderr from spawned agents with prefix: `[{framework}:{participantId}:{runId}]`
- Agent crash is detected via process exit handler; the `HostedExampleAgent` status reflects the failure
- Bootstrap file cleanup happens automatically on process exit
The key principle throughout: errors are structured, codes are specific, and the operator always has enough context to understand what went wrong and what to try next. A `POLICY_DENIED` error does not just say "something failed" — it tells you which policy rule rejected which commitment and why. That is the difference between an error message that helps and one that wastes your time.