
RFC/epic: buyer & orchestrator agent storyboards for 3.1 #2424

@bokelley


Problem

AdCP storyboards and the compliance track exclusively test sell-side agents: sales-*, audience-sync, brand-rights, governance-*, measurement-verification, etc. (static/compliance/source/specialisms/). There is no storyboard coverage, no fixture harness, and no compliance track for the buyer/orchestrator half of the protocol.

Consequences:

  • Buyer and orchestrator agents cannot be certified against a shared behavioral bar.
  • Implementers have no reference suite for "does my orchestrator negotiate, reconcile, and recover correctly."
  • Protocol decisions (idempotency keys, webhook reconciliation, TERMS_REJECTED, pacing) are enforced on senders but untested on receivers/initiators.
  • We are shipping 3.0 with asymmetric test coverage: the sell side of the wire is tested, the buy side is not.

This RFC proposes an epic for 3.1 to close the gap.

Context

  • 3.0 hardened sell-side schemas, idempotency, webhook delivery guarantees (#2400: at-least-once delivery, retry, idempotency), and trust primitives.
  • Storyboards today (storyboard-schema.yaml + SingleAgentClient runner) check the agent-under-test's responses against a scripted caller: the buyer says X, and the agent must respond Y.
  • Buyer/orchestrator storyboards invert the harness: scripted sell-side, agent-under-test is the buyer. Assertions are about outbound judgment and state management, not response schemas.
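
To make the inversion concrete, here is a minimal sketch of what a buyer-track step could carry. It assumes a Python representation. The names `BuyerStep`, `fixture_state`, `expected_action`, `forbidden_actions`, `run_buyer_step`, and the action strings are hypothetical and are not part of storyboard-schema.yaml. Instead of a request/response pair, the step fixes what the scripted seller does and asserts on what the agent under test sends next.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical shape of one buyer-track step; field names are illustrative
# assumptions, not part of storyboard-schema.yaml.
@dataclass
class BuyerStep:
    step_id: str
    fixture_state: dict      # what the scripted seller returns at this point
    expected_action: str     # outbound action the buyer must take next
    forbidden_actions: List[str] = field(default_factory=list)

# The agent under test, modeled as: seller state in, outbound actions out.
BuyerAgent = Callable[[dict], List[str]]

def run_buyer_step(step: BuyerStep, agent: BuyerAgent) -> List[str]:
    """Return failure messages for one step; an empty list means it passed."""
    actions = agent(step.fixture_state)
    failures = []
    if step.expected_action not in actions:
        failures.append(f"{step.step_id}: expected {step.expected_action!r}, got {actions}")
    for bad in step.forbidden_actions:
        if bad in actions:
            failures.append(f"{step.step_id}: sent forbidden action {bad!r}")
    return failures

# Usage: a toy buyer that counters after TERMS_REJECTED instead of resubmitting.
def toy_buyer(state: dict) -> List[str]:
    if state.get("error") == "TERMS_REJECTED":
        return ["counter_terms"]
    return ["submit_terms"]

step = BuyerStep(
    step_id="negotiation-1",
    fixture_state={"error": "TERMS_REJECTED"},
    expected_action="counter_terms",
    forbidden_actions=["submit_terms"],
)
print(run_buyer_step(step, toy_buyer))  # []
```

The assertion targets the buyer's outbound decision, not the schema of a response. That is the difference from today's sell-side steps.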

Proposed shape

1. Harness additions

  • Fixture publisher agent: a reference sell-side implementation (HTTP + MCP + A2A) that replays canned responses keyed by scenario ID. It supports scripted edge cases such as slow responses, TERMS_REJECTED, webhook drops, stale digests, auth expiry, and pacing divergence.
  • Buyer storyboard schema: probably an extension of storyboard-schema.yaml with a new track (role: "buyer"). Steps describe publisher fixture state plus the expected buyer-agent action or decision, not request/response pairs.
  • Behavioral validators: checks that go beyond schema validation. Did the agent retry, did it challenge, did it stop, did it reconcile? Some will be judgment-based (LLM-as-judge) and initially gated as SHOULD rather than MUST.
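
At its core, a scenario-keyed fixture publisher can be small. The sketch below assumes an in-memory Python class and leaves out the HTTP/MCP/A2A transports. `FixturePublisher`, the scenario IDs, the response dict shape, and `resent_rejected_request` are all hypothetical. The class replays a canned script per scenario and records every exchange, so a deterministic validator can inspect the log afterward.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical scenario scripts. The Nth buyer request gets the Nth response;
# once a script runs out, its last response repeats.
SCENARIOS: Dict[str, List[dict]] = {
    "terms-rejected": [{"status": "error", "code": "TERMS_REJECTED"}],
    "slow-then-ok": [
        {"status": "timeout"},
        {"status": "ok", "media_buy_id": "mb-1"},
    ],
}

@dataclass
class FixturePublisher:
    scenario_id: str
    # Every (request, response) pair, kept for behavioral validators.
    exchanges: List[Tuple[dict, dict]] = field(default_factory=list)

    def handle(self, request: dict) -> dict:
        script = SCENARIOS[self.scenario_id]
        response = dict(script[min(len(self.exchanges), len(script) - 1)])
        self.exchanges.append((request, response))
        return response

def resent_rejected_request(pub: FixturePublisher, code: str = "TERMS_REJECTED") -> bool:
    """Deterministic validator: True if the buyer resent an identical request
    after the seller rejected it with `code`."""
    for i, (request, response) in enumerate(pub.exchanges):
        later_requests = [req for req, _ in pub.exchanges[i + 1:]]
        if response.get("code") == code and request in later_requests:
            return True
    return False

# Usage: the buyer counters with new terms instead of resending the old ones.
pub = FixturePublisher("terms-rejected")
pub.handle({"cpm": 10.0})
pub.handle({"cpm": 12.0})
print(resent_rejected_request(pub))  # False
```

With a single reference agent, the scenario ID is the only thing that changes between specialisms. This is one argument in the fixture-publisher scope question below.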

2. Specialism set (initial)

Rough scenario spine:

| Specialism | Focus |
|---|---|
| buyer-discovery | Brief → agent registry lookup → clarifying questions → candidate shortlist |
| buyer-planning | Products → price_breakdown read → plan assembly → budget pacing math |
| buyer-negotiation | Buy terms counter, TERMS_REJECTED handling, makegood acceptance |
| buyer-activation | Idempotency keys on retries, signal activation coordination |
| buyer-monitoring | Webhook reconciliation, delivery variance, pacing response |
| buyer-recovery | Agent offline, stale digest, auth drift, idempotency collisions |
| orchestrator-multi-agent | Fan-out across 2+ sellers, partial failure, cross-agent reconciliation |
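
Many of these specialisms reduce to deterministic checks over the buyer's recorded outbound requests. Take buyer-activation as an example. The sketch below checks two things: that a buyer reuses the same idempotency key when it retries a logically identical request, and that it never reuses a key for a different payload. Several details are assumptions for illustration: the `idempotency_key` field name, the log format, the "payload minus the key" fingerprint, and the helper names.

```python
import json
from typing import Dict, List

def _payload_fingerprint(request: dict) -> str:
    """Canonical JSON of a request without its (assumed) idempotency_key field."""
    body = {k: v for k, v in request.items() if k != "idempotency_key"}
    return json.dumps(body, sort_keys=True)

def idempotency_violations(requests: List[dict]) -> List[str]:
    """Flag missing keys, retries that switched keys, and keys reused
    for a different payload. Identical payloads are treated as retries."""
    key_for_payload: Dict[str, str] = {}
    payload_for_key: Dict[str, str] = {}
    problems = []
    for i, req in enumerate(requests):
        key = req.get("idempotency_key")
        if key is None:
            problems.append(f"request {i}: missing idempotency key")
            continue
        fp = _payload_fingerprint(req)
        if fp in key_for_payload and key_for_payload[fp] != key:
            problems.append(f"request {i}: retry used a new key {key!r}")
        if key in payload_for_key and payload_for_key[key] != fp:
            problems.append(f"request {i}: key {key!r} reused for a different payload")
        key_for_payload.setdefault(fp, key)
        payload_for_key.setdefault(key, fp)
    return problems

# Usage: the second request is a retry that wrongly switched keys.
log = [
    {"idempotency_key": "k1", "budget": 100},
    {"idempotency_key": "k2", "budget": 100},
]
print(idempotency_violations(log))  # ["request 1: retry used a new key 'k2'"]
```

Treating every identical payload as a retry is a simplification, because a buyer may legitimately place two identical buys. A harness-backed validator would more likely decide what counts as a retry from the fixture script, for example only requests that follow a scripted timeout.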

3. Compliance track

  • New buyer-orchestrator compliance track (parallel to sell-side tracks).
  • Certification levels: basic (discovery + planning), standard (+ negotiation + activation), advanced (+ monitoring + recovery + multi-agent).
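
Because the levels are cumulative, awarding one is a subset check over the specialisms an agent has passed. Here is a minimal sketch. It assumes that each level maps onto the specialism IDs in the table above in the obvious way (discovery → buyer-discovery, multi-agent → orchestrator-multi-agent, and so on). `LEVELS` and `certification_level` are hypothetical names, not part of the compliance tooling.

```python
from typing import List, Optional, Set, Tuple

# Specialisms each level adds on top of the previous one, per the proposed tiers.
LEVELS: List[Tuple[str, Set[str]]] = [
    ("basic", {"buyer-discovery", "buyer-planning"}),
    ("standard", {"buyer-negotiation", "buyer-activation"}),
    ("advanced", {"buyer-monitoring", "buyer-recovery", "orchestrator-multi-agent"}),
]

def certification_level(passed: Set[str]) -> Optional[str]:
    """Return the highest level whose specialisms, plus those of every lower
    level, were all passed; None if even basic is incomplete."""
    required: Set[str] = set()
    achieved: Optional[str] = None
    for level, added in LEVELS:
        required |= added
        if not required <= passed:
            break
        achieved = level
    return achieved

print(certification_level({"buyer-discovery", "buyer-planning", "buyer-activation"}))  # basic
```

Under this reading, passing negotiation and activation without discovery and planning earns no level at all.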

Open questions

  • Judgment assertions: how much do we rely on LLM-as-judge vs. deterministic checks? Start deterministic, then layer in judgment for questions like "did the challenge question make sense."
  • Fixture publisher scope: one reference agent or one-per-specialism? Single agent with scenario-keyed responses is simpler; per-specialism is easier to author.
  • Relationship to training agent (see embedded training agent work): the fixture publisher could double as the certification training foil.
  • Milestone sizing: full epic is likely bigger than 3.1. Minimum viable 3.1 deliverable: schema + fixture publisher + 2 specialisms (discovery, activation) + compliance track stub.
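
One way to start deterministic and layer in judgment without blocking certification on a judge is to attach a severity to every assertion. Only MUST failures then fail a run. The sketch below assumes hypothetical `Severity`, `Assertion`, and `evaluate` names. The LLM judge is stubbed as a precomputed `judge_score` in the transcript, since the judge prompts are out of scope.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List

class Severity(Enum):
    MUST = "must"      # deterministic checks; a failure fails the run
    SHOULD = "should"  # judgment-based checks; a failure is only a warning

@dataclass
class Assertion:
    name: str
    severity: Severity
    check: Callable[[dict], bool]  # receives the recorded transcript

def evaluate(assertions: List[Assertion], transcript: dict) -> dict:
    """Run every assertion; only failed MUST assertions fail the run."""
    failed = [a for a in assertions if not a.check(transcript)]
    return {
        "passed": not any(a.severity is Severity.MUST for a in failed),
        "errors": [a.name for a in failed if a.severity is Severity.MUST],
        "warnings": [a.name for a in failed if a.severity is Severity.SHOULD],
    }

# Usage: "judge_score" stands in for a hypothetical LLM-as-judge result.
retried = Assertion("retried after timeout", Severity.MUST, lambda t: t["retries"] >= 1)
challenge = Assertion("challenge question made sense", Severity.SHOULD, lambda t: t["judge_score"] >= 0.5)
print(evaluate([retried, challenge], {"retries": 1, "judge_score": 0.2}))
```

Promoting a judgment check to MUST later only changes its `severity`. The storyboard and the validator stay the same.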

Not in scope

  • Changes to sell-side storyboards.
  • Buyer-side schema changes (this is testing infrastructure, not protocol).
  • Specific LLM judge prompts (separate deliverable).

Related

Labels: claude-triaged, epic, rfc