4 changes: 4 additions & 0 deletions .changeset/docs-compliance-grading-how-it-works.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
---

Add `docs/building/verification/how-grading-works.mdx` explaining how the runner resolves specialism manifests into graded scenarios and evaluates per-scenario capability gates. Fixes a docs gap where an adopter claiming `sales-guaranteed` had no single page explaining which storyboards would run against their agent. Also corrects the `sales-guaranteed` specialism narrative which incorrectly stated that omitting `media_buy.supports_proposals` would trigger proposal grading (schema default is `false`, so omit = skip). Resolves #4037.
1 change: 1 addition & 0 deletions docs.json
@@ -194,6 +194,7 @@
"docs/building/verification/conformance",
"docs/building/verification/compliance-catalog",
"docs/building/verification/storyboards-vs-scenarios",
+"docs/building/verification/how-grading-works",
"docs/building/verification/validate-your-agent",
"docs/building/verification/addie-socket-mode",
"docs/building/verification/grading",
2 changes: 1 addition & 1 deletion docs/building/verification/compliance-catalog.mdx
@@ -169,7 +169,7 @@ The storyboard runner:
**Implement the tools AND claim the specialism.** An agent that wires all of a specialism's required tools but omits the kebab-case ID from `capabilities.specialisms[]` will be graded **"No applicable tracks found"** by the runner — `tracks_passed = 0, tracks_failed = 0, tracks_skipped = 1`. This is a silent pass at the step level and a silent fail at the track level. The fix is to add the specialism ID (e.g., `"creative-generative"`) to your `get_adcp_capabilities` response.
</Warning>

-If any `stable` storyboard fails, your agent is not compliant for that claim. See [Validate Your Agent](/docs/building/verification/validate-your-agent) for how to run the suite locally.
+If any `stable` storyboard fails, your agent is not compliant for that claim. See [Validate Your Agent](/docs/building/verification/validate-your-agent) for how to run the suite locally. For a detailed walkthrough of how the runner resolves specialism manifests into graded scenarios — including how capability flags like `media_buy.supports_proposals` gate individual scenarios — see [How grading works](/docs/building/verification/how-grading-works).

## Naming conventions

134 changes: 134 additions & 0 deletions docs/building/verification/how-grading-works.mdx
@@ -0,0 +1,134 @@
---
title: How grading works
sidebarTitle: How grading works
description: "How the AdCP compliance runner translates specialism declarations into a concrete set of graded storyboards — and how capability flags alter that set."
"og:title": "AdCP — How compliance grading works"
---

The [Conformance Specification](/docs/building/verification/conformance#conformance-is-layered) defines three obligation layers: Universal, Protocol, and Specialism. This page explains what happens inside the Specialism layer: how a specialism manifest resolves to a set of graded scenarios, and how per-scenario capability gates can narrow or expand that set.

## From declaration to graded scenarios

When your agent declares a specialism in `get_adcp_capabilities`, the runner:

1. Fetches the specialism manifest at `/compliance/{version}/specialisms/{id}/`.
2. Reads the manifest's `requires_scenarios` list — an ordered set of scenario IDs the runner must grade.
3. For each scenario, checks whether the scenario declares a `requires_capability` gate.
4. If a gate is present, reads the named path from your `get_adcp_capabilities` response to decide whether to run or skip the scenario.

The manifest drives the full scenario list; capability gates apply per-scenario on top of it.
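The four steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the runner's actual implementation: `read_path`, `plan_run`, and the in-memory `MANIFEST` and `SCENARIOS` dictionaries are hypothetical names, and the manifest is abbreviated to two of the scenario IDs used on this page.

```python
from typing import Any

# Hypothetical, abbreviated stand-ins for the fetched manifest and scenario files.
MANIFEST = {
    "requires_scenarios": [
        "media_buy_seller/refine_products",
        "media_buy_seller/proposal_finalize",
    ]
}
SCENARIOS = {
    "media_buy_seller/refine_products": {},
    "media_buy_seller/proposal_finalize": {
        "requires_capability": {"path": "media_buy.supports_proposals", "equals": True}
    },
}


def read_path(data: dict[str, Any], dotted: str) -> Any:
    """Walk a dotted path; return None if any segment is absent."""
    node: Any = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def plan_run(manifest: dict, scenarios: dict, capabilities: dict) -> tuple[list[str], list[str]]:
    """Split the manifest's scenario list into (to_run, skipped), preserving order."""
    to_run: list[str] = []
    skipped: list[str] = []
    for scenario_id in manifest["requires_scenarios"]:
        gate = scenarios.get(scenario_id, {}).get("requires_capability")
        # Ungated scenarios always run; gated ones run only when the path matches.
        if gate is None or read_path(capabilities, gate["path"]) == gate["equals"]:
            to_run.append(scenario_id)
        else:
            skipped.append(scenario_id)
    return to_run, skipped
```

Calling `plan_run(MANIFEST, SCENARIOS, capabilities)` with a capabilities response that omits `media_buy` entirely puts `proposal_finalize` in the skipped list, because the path lookup yields nothing to compare against `true`.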

## Specialism manifests

Each specialism's `requires_scenarios` field lists the scenarios the runner will grade. Example — the `sales-guaranteed` manifest declares eight required scenarios:

```yaml
# /compliance/{version}/specialisms/sales-guaranteed/ (source: static/compliance/source/specialisms/sales-guaranteed/index.yaml)
id: sales_guaranteed
requires_scenarios:
- media_buy_seller/refine_products
- media_buy_seller/delivery_reporting
- media_buy_seller/measurement_terms_rejected
- media_buy_seller/pending_creatives_to_start
- media_buy_seller/inventory_list_targeting
- media_buy_seller/inventory_list_no_match
- media_buy_seller/invalid_transitions
- media_buy_seller/proposal_finalize # ← capability-gated
```

Seven of these run unconditionally for any `sales-guaranteed` agent. The eighth — `proposal_finalize` — carries a capability gate.

## Capability gates

A scenario can declare a `requires_capability` block. The runner reads the named path from your `get_adcp_capabilities` response and checks it against the expected value. If the check fails (the capability is absent or false), the scenario is skipped — the `skip` block will appear in runner output with `reason: not_applicable` — and does not contribute to `steps_failed`.

```yaml
# /compliance/{version}/protocols/media-buy/scenarios/proposal_finalize/ (source: static/compliance/source/protocols/media-buy/scenarios/proposal_finalize.yaml)
id: media_buy_seller/proposal_finalize
requires_capability:
  path: media_buy.supports_proposals
  equals: true
```

The gate is evaluated against your agent's live `get_adcp_capabilities` response at run time — the same call the runner makes during the universal `capability_discovery` storyboard.
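To make the gate semantics concrete, here is a hedged sketch of how a single `requires_capability` block might be evaluated against a capabilities response. The function name `evaluate_gate` and the exact wording of the `detail` string are assumptions made for illustration; only the `reason: not_applicable` value and the `path` / `equals` keys come from this page.

```python
from typing import Any, Optional


def evaluate_gate(gate: dict[str, Any], capabilities: dict[str, Any]) -> Optional[dict[str, str]]:
    """Return None when the gate passes, otherwise a skip block describing why."""
    node: Any = capabilities
    for key in gate["path"].split("."):
        # A missing segment collapses the lookup to None (treated as "absent").
        node = node.get(key) if isinstance(node, dict) else None

    if node == gate["equals"]:
        return None  # gate satisfied: the scenario is graded

    declared = "nothing (field absent)" if node is None else repr(node)
    return {
        "reason": "not_applicable",
        # Illustrative wording only; the real runner's detail text may differ.
        "detail": (
            f"requires_capability check: {gate['path']} must equal "
            f"{gate['equals']!r}; agent declared {declared}"
        ),
    }
```

In this sketch an explicit `false` and an omitted field both produce a skip block, while only an explicit `true` returns `None` and lets the scenario run.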

<Note>
**Schema status.** `requires_capability` is not yet defined in `storyboard-schema.yaml`. Runners recognise it (the TS SDK reads and enforces the block), but scenario-authoring tooling that validates against the storyboard schema currently flags it as an unknown field. Adding it to the schema is tracked separately; until then, treat `requires_capability` as a stable runner-level extension that schema linting does not yet cover.
</Note>


## Worked example

**Scenario:** Priya's StreamHaus platform claims `sales-guaranteed` and declares `media_buy.supports_proposals: true`.

```json
{
  "supported_protocols": ["media_buy"],
  "specialisms": ["sales-guaranteed"],
  "media_buy": {
    "supports_proposals": true
  }
}
```

**Runner behavior:** all eight `requires_scenarios` run, including `proposal_finalize`. Priya's platform is graded on the full proposal lifecycle — brief with proposals, refine, finalize, and accept via `create_media_buy`.

---

**Scenario:** StreamHaus Direct is an auction-based programmatic guaranteed (PG) platform with no proposal abstraction. It claims `sales-guaranteed` and declares `media_buy.supports_proposals: false`.

```json
{
  "supported_protocols": ["media_buy"],
  "specialisms": ["sales-guaranteed"],
  "media_buy": {
    "supports_proposals": false
  }
}
```

**Runner behavior:** seven scenarios run; `proposal_finalize` is skipped. The `skip` block in runner output is the authoritative signal:

```json
{
  "storyboard_id": "media_buy_seller/proposal_finalize",
  "skip": {
    "reason": "not_applicable",
    "detail": "requires_capability check: media_buy.supports_proposals must equal true — agent declared false"
  }
}
```

When the `skip` block is present, the step was not graded and does not count against `steps_failed`. The `skip.detail` string identifies the specific cause (capability gate, missing specialism declaration, or missing tool).

<Note>
**Absent = false.** The `supports_proposals` field has `"default": false` in the capabilities schema. Omitting it from your response is equivalent to declaring `false` — the runner skips capability-gated proposal scenarios. Declare `true` explicitly to opt in to grading.
</Note>

## Grading verdicts at a glance

| Outcome | `skip.reason` | Meaning |
|---------|---------------|---------|
| Scenario passed | — (no `skip` block) | All validations passed; `passed: true` at the step level |
| Scenario failed | — (no `skip` block) | One or more required validations failed; see `validations[]` for the failing field and `json_pointer` |
| Scenario skipped | `not_applicable` | Step was not run. Check `skip.detail` to distinguish: capability gate evaluated false, specialism not declared, or prerequisite not met |
| Required tool missing | `missing_tool` | Agent declared the specialism but did not expose a tool listed in `required_tools` |

A run's overall compliance verdict is determined by `steps_failed`. Skipped steps (`skip` block present) do not contribute to that counter. The `skip.detail` field is the human-readable string that names the specific skip cause.
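The counting rule can be illustrated with a short sketch. It assumes each step result is a dictionary carrying either a `skip` block or a boolean `passed` field, as in the table above. The function name `summarize`, the `steps_passed` and `steps_skipped` counters, and the `compliant` flag (true when `steps_failed` is zero) are hypothetical and are not taken from the runner output contract.

```python
from typing import Any


def summarize(steps: list[dict[str, Any]]) -> dict[str, Any]:
    """Tally step results; skipped steps never count toward steps_failed."""
    counts: dict[str, Any] = {"steps_passed": 0, "steps_failed": 0, "steps_skipped": 0}
    for step in steps:
        if "skip" in step:
            # A skip block means the step was not graded at all.
            counts["steps_skipped"] += 1
        elif step.get("passed"):
            counts["steps_passed"] += 1
        else:
            counts["steps_failed"] += 1
    # Assumption for illustration: the run is compliant when nothing failed.
    counts["compliant"] = counts["steps_failed"] == 0
    return counts
```

Under these assumptions, a run with one passed step and one skipped step has `steps_failed` equal to zero, so the skipped scenario does not affect the verdict.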

## Where each piece lives

| Artifact | URL path | Source |
|----------|----------|--------|
| Specialism manifest | `/compliance/{version}/specialisms/{id}/` | `static/compliance/source/specialisms/{id}/index.yaml` |
| Scenario YAML | `/compliance/{version}/protocols/{protocol}/scenarios/{name}/` | `static/compliance/source/protocols/{protocol}/scenarios/{name}.yaml` |
| Universal storyboards | `/compliance/{version}/universal/` | `static/compliance/source/universal/` |
| Capabilities schema | `/schemas/v3/protocol/get-adcp-capabilities-response.json` | `static/schemas/source/protocol/get-adcp-capabilities-response.json` |

The full specialism-to-scenario index is at [Compliance Catalog](/docs/building/verification/compliance-catalog). The runner output contract defining every skip reason and verdict shape is at `static/compliance/source/universal/runner-output-contract.yaml`.

## Related

- [Conformance Specification](/docs/building/verification/conformance) — the three-layer obligation model and the normative storyboard index
- [Compliance Catalog](/docs/building/verification/compliance-catalog) — full taxonomy of protocols, specialisms, and universal storyboards
- [Validate Your Agent](/docs/building/verification/validate-your-agent) — running the suite locally with `@adcp/client`
Original file line number Diff line number Diff line change
@@ -45,10 +45,11 @@ narrative: |
an RFP/brief, generate a proposal with curated bundles and rationale, refine, finalize to
committed status with firm pricing and an inventory hold, and then the buyer accepts via
create_media_buy. The `media_buy_seller/proposal_finalize` scenario covers that flow and is
-capability-gated on `media_buy.supports_proposals` — sellers that explicitly declare `false`
-skip it as `capability_unsupported`, sellers that declare `true` (or omit the field) are
-graded against it. Direct-buy guaranteed sellers (auction PG, retail SKU, quoted-rate)
-declare `supports_proposals: false`; full-service guaranteed sellers declare `true`.
+capability-gated on `media_buy.supports_proposals` — sellers that declare `false` (or omit
+the field, since the schema default is false) skip it with skip_result.reason: not_applicable;
+sellers that explicitly declare `true` are graded against it. Direct-buy guaranteed sellers
+(auction PG, retail SKU, quoted-rate) declare `supports_proposals: false`; full-service
+guaranteed sellers declare `true`.

agent:
interaction_model: media_buy_seller