Skip to content

Commit afbbfac

Browse files
bokelleyclaude
andcommitted
fix(server): PR #233 expert-review followups — security docs + retry/transform tests
Addresses code-reviewer + security-reviewer findings on PR #233. DOCS (security — moved from docs-only to import-site where callers see it) - SkillMiddleware docstring now includes four security callouts: (1) do NOT swallow ADCPError — serves fake success for failed mutations (IdempotencyConflictError, ADCPTaskError). (2) middleware is a data processor for the full skill payload — params contain buyer briefs, budgets, PII; context has caller_identity and tenant_id. Third-party middleware gets the complete surface; treat as controller-processor. (3) exception messages land in server logs verbatim via logger.exception before client-facing sanitisation — do not format params / caller_identity into exception text. (4) short-circuit caches MUST key on (skill_name, params, caller_identity, tenant_id). skill_name + params alone serves principal A's data to principal B. - Clarify params is request-side only; transforms happen on the return side of call_next. - Note ContextVars propagate through call_next (same asyncio task). DOCS — docs/handler-authoring.md - "Semantics worth knowing" expanded with the composition-order WHY (audit outermost so rate-limited rejected calls don't disappear from audit), short-circuit cache-key requirements, retry support, transform-on-return-not-input rule. - New "Security — middleware is a data processor" callout matches the import-site docstring. TESTS (+2, covering code-reviewer's gap list) - test_middleware_can_invoke_call_next_multiple_times_for_retry: retry-on-transient-error pattern. Middleware calls call_next 3 times; handler fails twice, succeeds on third. Locks the re-entrant composition contract a naive loop-variable closure would break. - test_middleware_can_transform_result_on_return_side: enriching middleware wraps the handler's return. Distinct code path from short-circuit (which never calls call_next). NITS - Removed stale `from a2a.types import TaskStatus # noqa: F401 (unused but document` line with truncated comment in test_middleware_can_short_circuit_without_invoking_handler. Deferred (not blocking this PR): - Protocol class for SkillMiddleware (do alongside ContextFactory to keep declaration style consistent). - Runtime validation of middleware return shape against skill output schemas. - ContextVar-propagation formal docs in a dedicated section. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6f9bd26 commit afbbfac

3 files changed

Lines changed: 170 additions & 22 deletions

File tree

docs/handler-authoring.md

Lines changed: 44 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -362,21 +362,52 @@ serve(MyAgent(), transport="a2a", middleware=[audit_middleware])
362362

363363
**Semantics worth knowing:**
364364

365-
- **Composition**: `middleware=[Audit(), RateLimit(), Metrics()]` runs
366-
`Audit → RateLimit → Metrics → handler` on the way in and unwinds in
367-
the opposite order.
368-
- **Short-circuit**: a middleware that returns without calling
369-
`call_next()` stops the chain. Its return value becomes the dispatch
370-
result (serialised back to the client). Rate limiters / feature
371-
flags use this.
372-
- **Exception observation**: catch around `await call_next()` to log
373-
failures. Re-raise to let the executor's normal error path take over
374-
(`ADCPError` → failed task with `adcp_error` DataPart; other
375-
exceptions → opaque failed task). Swallowing the exception and
376-
returning a substitute result is allowed but almost always wrong.
365+
- **Composition — put audit outermost.** `middleware=[Audit(),
366+
RateLimit(), Metrics()]` runs `Audit → RateLimit → Metrics →
367+
handler` on the way in and unwinds in the opposite order. **If you
368+
put rate-limiting before audit, rejected requests disappear from
369+
your audit log** — often the most interesting events for security
370+
review. Audit always outermost.
371+
- **Short-circuit — cache keys MUST include principal + tenant.** A
372+
middleware that returns without calling `call_next()` stops the
373+
chain; its return value becomes the dispatch result. Rate limiters
374+
/ feature flags use this. **Caching middleware that short-circuits
375+
must key on `(skill_name, params, context.caller_identity,
376+
context.tenant_id)`** — a cache keyed only on `skill_name + params`
377+
serves principal A's data to principal B on a matching-params call.
378+
- **Exception observation — never swallow an `ADCPError`.** Catch
379+
around `await call_next()` to log failures. Re-raise to let the
380+
executor's normal error path take over (`ADCPError` → failed task
381+
with `adcp_error` DataPart; other exceptions → opaque failed task).
382+
Swallowing an `ADCPError` (especially `IdempotencyConflictError` or
383+
`ADCPTaskError`) and returning a fake-success dict silently converts
384+
a rejected mutation into a "completed" task — double-billing,
385+
double-allocation, duplicated side effects. Don't.
386+
- **Exception messages end up in server logs.** Middleware-raised
387+
exceptions flow through `logger.exception` in the executor before
388+
client-facing sanitisation. Don't format `params` or
389+
`context.caller_identity` into exception text — operators read those
390+
logs.
391+
- **Retry is supported.** Call `call_next()` more than once (e.g.
392+
retry-on-transient-error middleware). Each call gets a fresh
393+
inner chain — composition is re-entrant by design.
394+
- **Transform on return, not on input.** `params` passed in is the
395+
same dict every middleware sees. Mutating it doesn't change what
396+
the next layer receives. Transforms happen on the *return* side by
397+
modifying the value of `await call_next()`.
377398
- **Context access**: the middleware sees the `ToolContext` produced
378399
by the `context_factory` (or the a2a-sdk fallback). Tenant id,
379-
caller identity, anything your factory populates.
400+
caller identity, anything your factory populates. `ContextVar`s set
401+
before `call_next()` propagate to the handler — no `asyncio.create_task`
402+
needed.
403+
404+
**Security — middleware is a data processor for the full skill
405+
payload.** `params` carries decoded buyer briefs, budgets, brand
406+
refs, proposal text, PII in message parts. `context` carries
407+
`caller_identity` + `tenant_id`. Installing a third-party middleware
408+
(SaaS audit, observability vendor, bespoke tracing) hands that vendor
409+
the complete skill surface. Treat it as a data processor under your
410+
GDPR/CCPA controller-processor agreements.
380411

381412
MCP transport has its own middleware story (see "Pattern 2 —
382413
in-process HTTP middleware" above); `SkillMiddleware` is A2A-only.

src/adcp/server/serve.py

Lines changed: 55 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -81,25 +81,70 @@ async def middleware(
8181
) -> Any:
8282
...
8383
84-
Middleware wraps ``call_next()`` — call it (possibly more than once, or
85-
never) to invoke the rest of the chain plus the underlying handler.
86-
Anything the middleware returns becomes the dispatch result the A2A
87-
transport serialises back to the client, so middleware can short-circuit
88-
(skip the handler entirely) or transform the result.
84+
Middleware wraps ``call_next()`` — call it (possibly more than once to
85+
implement retry, or never to short-circuit) to invoke the rest of the
86+
chain plus the underlying handler. Anything the middleware returns
87+
becomes the dispatch result the A2A transport serialises back to the
88+
client, so middleware can short-circuit (skip the handler entirely) or
89+
transform the result on the return side.
8990
9091
Middleware observes both success and failure — catch exceptions around
9192
``call_next()`` to implement audit-on-failure or retry-classifier hooks.
9293
Middleware re-raising propagates to the executor's normal error path
9394
(application ``ADCPError`` → failed task w/ ``adcp_error`` DataPart;
9495
other exceptions → opaque failed task per the spec's error-sanitisation
95-
rule).
96+
rule). **Swallowing an exception and returning a substitute result is
97+
allowed but almost always wrong** — in particular, swallowing
98+
``ADCPError`` subclasses (``IdempotencyConflictError``,
99+
``ADCPTaskError``) serves a fake success for a failed mutation, which
100+
double-bills / double-allocates in production.
101+
102+
``params`` is the parsed request dict passed to every middleware in
103+
the chain and to the handler. Middleware cannot mutate what the next
104+
layer sees by mutating ``params`` — transforms happen on the return
105+
side only, by modifying the value returned from ``call_next()``.
96106
97107
Multiple middlewares compose outermost-first, matching Starlette/ASGI
98108
semantics — if you pass ``middleware=[Audit(), RateLimit(), Metrics()]``,
99109
the runtime order is::
100110
101111
Audit.__call__ → RateLimit.__call__ → Metrics.__call__ → handler
102112
113+
**Put audit outermost.** Middleware that short-circuits (rate limiter,
114+
feature-flag gate) never calls ``call_next()``, so anything deeper in
115+
the chain never sees the request. If your audit middleware sits
116+
*after* the rate limiter, rejected calls disappear from the audit
117+
trail — often the most interesting events for security review.
118+
119+
``call_next()`` runs in the same asyncio task as the middleware that
120+
invoked it, so ``ContextVar`` values set before the call are visible
121+
to downstream middleware and the handler. Don't ``asyncio.create_task``
122+
your way around this unless you need the isolation.
123+
124+
**Security — middleware is a data processor for the full skill payload.**
125+
``params`` is decoded business content (buyer briefs, budgets, brand
126+
references, proposal text, PII in message parts). ``context`` carries
127+
``caller_identity``, ``tenant_id``, and anything your ``context_factory``
128+
populates. Installing a third-party middleware (observability vendor,
129+
SaaS audit pipeline, external tracing) hands that vendor the complete
130+
skill payload surface — treat it as a data processor under your
131+
GDPR/CCPA controller-processor relationships and review the blast
132+
radius before wiring vendors here.
133+
134+
**Security — do not format ``params`` or ``context.caller_identity``
135+
into exception messages.** Middleware-raised exceptions pass through
136+
``logger.exception`` in the executor (server-side trace with the raw
137+
message) before the executor's sanitisation kicks in for the client
138+
response. Exception text ends up in operator logs verbatim; keep it
139+
opaque.
140+
141+
**Security — short-circuit caches MUST include principal + tenant in
142+
the cache key.** A middleware that caches on ``skill_name + params``
143+
alone and returns a cached result without calling ``call_next()``
144+
will serve principal A's data to principal B on a matching-params
145+
call. Key on ``(skill_name, params, context.caller_identity,
146+
context.tenant_id)``.
147+
103148
Example — audit logging with exception capture::
104149
105150
from adcp.server import SkillMiddleware, ToolContext
@@ -114,7 +159,10 @@ async def audit_middleware(
114159
try:
115160
result = await call_next()
116161
except Exception as exc:
117-
audit_log.failure(skill_name, context.caller_identity, exc)
162+
# Keep exception text opaque — this ends up in server logs.
163+
audit_log.failure(
164+
skill_name, context.caller_identity, type(exc).__name__
165+
)
118166
raise
119167
audit_log.success(
120168
skill_name,

tests/test_a2a_server.py

Lines changed: 71 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -951,8 +951,6 @@ async def test_middleware_can_short_circuit_without_invoking_handler():
951951
"""Middleware that returns without calling ``call_next`` stops the
952952
chain and its return value becomes the dispatch result. Rate
953953
limiters and feature flags rely on this."""
954-
from a2a.types import TaskStatus # noqa: F401 (unused but document
955-
956954
handler_called = False
957955

958956
class _TrackingHandler(ADCPHandler):
@@ -1054,3 +1052,74 @@ async def noop_mw(skill_name, params, context, call_next):
10541052
executor = handler.agent_executor
10551053
assert isinstance(executor, ADCPAgentExecutor)
10561054
assert executor._middleware == (noop_mw,)
1055+
1056+
1057+
async def test_middleware_can_invoke_call_next_multiple_times_for_retry():
1058+
"""Retry-on-transient-error middleware calls ``call_next()`` more
1059+
than once — each call builds a fresh inner chain. This locks the
1060+
re-entrant composition contract a naive loop-variable closure would
1061+
break."""
1062+
call_counts = {"mw": 0, "handler": 0}
1063+
1064+
async def retry_middleware(skill_name, params, context, call_next):
1065+
last_exc: Exception | None = None
1066+
for _ in range(3):
1067+
call_counts["mw"] += 1
1068+
try:
1069+
return await call_next()
1070+
except RuntimeError as exc:
1071+
last_exc = exc
1072+
raise last_exc if last_exc else RuntimeError("unreachable")
1073+
1074+
class _TransientFailHandler(ADCPHandler):
1075+
async def get_adcp_capabilities(self, params: Any, context: Any = None) -> Any:
1076+
return {}
1077+
1078+
async def get_products(self, params: Any, context: Any = None) -> Any:
1079+
call_counts["handler"] += 1
1080+
if call_counts["handler"] < 3:
1081+
raise RuntimeError("transient")
1082+
return {"products": [{"id": "finally"}]}
1083+
1084+
executor = ADCPAgentExecutor(_TransientFailHandler(), middleware=[retry_middleware])
1085+
ctx = RequestContext(request=MessageSendParams(message=_make_datapart_msg("get_products")))
1086+
queue = EventQueue()
1087+
await executor.execute(ctx, queue)
1088+
1089+
assert call_counts["mw"] == 3
1090+
assert call_counts["handler"] == 3
1091+
event = await queue.dequeue_event(no_wait=True)
1092+
assert isinstance(event, Task)
1093+
assert event.status.state == "completed"
1094+
1095+
1096+
async def test_middleware_can_transform_result_on_return_side():
1097+
"""Middleware can mutate or replace the value of ``call_next()``
1098+
before returning it. The transformed value is what the client
1099+
sees — covers the annotation / enrichment use case distinct from
1100+
short-circuiting."""
1101+
1102+
async def enriching_middleware(skill_name, params, context, call_next):
1103+
result = await call_next()
1104+
# Wrap handler's return with a marker the test observes.
1105+
if isinstance(result, dict):
1106+
return {**result, "middleware_marker": "wrapped"}
1107+
return result
1108+
1109+
executor = ADCPAgentExecutor(_TestHandler(), middleware=[enriching_middleware])
1110+
ctx = RequestContext(request=MessageSendParams(message=_make_datapart_msg("get_products")))
1111+
queue = EventQueue()
1112+
await executor.execute(ctx, queue)
1113+
1114+
event = await queue.dequeue_event(no_wait=True)
1115+
assert isinstance(event, Task)
1116+
assert event.status.state == "completed"
1117+
data_parts = [
1118+
p.root
1119+
for p in event.artifacts[0].parts
1120+
if hasattr(p.root, "data") and isinstance(p.root.data, dict)
1121+
]
1122+
result = data_parts[0].data
1123+
assert result["middleware_marker"] == "wrapped"
1124+
# And the handler's original payload is still there.
1125+
assert result["products"][0]["id"] == "p1"

0 commit comments

Comments
 (0)