Skip to content

Commit 202474f

Browse files
committed
fix(canonical-formats): address Argus review follow-ups + nits on #841
Argus APPROVED #841 and noted five non-blocking follow-ups + four nits. This commit addresses the ones that sharpen wire correctness or close real attack surfaces. ## agent_url canonicalisation in v1 inbound lookup ``find_declaration_by_v1_format_id`` now canonicalises both buyer and seller ``agent_url`` per RFC 3986 §6 before comparing: scheme lowercased, host lowercased, default port (443/80) stripped. Without this a seller publishing ``https://Creative.AdContextProtocol.org`` would silently miss-match a buyer's ``https://creative.adcontextprotocol.org`` and the SDK would return a wrongful ``UNSUPPORTED_FEATURE``. Argus flagged this as the only documented MUST divergence — ``core/format-id.json:11`` is normative. ## ANSI / control-char scrub in advisory identifier echo ``_echo_identifier`` now escapes every C0 control char (incl. ``\\n`` ``\\r`` ``\\x1b``), the C1 range, and Unicode LS/PS line separators to a literal ``\\u<hex>`` form before applying the 128-char cap. A seller publishing ``product_id`` with ``\\n`` or ``\\x1b[`` previously round-tripped that string into ``errors[].details.product_id`` and could forge log lines or manipulate ANSI-aware operator tooling. ## sdk_id alignment between installed + dev Hardcodes ``_SDK_DIST_NAME = "adcontextprotocol-adcp-python"`` (the spec example form). Installed wheels publish the dist name as plain ``adcp``; the prior code emitted ``adcp@<version>`` from a wheel install vs ``adcontextprotocol-adcp-python@0.0.0-dev`` from a checkout, breaking the multi-hop ``(code, field, sdk_id)`` dedup contract in ``core/error.json`` for the same SDK. ## Experimental markers + nits - Added ``.. note:: Experimental`` to ``glob_match``, ``structural_match``, ``load_default_registry``. - Removed duplicate ``NotificationConfig`` in ``adcp.types/__all__``. - Removed ``_echo_identifier`` from ``advisory.py``'s ``__all__``. - Two new tests pull ``SDK_ID`` through the documented public path (``from adcp.canonical_formats import SDK_ID``). ## Tests 5117 passed locally. New coverage: agent_url host-casefolding + default-port stripping; newline / ANSI / Unicode line-separator scrubbing; ``SDK_ID`` public-import + canonical-prefix invariants. ## Deferred per Argus's own suggestion - ``_versions_overlap`` fail-loud on ``~`` / ``^`` — latent until #741 half 2 actually consumes the registry on the inbound path. - ``_walk_for_credential_keys`` cycle guard — not wire-reachable. Refs: #741, #841
1 parent 1d20865 commit 202474f

9 files changed

Lines changed: 222 additions & 23 deletions

MIGRATION_v5_to_v6.md

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -189,10 +189,13 @@ if decl.format_kind is CanonicalFormatKind.image:
189189
**SDK-source advisory provenance:**
190190

191191
All advisories emitted by the projection layer carry `source="sdk"` and
192-
`sdk_id="<dist_name>@<version>"` where `<dist_name>` is read from the
193-
installed distribution metadata (`importlib.metadata.metadata("adcp")["Name"]`).
194-
Adopters relying on a particular `sdk_id` for multi-hop dedup should
195-
pin to a specific SDK release rather than parsing the string.
192+
`sdk_id="adcontextprotocol-adcp-python@<version>"`. The distribution-name
193+
prefix is fixed (independent of `pyproject.toml`'s `[project].name`)
194+
so wheel installs and dev installs emit the same attribution string —
195+
the multi-hop `(code, field, sdk_id)` dedup contract in `core/error.json`
196+
keys on this and would corrupt under drift. Adopters relying on a
197+
particular `sdk_id` for multi-hop dedup should pin to a specific SDK
198+
release rather than parsing the string.
196199

197200
**Not yet shipped (later beta increments):** v1 → v2 reverse projection,
198201
`pixel_tracker` bidirectional contract, the 14 reference fixtures and

src/adcp/canonical_formats/advisory.py

Lines changed: 52 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,6 @@
1818

1919
from functools import lru_cache
2020
from importlib.metadata import PackageNotFoundError
21-
from importlib.metadata import metadata as _pkg_metadata
2221
from importlib.metadata import version as _pkg_version
2322
from typing import Any, TypeAlias
2423

@@ -37,38 +36,75 @@
3736
SdkAdvisory: TypeAlias = Error
3837

3938

39+
# Canonical distribution name for the wire ``sdk_id``. Hardcoded (rather
40+
# than read from ``pyproject.toml``'s ``[project].name``) so installed
41+
# and dev builds emit the same audit-trail attribution. The
42+
# ``core/error.json`` dedup contract keys on ``(code, field, sdk_id)``;
43+
# drift here corrupts multi-hop deduplication. Installed wheels publish
44+
# the PyPI distribution as ``adcp``; the fully-qualified
45+
# ``adcontextprotocol-adcp-python`` form is what the spec example uses
46+
# and what cross-SDK consumers expect.
47+
_SDK_DIST_NAME: str = "adcontextprotocol-adcp-python"
48+
49+
4050
@lru_cache(maxsize=1)
4151
def _resolve_sdk_id() -> str:
4252
"""Return the wire-format ``sdk_id`` for this SDK build.
4353
4454
Format per ``core/error.json``: ``<sdk_package_name>@<version>``.
45-
Reads the installed distribution's ``Name`` from the package metadata
46-
so the prefix never drifts from the PyPI distribution name — the
47-
audit-trail dedup key depends on it.
55+
The package-name prefix is fixed (``_SDK_DIST_NAME``) so installed
56+
and dev builds emit the same attribution; only the version
57+
component varies between them. Without this, dev installs would
58+
emit a different ``sdk_id`` from wheel installs and break the
59+
multi-hop dedup contract for the same SDK.
4860
4961
Cached because the resolution is process-stable: the package
5062
metadata doesn't change at runtime. Lazy (computed on first call)
5163
so setuptools-scm-style late version resolution still works.
52-
Falls back to a development marker when the package isn't
53-
installed (e.g., running directly out of a checkout without
54-
``pip install -e``).
64+
Falls back to a ``0.0.0-dev`` version marker when the package
65+
isn't installed.
5566
"""
5667
try:
57-
dist_name = _pkg_metadata("adcp")["Name"]
5868
v = _pkg_version("adcp")
5969
except PackageNotFoundError:
60-
dist_name = "adcontextprotocol-adcp-python"
6170
v = "0.0.0-dev"
62-
return f"{dist_name}@{v}"
71+
return f"{_SDK_DIST_NAME}@{v}"
6372

6473

6574
def _echo_identifier(value: str | None) -> str | None:
66-
"""Cap seller-controlled identifiers before echoing into advisory details."""
75+
"""Cap + scrub seller-controlled identifiers before echoing into advisory details.
76+
77+
Two defenses applied in order:
78+
79+
1. **Control-character scrub** — replaces every C0 control char
80+
(``\\x00``-``\\x1f``), the C1 range (``\\x7f``-``\\x9f``), and
81+
all Unicode line separators with a literal ``"\\u<hex>"``
82+
escape. A seller publishing a ``product_id`` containing ``\\n``
83+
or ``\\x1b[`` would otherwise round-trip into
84+
``errors[].details.product_id``, forging log lines or
85+
triggering ANSI escape sequences in operator tooling that
86+
prints one advisory per line.
87+
88+
2. **Length cap** — at 128 chars (after escaping), so a malformed
89+
seller identifier cannot grow the multi-hop ``errors[]`` array
90+
unbounded into the idempotency replay cache.
91+
92+
Returns ``None`` for ``None`` input (the explicit absent-product case).
93+
"""
6794
if value is None:
6895
return None
69-
if len(value) <= _MAX_ECHOED_IDENTIFIER_LEN:
70-
return value
71-
return value[:_MAX_ECHOED_IDENTIFIER_LEN] + "…[truncated]"
96+
scrubbed_chars: list[str] = []
97+
for ch in value:
98+
cp = ord(ch)
99+
# C0 (incl. \t, \n, \r) + DEL + C1 + LS/PS line separators.
100+
if cp < 0x20 or 0x7F <= cp <= 0x9F or ch in ("
", "
"):
101+
scrubbed_chars.append(f"\\u{cp:04x}")
102+
else:
103+
scrubbed_chars.append(ch)
104+
scrubbed = "".join(scrubbed_chars)
105+
if len(scrubbed) <= _MAX_ECHOED_IDENTIFIER_LEN:
106+
return scrubbed
107+
return scrubbed[:_MAX_ECHOED_IDENTIFIER_LEN] + "…[truncated]"
72108

73109

74110
def __getattr__(name: str) -> Any:
@@ -124,9 +160,10 @@ def make_sdk_advisory(
124160

125161
# ``SDK_ID`` is resolved lazily via module ``__getattr__`` (above) — listed
126162
# here so the public surface is documented + introspectable via ``dir()``.
163+
# ``_echo_identifier`` and ``_resolve_sdk_id`` are private helpers; not
164+
# part of ``__all__``.
127165
__all__ = [ # noqa: F822 — SDK_ID provided via module __getattr__
128166
"SDK_ID",
129167
"SdkAdvisory",
130-
"_echo_identifier",
131168
"make_sdk_advisory",
132169
]

src/adcp/canonical_formats/format_options.py

Lines changed: 38 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,9 +24,44 @@
2424
from __future__ import annotations
2525

2626
from collections.abc import Iterable
27+
from urllib.parse import urlsplit, urlunsplit
2728

2829
from adcp.types import CanonicalFormatKind, Error, FormatId, ProductFormatDeclaration
2930

31+
# Default ports per RFC 3986 §3.2.3 — stripped during canonicalization
32+
# so ``https://x.example:443`` matches ``https://x.example``.
33+
_DEFAULT_PORTS: dict[str, int] = {"http": 80, "https": 443}
34+
35+
36+
def _canonicalize_agent_url(raw: str) -> str:
37+
"""Return ``raw`` with scheme + host lowercased and default port stripped.
38+
39+
Per ``core/format-id.json`` (normative): callers MUST canonicalize
40+
``agent_url`` before comparing two ``FormatId`` values for identity.
41+
Pydantic's ``AnyUrl`` does trailing-slash normalization but not
42+
RFC 3986 §6 host-casefolding or default-port stripping — a seller
43+
publishing ``"https://Creative.AdContextProtocol.org"`` would
44+
silently miss-match a buyer's ``"https://creative.adcontextprotocol.org"``
45+
without this step.
46+
47+
Non-throwing: malformed inputs round-trip as-is. The lookup is a
48+
closed-set match, not a security check; we don't want to reject
49+
here, just normalize what we can.
50+
"""
51+
try:
52+
parts = urlsplit(raw)
53+
except ValueError:
54+
return raw
55+
if not parts.scheme or not parts.hostname:
56+
return raw
57+
scheme = parts.scheme.lower()
58+
host = parts.hostname.lower()
59+
port = parts.port
60+
if port is not None and port == _DEFAULT_PORTS.get(scheme):
61+
port = None
62+
netloc = host if port is None else f"{host}:{port}"
63+
return urlunsplit((scheme, netloc, parts.path, parts.query, ""))
64+
3065

3166
class FormatKindNotInClosedSetError(ValueError):
3267
"""Raised when a ``format_kind`` is not in the product's ``format_options[]``.
@@ -172,12 +207,13 @@ def find_declaration_by_v1_format_id(
172207
should be rejected with ``UNSUPPORTED_FEATURE`` — the v1
173208
``format_id`` is not a recognised entry for this product.
174209
"""
175-
target_url = str(format_id.agent_url)
210+
target_url = _canonicalize_agent_url(str(format_id.agent_url))
176211
target_id = format_id.id
177212
for decl in format_options:
178213
refs = decl.v1_format_ref or []
179214
for ref in refs:
180-
if str(ref.agent_url) == target_url and ref.id == target_id:
215+
ref_url = _canonicalize_agent_url(str(ref.agent_url))
216+
if ref_url == target_url and ref.id == target_id:
181217
return decl
182218
return None
183219

src/adcp/canonical_formats/registry.py

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -125,6 +125,13 @@ def load_default_registry() -> V1V2CanonicalFormatMappingRegistry:
125125
cached per process (the registry is immutable for a given SDK
126126
build, keyed by ``ADCP_VERSION``).
127127
128+
.. note::
129+
**Experimental.** The registry is consulted only on the v1 → v2
130+
inbound path (lands in #741 part 2). Adopters using this helper
131+
to inspect the bundled mappings SHOULD pin to the SDK release
132+
they integrate against; the return shape may sharpen when the
133+
inbound consumer lands.
134+
128135
Raises:
129136
RegistryLoadError: when the bundle is missing, malformed JSON,
130137
or fails schema validation.
@@ -143,6 +150,13 @@ def glob_match(value: str, pattern: str) -> bool:
143150
Treats ``*`` as a permissive wildcard (any chars including ``_``).
144151
Other regex metacharacters are escaped — the pattern language is
145152
glob, not regex.
153+
154+
.. note::
155+
**Experimental.** This helper is unused on the v2 → v1 path (the
156+
registry is consulted only on the v1 → v2 inbound path, which
157+
lands in #741's second PR). The signature may shift when that
158+
path consumes it; adopters SHOULD pin to the SDK release they
159+
integrate against.
146160
"""
147161
if pattern == "*":
148162
return True
@@ -283,6 +297,10 @@ def structural_match(
283297
Returns:
284298
``True`` iff every constraint declared on ``pattern`` is satisfied
285299
by the v1 format's structural shape.
300+
301+
.. note::
302+
**Experimental.** Same caveat as :func:`glob_match` — the v1 → v2
303+
inbound consumer lands in the second half of #741.
286304
"""
287305
if hasattr(pattern, "model_dump"):
288306
p = pattern.model_dump(exclude_none=True)

src/adcp/types/__init__.py

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1197,7 +1197,6 @@ def __init__(self, *args: object, **kwargs: object) -> None:
11971197
"AvailableMetric",
11981198
"NotificationConfig",
11991199
"PushNotificationConfig",
1200-
"NotificationConfig",
12011200
"ReportingCapabilities",
12021201
"WholesaleFeedEvent",
12031202
"WholesaleFeedWebhook",

tests/fixtures/public_api_snapshot.json

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -776,7 +776,6 @@
776776
"MetricType",
777777
"ModuleType",
778778
"NotificationConfig",
779-
"NotificationConfig",
780779
"NotificationType",
781780
"OfferPrice",
782781
"Offering",

tests/test_canonical_formats_options.py

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,58 @@ def test_v1_inbound_lookup_distinguishes_by_agent_url() -> None:
202202
assert find_declaration_by_v1_format_id(other_seller, [decl]) is None
203203

204204

205+
def test_v1_inbound_lookup_canonicalises_agent_url_host_case() -> None:
206+
"""RFC 3986 §6 host-casefolding: ``Creative.X`` must match ``creative.x``.
207+
208+
Without canonicalization a seller publishing
209+
``https://Creative.AdContextProtocol.org`` would silently miss-match
210+
a buyer's ``https://creative.adcontextprotocol.org`` and the SDK
211+
would return a wrongful ``UNSUPPORTED_FEATURE``.
212+
"""
213+
from adcp.canonical_formats import find_declaration_by_v1_format_id
214+
from adcp.types import FormatId
215+
216+
seller_decl = ProductFormatDeclaration(
217+
format_kind=CanonicalFormatKind.image,
218+
params={},
219+
v1_format_ref=[
220+
FormatId(
221+
agent_url="https://Creative.AdContextProtocol.org",
222+
id="display_300x250_image",
223+
),
224+
],
225+
)
226+
buyer_ref = FormatId(
227+
agent_url="https://creative.adcontextprotocol.org",
228+
id="display_300x250_image",
229+
)
230+
231+
assert find_declaration_by_v1_format_id(buyer_ref, [seller_decl]) is seller_decl
232+
233+
234+
def test_v1_inbound_lookup_canonicalises_default_port() -> None:
235+
"""Default-port stripping: ``https://x.example:443`` matches ``https://x.example``."""
236+
from adcp.canonical_formats import find_declaration_by_v1_format_id
237+
from adcp.types import FormatId
238+
239+
seller_decl = ProductFormatDeclaration(
240+
format_kind=CanonicalFormatKind.image,
241+
params={},
242+
v1_format_ref=[
243+
FormatId(
244+
agent_url="https://creative.adcontextprotocol.org:443",
245+
id="display_300x250_image",
246+
),
247+
],
248+
)
249+
buyer_ref = FormatId(
250+
agent_url="https://creative.adcontextprotocol.org",
251+
id="display_300x250_image",
252+
)
253+
254+
assert find_declaration_by_v1_format_id(buyer_ref, [seller_decl]) is seller_decl
255+
256+
205257
def test_v1_inbound_lookup_with_no_refs_returns_none() -> None:
206258
from adcp.canonical_formats import find_declaration_by_v1_format_id
207259
from adcp.types import FormatId

tests/test_canonical_formats_projection.py

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -302,3 +302,36 @@ def test_advisory_product_id_is_truncated() -> None:
302302
assert echoed.endswith("…[truncated]")
303303
# The literal cap (128) plus the truncation marker.
304304
assert echoed.startswith("x" * 128)
305+
306+
307+
def test_advisory_product_id_scrubs_newlines() -> None:
308+
"""Newline injection attempts are escaped so operator log emitters
309+
don't see forged lines from seller-controlled identifiers."""
310+
decl = ProductFormatDeclaration(format_kind=CanonicalFormatKind.video_vast, params={})
311+
312+
result = project_declaration_to_v1(decl, product_id="prod_alpha\nFAKE LOG LINE\nprod_omega")
313+
314+
echoed = result.advisories[0].details["product_id"]
315+
assert "\n" not in echoed
316+
assert "\\u000a" in echoed
317+
318+
319+
def test_advisory_product_id_scrubs_ansi_escape() -> None:
320+
"""ANSI ``\\x1b[`` escape sequences are neutralised before echoing."""
321+
decl = ProductFormatDeclaration(format_kind=CanonicalFormatKind.video_vast, params={})
322+
323+
result = project_declaration_to_v1(decl, product_id="prod\x1b[31mRED\x1b[0m")
324+
325+
echoed = result.advisories[0].details["product_id"]
326+
assert "\x1b" not in echoed
327+
assert "\\u001b" in echoed
328+
329+
330+
def test_advisory_product_id_scrubs_unicode_line_separator() -> None:
331+
"""Unicode LS/PS line separators escape too — they break naive line splitters."""
332+
decl = ProductFormatDeclaration(format_kind=CanonicalFormatKind.video_vast, params={})
333+
334+
result = project_declaration_to_v1(decl, product_id="prod
injected")
335+
336+
echoed = result.advisories[0].details["product_id"]
337+
assert "
" not in echoed

tests/test_canonical_formats_public_api.py

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,28 @@ def test_canonical_format_kind_has_13_values() -> None:
7272
assert len(CanonicalFormatKind) == 13
7373

7474

75+
def test_sdk_id_reachable_via_public_package_path() -> None:
76+
"""``SDK_ID`` is documented as importable from :mod:`adcp.canonical_formats`.
77+
78+
It's resolved lazily via module ``__getattr__`` so this also
79+
exercises that dispatch path against the documented import.
80+
"""
81+
from adcp.canonical_formats import SDK_ID
82+
83+
assert isinstance(SDK_ID, str)
84+
assert SDK_ID.startswith("adcontextprotocol-adcp-python@")
85+
86+
87+
def test_sdk_id_uses_canonical_distribution_prefix() -> None:
88+
"""Dev installs and wheel installs MUST emit the same ``sdk_id`` prefix
89+
so the multi-hop ``(code, field, sdk_id)`` dedup contract holds across
90+
deployment shapes."""
91+
from adcp.canonical_formats import SDK_ID
92+
93+
prefix, _, _ = SDK_ID.partition("@")
94+
assert prefix == "adcontextprotocol-adcp-python"
95+
96+
7597
def test_canonical_kind_values_match_spec() -> None:
7698
"""The 13 wire values are the canonical-formats vocabulary; lock them in."""
7799
from adcp.types import CanonicalFormatKind

0 commit comments

Comments
 (0)