docs(proposals): salesagent side-car experiment plan (full GAM)#506
docs(proposals): salesagent side-car experiment plan (full GAM)#506bokelley wants to merge 7 commits into
Conversation
Falsifies #502's two-platform composition against salesagent + live GAM. Side-car runtime in a salesagent worktree, same DB, untouched admin UI; binary exit criterion is the AdCP media_buy_seller storyboard passing against a sandbox GAM Network ID. In scope: real GAM upstream (wrap, don't port), HITL approval lifecycle (compose_method + ShortCircuit against salesagent's actual WorkflowStep flow), webhook delivery via SDK F12 auto-emit. The HITL mapping is grounded in salesagent's existing gate at google_ad_manager.py:571 + execute_approved_media_buy at media_buy_create.py:458 — no paused-task resumption needed; the SDK's TaskRegistry doesn't participate. Five concrete questions the experiment answers, including the resumption-marker shape (typed ctx.resumption_token vs. adopter-side sentinel) and whether dynamic_products.py factors onto ProposalManager.get_products without gutting it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Read through and verified the salesagent citations against current Gotchas1. Wrap target is wrong. The plan wraps 2. "Same process, two runtimes" is the bigger of two unspecified shims. "Tiny dispatcher consults 3. ~25 structural guards will fire on 4. Two writers, one DB is not fine just because it's a test tenant.
5. The HITL marker decision is N=1, and salesagent's pattern is unrepresentative. Salesagent uses DB-persisted re-call with a sentinel, not paused-coroutine resumption. The experiment can answer "does the SDK seam accommodate salesagent's shape" — it cannot answer "what's the right Protocol seam." Risk: typed 6. 7. Webhook compatibility test is shallow. Disabling 8. The "<300 LOC glue" target for 9. Single exit criterion lets the experiment lie. "Storyboard passes" is satisfiable with wrappers that grew enormous. Need a co-equal budget criterion: no 10. Pin SHAs in step 0. adcp-client-python from Better way (smallest changes I'd push for)
The instincts are right (sidecar, wrap-don't-port, GAM-live, binary criterion). Execution risks: wrap target is too low in the stack, in-process routing is harder than billed, and the architectural hypothesis is over-coupled to one adopter's HITL idiom. |
|
Plan is well-structured and the salesagent file:line citations check out (spot-checked The headline design risk: the experiment quietly conflates "does the SDK shape host salesagent's logic" with "is the SDK shape correct." A wrap that ports cleanly proves the seam is flexible enough, not that it's right — especially on the proposal side, where I control the wrap, the seam, and the success criterion. Gotchas / what won't work cleanly
What to be careful about
Is there a better wayTwo cheaper falsifiers worth running first:
(I considered "do Kevel first, not GAM" but Kevel isn't actually wired to any real tenants today, so it doesn't give us a falsifier with adopter-shape complexity. GAM is the only live target with the surface to be informative — which is also why it's the only one that can really break.) Three concrete adds before merging the proposal doc
|
|
Thanks for the detailed self-review — the concerns are well-founded and several are blockers for the plan's validity. Since this is your branch I can't push fixup commits directly; flagging what should change before the plan is acted on. Blockers (plan is misleading or wrong without addressing these): #1 — Wrap target. Agreed. The plan says "wraps salesagent's existing #2 — Dual transport mounts. The plan doesn't address how routing happens before FastMCP and A2A transport layers claim the request. "Tiny dispatcher consults #3 — Structural guards. A "step 0" item is missing: enumerate which of salesagent's AST guards fire on #4 — Cross-runtime Should be updated before acting on the plan:
Lower priority (valid, can be tracked as risks):
I'd suggest revising the plan to incorporate at least the four blockers above before starting workstream step 1. Claude Code — cannot push to non- Generated by Claude Code |
|
Good second take — it's complementary to the first review (execution risks) and adds the epistemological layer: can the experiment falsify anything it doesn't already control? What's genuinely new vs. what was already flagged:
The three concrete asks are right. The prior response covered the execution blockers; these are the epistemological ones. The "contradicting finding required" framing for the exit criterion is the sharpest version — a storyboard that passes without challenging any #502 prior is unfalsifying by construction, regardless of whether the wraps stayed thin. I can't push to Claude Code — PR-feedback mode, non- Generated by Claude Code |
Major changes addressing self-review on PR #506: 1. Two-phase structure. Phase 1 (~1 day): falsify recipe model on dynamic_products.py in isolation, no infrastructure. Phase 2 (3-5 days, gated on Phase 1): full side-car runtime with GAM live + HITL + webhooks. The cheapest available falsifier of #502 runs first; full experiment only if Phase 1 finds the recipe shape can carry signal-driven variants without escape hatches. 2. Wrap target corrected. Wrap salesagent's _impl seams (_create_media_buy_impl, _update_media_buy_impl, _add_creative_assets_impl, _get_products_impl), NOT adapter bodies. Self-review correctly flagged that wrapping google_ad_manager.py forces re-implementing principal resolution, tenant config, currency validation, signal lookup, audit logs — a port disguised as a wrap. 3. Routing strategy: separate process + nginx tenant-header routing, NOT in-process dispatch. Avoids dual MCP+A2A transport mounts, FastMCP registry collisions, ResolvedIdentity construction conflicts. Killable, independently restartable. 4. Step 0 with concrete prereqs: pin SHAs (SDK, both storyboards, GAM Network ID); identify _impl seams; enumerate AST guards that fire on src/sdk_runtime/; enumerate cross-tenant background services to disable; validate _already_approved sentinel under extra='forbid'; verify webhook signing parity against subscribed test buyer; pre-register falsification signals. 5. Multi-criteria exit (not binary): both storyboards pass, recipe carries implementation_config without extra: dict escape hatches, glue LOC under 60% ratio, zero structural-guard allowlist additions, at least one finding contradicts a #502 prior, webhook signature verified by real subscribed buyer. 6. HITL marker decision decoupled. Salesagent's pattern is N=1 and unrepresentative; experiment informs but doesn't settle the Protocol seam. Spec PR comes after as separate work. 7. Q1.5 added: does recipe model allow proposal-time assembly, not just lookup? implementation_config lives on Product row; dynamic products generate signal-driven variants at brief time. #502's session-cache model may be too restrictive. 8. Three HITL re-entry surfaces in scope (create/update/ add_creative_assets). Creative-specific re-entry through order_approval_service.py out of scope for v1. 9. Both storyboards pinned: media_buy_seller for happy path + media_buy_guaranteed_approval for HITL. Test-controller hybrid mode for delivery simulation alongside real GAM mutations. 10. Auth shim sized correctly per schema audit (~150 LOC total — Principal is bearer-token only, Account is already AdCP-shaped). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Salesagent's multi-adapter abstraction is vestigial: GAM is the only deployed backend (~99% of clients per migration guide §"Migration order"); Kevel/Broadstreet/Triton/Xandr are scaffolding with no client traffic; mock is a fixture. Treating salesagent as a GAM agent that ships dead code simplifies the experiment in three concrete ways: 1. Wrap target is unconditional GAM. The if-adapter-class switches in _impl (e.g., media_buy_create.py:2431-2464) collapse to unconditional logic; no compatibility surface to preserve. 2. Single recipe type — salesagent contributes only the GAM shape to #502's typed-recipe model. Phase 1 falsification narrows. 3. MockAdServer (~1,800 LOC) deletion joins post-experiment cleanup. Updates: - New "Reframing" section after Two phases - Out of scope clarified: other adapters slated for deletion, not preserved - Next steps adds adapter deprecation roadmap (~3,500-4,000 LOC deletion across 4 sequenced PRs) and side-car-to-runtime promotion path - Note that #489 §3.1 needs a "single-adapter adopters skip PlatformRouter" addition (tracked separately) Doesn't change experiment shape: two-platform composition seam, recipe falsification target, HITL/webhook/auth shim work all stay. Reframing simplifies the success path; doesn't shrink the questions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three Step 0 investigations completed; results updated in the doc.
0.2 — _impl seams (CONFIRMED CLEAN)
- All four core _impl functions exist transport-agnostic in
src/core/tools/. File:line refs:
_create_media_buy_impl (media_buy_create.py:1270, async)
_update_media_buy_impl (media_buy_update.py:117, sync)
_get_products_impl (products.py:145, async)
_get_media_buy_delivery_impl (media_buy_delivery.py:67, sync)
_sync_creatives_impl (creatives/_sync.py:29, sync)
- _add_creative_assets_impl doesn't exist; HITL gate uses GAM-
internal name mapping to sync_creatives. Before-hook needs a
small operation-name table.
- Zero salesagent-side refactor required before Phase 2.
0.3 — Structural guards (MUCH SMALLER THAN FEARED)
- ~25 total guards via make quality + .pre-commit-hooks/
- Most are path-filtered out of src/sdk_runtime/ (tests-only,
alembic-only, models.py-only, tools-only, schemas.py/main.py-only)
- Guards that apply to sdk_runtime/: ~5-7 hygiene-level
(sqlalchemy 2.0, no hasattr root, no .fn(), import usage, type
ignore count, code duplication). Trivially satisfiable.
- Zero allowlist additions likely needed; possibly one if
check_parameter_alignment.py doesn't accept a third _impl caller.
- Recommendation: keep src/sdk_runtime/ inside src/.
0.4 — Cross-tenant background services (TWO, NOT FOUR)
- protocol_webhook_service, background_approval_service,
order_approval_service, background_sync_service all fire per-
request or per-order; do NOT need per-tenant disable.
- Two genuinely cross-tenant schedulers from core/main.py lifespan:
- media_buy_status_scheduler.py (60s cadence, lifecycle
transitions cross-tenant)
- delivery_webhook_scheduler.py (daily, sends reporting_webhook
cross-tenant)
- Both would race/duplicate against side-car runtime.
- Concrete prereq added: Tenant.skip_legacy_schedulers: bool
alembic migration + 6-line consults in both schedulers.
Updates also flow through:
- HITL section: sync_creatives is the wire surface, not
add_creative_assets; operation-name mapping called out
- Workstream Phase 2: wraps now include _sync_creatives_impl;
webhook step notes per-tenant scheduler disable, not service
disable
- Risks: three HITL surfaces with sync_creatives correction
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three more Step 0 investigations completed; doc updated.
(1) check_parameter_alignment.py analyzed
- Guard enumerates pairs of (mcp_wrapper, a2a_raw) from a hardcoded
tools list (lines 36-78). Does NOT enumerate "all callers of
_impl." Side-car's GAMDecisioningPlatform calling _impl directly
is invisible to the guard.
- Confirmed: zero allowlist additions needed.
(2) _already_approved sentinel works as-is
- compose_method (compose.py:173-194) passes req through unchanged
from before-hook to inner; no model_validate/model_copy/model_dump
on the request side.
- Dispatcher only model_dumps on response side (dispatch.py:1234,
1306-1307).
- Generated request models use extra='forbid' (validation-time only)
without frozen=True or validate_assignment=True; setattr lands in
__dict__ and persists through Python-level dispatch.
- No typed marker prototype needed for this experiment. Q4 design
question stays open as a Protocol RFC.
(3) Webhook signing parity does NOT hold (real finding)
- Salesagent's webhook_authenticator.py emits X-Webhook-Signature +
X-Webhook-Timestamp with payload f"{ts}.{json.dumps(payload,
separators=(',',':'), sort_keys=True)}" canonicalization.
- SDK's from_adcp_legacy_hmac emits X-AdCP-Signature +
X-AdCP-Timestamp + X-AdCP-Key-Id with different canonicalization.
- Different headers, different canonicalization, different scheme
entirely.
- §3.14's "adopters delete their webhook plumbing wholesale" claim
doesn't hold cleanly — production cutover requires buyer migration.
- Experiment validates SDK→SDK signing with adcp.WebhookReceiver as
test buyer. Does not validate the migration claim against actual
subscribed buyers.
Constraint added to Step 0:
- Local fork edits only. No upstream PRs to salesagent. The
scheduler skips become hardcoded constants or env-var consults
in our experiment fork; revert with `git checkout main`.
Updates flow through:
- Step 0.4: prereq is local fork patch, not alembic migration
- Step 0 Investigated section: three more ✅ items + one ⚠️
- Workstream 2.4: webhook test buyer is adcp.WebhookReceiver (not
a real subscribed buyer; that's out of scope post-experiment)
- Risks: webhook signing parity bullet rewritten with the correct
finding (incompatible, not "verify before run")
- Next steps adds 3a — correct §3.14 of #489 migration guide
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For each of the six learning questions (Q1, Q1.5, Q2, Q3, Q4, Q5), write down the specific finding that would falsify the prior — before the experiment runs. This is the enforcement mechanism for self-review's "one author wearing three hats" warning: if I don't commit upfront to what would tell me I'm wrong, I'll find what I'm looking for. Concrete, observable falsifiers per question: - Q1: glue >303 LOC, monkey-patching needed, identity impedance - Q1.5: recipe schema requires proposal_id, variant Products need forged rows, hash-dedup state crosses sessions - Q2: any extra: dict, type: ignore on recipe construction, lossy round-trip - Q3: none of the three hydration models work, OR adopter-owned hydration turns out to be the right primitive - Q4: salesagent pattern is N=1; experiment informs but doesn't settle the Protocol seam - Q5: SDK→SDK signing parity already partially falsified; remaining falsifiers are auto-emit doesn't fire, retry doesn't behave Step 0.7 marked ✅ in workstream. The experiment is now fully unblocked from a planning standpoint. Phase 1 can start; Phase 2 prereqs are mechanical (pin SHAs, local fork patch for two schedulers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Read dynamic_products.py end-to-end (505 LOC, src/services/ dynamic_products.py). Two of three pre-registered Q1.5 falsifiers fire from reading alone, no harness needed. Five structural facts: 1. Variants are persistent DB rows in the products table — not session-scoped state 2. Variants share the template's implementation_config verbatim (line 303); recipe shape doesn't differ 3. Variant identity is globally deterministic via md5 hash of activation_key — dedup crosses sessions/buyers 4. Signal-derived data lives on the Product row (signal_metadata, activation_key, parent_product_id, expires_at, is_dynamic_variant), NOT in implementation_config 5. Variants have an independent lifecycle (TTL + archival) with no relationship to proposal acceptance/finalization Falsifiers fired (Q1.5): - "Variant Products require new schema rows" — confirmed - "Hash-dedup state crosses sessions" — confirmed - "Recipe schema requires proposal_id lookup" — not strictly fired, but related: recipe is Product-scoped not proposal-scoped, so #502's framework-managed-recipe-state model is wrong shape Implication for #502: - Recipe is adopter-owned data on the Product row (or equivalent) - Framework's job at the seam is TYPING the recipe contract, not CACHING it - Proposal-time assembly (generating new Product rows that share a template's recipe) is adopter logic; framework shouldn't try to cache "proposal recipes" because proposals don't own them Exit criterion (5) — at least one finding contradicts a #502 prior — satisfied early, pre-registered, before 1B harness runs. Q1 prediction (still pre-1B): wrapper is small (~50-80 LOC) for the dynamic-products subset. Wrapper sketch included in doc. Q2 still pending — needs 1B run projecting actual implementation_config through a typed Pydantic recipe. Phase 1B harness setup documented (worktree, fixtures, signals_agent_registry mock, wrapper module, typed GAMRecipe model, test harness). ~2 hours in a salesagent worktree. Next session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds examples/recipe_falsification/ — the pytest harness for Q2 of the salesagent side-car experiment (PR #506): > Does the recipe shape carry GAM's implementation_config without > escape hatches? Files: - gam_recipe.py — typed Pydantic GAMRecipe model with sub-models (CreativePlaceholder, FrequencyCap), Literal-typed enums for every documented GAM API value, extra="forbid" on every model - fixtures/gam_impl_config_examples.json — five fixture shapes derived from salesagent's GAMProductConfigService: guaranteed_default, non_guaranteed_default, video_with_targeting, native_with_discount, minimal - test_recipe_round_trip.py — runs the four pre-registered Q2 falsifiers from PR #506: (a) any extra: dict[str, Any] field (b) any # type: ignore needed to construct (c) lossy round-trip dict → recipe → dict extra-forbid: smuggled fields rejected - README.md — what's here, how to run, results, caveats Result: 8 tests pass. All Q2 falsifiers refuse to fire. - Round-trip is lossless across all 5 fixture shapes - Zero Any-typed fields; only dict-typed field is custom_targeting_keys typed strictly as dict[str, str | list[str]] per GAM's API contract, NOT an escape hatch - Direct construction with sub-models needs no # type: ignore - Unknown fields are rejected by extra="forbid" Q2 prior holds: a typed Pydantic recipe carries the full GAM implementation_config shape without escape hatches. Combined with Q1.5 (Phase 1A — recipe is adopter-owned, not framework-managed; corrected in this PR's revision of #502), the architecture story is now: - Recipe is typed at framework boundary (Q2 confirmed) - Recipe storage is adopter-owned (Q1.5 confirmed) - Framework's job: type the contract, route transitions, dispatch Caveats documented in README: - Fixtures derived from service code paths, not production DB dumps; dev-DB validation pass would tighten the result - custom_targeting_keys typing follows GAM's documented API; deeper- nested salesagent data would reject (correct against GAM, may surface migration edges) - Literal[...] enums need versioning when GAM adds enum values Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reference implementation for Phase 2 of the salesagent side-car experiment (PR #506). Lives in adcp-client-python as a doc-and- review-friendly skeleton; deploys into a salesagent fork worktree under src/sdk_runtime/ where it imports salesagent's _impl functions and models. Files: - account_store.py — SalesagentBuyerAgentRegistry + SalesagentAccountStore. Bearer-token (Principal.access_token) lookup; AgentAccountAccess scoping; sandbox→mode projection. fetch_gam_manual_approval_required helper for the HITL gate. - gam_platform.py — GAMPlatform wraps salesagent's _impl functions: _get_products_impl, _create_media_buy_impl, _update_media_buy_impl, _get_media_buy_delivery_impl, _sync_creatives_impl. Builds ResolvedIdentity from SDK ctx. - hitl_gate.py — compose_method before-hook. Wire→GAM-internal operation name mapping (create_media_buy, update_media_buy, sync_creatives→add_creative_assets). Writes WorkflowStep + MediaBuy(raw_request=...) rows via salesagent's existing WorkflowManager; short-circuits with status='pending_approval'. - serve_sidecar.py — adcp.serve(...) entrypoint. Wires platform + HITL gates + auth shim + WebhookSender (SDK→SDK signing per Step 0.6, since salesagent's scheme is incompatible). - README.md — deployment recipe (docker-compose addition, nginx routing config, scheduler patches, tenant configuration, storyboard run commands), exit criteria, open work in scaffold. - __init__.py The salesagent imports are wrapped in try/except so the files lint and import in adcp-client-python's tree without salesagent installed; SALESAGENT_AVAILABLE flag gates runtime behavior. Status: structurally complete reference. Actual storyboard run requires deploying into a salesagent fork worktree with sandbox GAM credentials, configuring the experiment tenant, and running nginx + docker-compose (recipe in README). Local-fork only — no upstream PR to salesagent per the experiment scope constraint. Open gaps documented in README: - _build_resolved_identity may need more fields than the minimal projection here - WorkflowManager constructor signature varies across salesagent versions; pin to a commit - Webhook signing parity is SDK→SDK only (Step 0.6 finding) Recipe shape already validated end-to-end in Phase 1B (examples/recipe_falsification/, 8 tests pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Experiment plan that falsifies #502's two-platform composition against salesagent + live GAM. Side-car runtime in a salesagent worktree, points at salesagent's existing Postgres, drives a sandbox GAM Network ID. Admin UI untouched.
Binary exit criterion: the AdCP `media_buy_seller` storyboard passes against the experiment tenant. If it does, the architecture holds and we have a working `ProposalManager` shape that emerged from real adopter code. If it doesn't, we learn precisely where #502 breaks before the architecture solidifies.
In scope (deliberately ambitious)
Out of scope
Other adapters, other tenants, refine flow / proposal lifecycle (`finalize`, `expires_at`), push reporting, creative agent.
Why now / why GAM live
GAM is where ~99% of salesagent's deployed value lives. The dynamic-product assembly, recipe shape, lifecycle state graph, HITL plumbing — all of it is GAM-shaped today. If our architecture survives contact with GAM, it survives. Mock conformance was never going to tell us.
Wrap, don't port
`media_buy_create.py` is 3,930 LOC; `dynamic_products.py` is 505. The discipline is to wrap the existing tool bodies, not re-port them. Isolates the experiment to the seam — where wire requests become salesagent-shaped calls and back. We learn whether the SDK abstractions can host salesagent's existing logic without re-porting it.
Five questions the experiment answers
Refs
Test plan
🤖 Generated with Claude Code