Skip to content

docs(proposals): salesagent side-car experiment plan (full GAM)#506

Closed
bokelley wants to merge 7 commits into
mainfrom
bokelley/salesagent-gam-experiment
Closed

docs(proposals): salesagent side-car experiment plan (full GAM)#506
bokelley wants to merge 7 commits into
mainfrom
bokelley/salesagent-gam-experiment

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented May 3, 2026

Summary

Experiment plan that falsifies #502's two-platform composition against salesagent + live GAM. Side-car runtime in a salesagent worktree, points at salesagent's existing Postgres, drives a sandbox GAM Network ID. Admin UI untouched.

Binary exit criterion: the AdCP `media_buy_seller` storyboard passes against the experiment tenant. If it does, the architecture holds and we have a working `ProposalManager` shape that emerged from real adopter code. If it doesn't, we learn precisely where #502 breaks before the architecture solidifies.

In scope (deliberately ambitious)

  • Full GAM upstream — real orders/line items in sandbox. Mock-mode rejected because it would only test wire shape, not whether the SDK can host salesagent's actual upstream complexity.
  • HITL approval lifecycle — `compose_method` + `ShortCircuit` against salesagent's actual `WorkflowStep` flow. Grounded in real code (`google_ad_manager.py:571` gate, `media_buy_create.py:458` resumption). Salesagent's `WorkflowStep` table IS its task registry; SDK `TaskRegistry` doesn't participate.
  • Webhook delivery via SDK F12 auto-emit — `protocol_webhook_service.py` disabled per-tenant; `WebhookSender` configured on `serve(...)`. Validates the §3.14 claim that adopters delete their webhook plumbing.

Out of scope

Other adapters, other tenants, refine flow / proposal lifecycle (`finalize`, `expires_at`), push reporting, creative agent.

Why now / why GAM live

GAM is where ~99% of salesagent's deployed value lives. The dynamic-product assembly, recipe shape, lifecycle state graph, HITL plumbing — all of it is GAM-shaped today. If our architecture survives contact with GAM, it survives. Mock conformance was never going to tell us.

Wrap, don't port

`media_buy_create.py` is 3,930 LOC; `dynamic_products.py` is 505. The discipline is to wrap the existing tool bodies, not re-port them. Isolates the experiment to the seam — where wire requests become salesagent-shaped calls and back. We learn whether the SDK abstractions can host salesagent's existing logic without re-porting it.

Five questions the experiment answers

  1. Does `dynamic_products.py` factor onto `ProposalManager.get_products` (target: <300 LOC of glue)?
  2. Does the recipe carry GAM's `implementation_config` without escape hatches?
  3. What hydration model does `create_media_buy` need (session cache, DB row, fresh lookup)?
  4. What's the right shape for the HITL resumption marker — typed `ctx.resumption_token` vs. adopter-side sentinel?
  5. Does F12 webhook auto-emit hold up under real load without adapter participation?

Refs

  • #502 — the architecture this experiment falsifies
  • #489 — the migration narrative this validates against real code
  • `prebid/salesagent` — the adopter codebase the experiment runs inside

Test plan

  • Reviewers sanity-check the GAM-vs-mock decision and the wrap-don't-port discipline
  • Reviewers flag any storyboard surface that exercises a deferred feature (refine, push reporting) on the happy path
  • Reviewers confirm the salesagent file:line citations for HITL are accurate and the SDK-mapping reads cleanly
  • Decide whether to start the experiment immediately or wait for adversarial review on this plan first

🤖 Generated with Claude Code

Falsifies #502's two-platform composition against salesagent + live
GAM. Side-car runtime in a salesagent worktree, same DB, untouched
admin UI; binary exit criterion is the AdCP media_buy_seller
storyboard passing against a sandbox GAM Network ID.

In scope: real GAM upstream (wrap, don't port), HITL approval
lifecycle (compose_method + ShortCircuit against salesagent's actual
WorkflowStep flow), webhook delivery via SDK F12 auto-emit. The HITL
mapping is grounded in salesagent's existing gate at
google_ad_manager.py:571 + execute_approved_media_buy at
media_buy_create.py:458 — no paused-task resumption needed; the
SDK's TaskRegistry doesn't participate.

Five concrete questions the experiment answers, including the
resumption-marker shape (typed ctx.resumption_token vs. adopter-side
sentinel) and whether dynamic_products.py factors onto
ProposalManager.get_products without gutting it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 3, 2026

Read through and verified the salesagent citations against current main (file:line refs all check out, LOC counts match). Here's pushback on execution risks and a few suggested changes.

Gotchas

1. Wrap target is wrong. The plan wraps google_ad_manager.py. But media_buy_create.py (3,930 LOC) does a lot of non-GAM work — buyer/principal resolution, tenant config, currency validation, signal lookup, audit logs, workflow rows, webhook scheduling. Salesagent already established _impl functions as the transport-agnostic seam (CLAUDE.md Pattern #5). The natural wrap is _create_media_buy_impl, not the adapter. Wrapping the adapter forces the side-car to re-implement everything else from scratch — that's a port disguised as a wrap, and the "<300 LOC of glue" target evaporates instantly.

2. "Same process, two runtimes" is the bigger of two unspecified shims. "Tiny dispatcher consults Principal.tenant_id" hides the real problem: both runtimes own MCP+A2A transport mounts, FastMCP tool registries, identity resolution, and (per salesagent's structural guards) ResolvedIdentity construction. Routing has to happen before either runtime's transport layer claims the request. Separate process on a separate port with nginx-level routing by tenant header is cleaner, easier to delete, and avoids dual-import-time conflicts. The plan rules this out implicitly without justifying it.

3. ~25 structural guards will fire on src/sdk_runtime/. Salesagent has AST-enforced guards on make quality: no raw select() outside repos, no get_db_session() in _impl, transport-agnostic _impl, schema inheritance from adcp library, repository pattern. Either the side-car lives outside src/ (and is therefore not really inside salesagent) or every guard needs an allowlist entry. Workstream needs a step 0 for this.

4. Two writers, one DB is not fine just because it's a test tenant.

  • SQLAlchemy models defined in both salesagent (models.py, 2,113 LOC) and adcp-client-python — cascade/relationship semantics must agree exactly.
  • Single alembic head is a structural guard; SDK runtime can't bring its own migrations.
  • Workflow rows written by the SDK's before hook must be re-readable by admin UI's approve_workflow_step and re-callable into the SDK runtime — a cross-runtime contract on a shared row, not a tenant-isolated boundary.

5. The HITL marker decision is N=1, and salesagent's pattern is unrepresentative. Salesagent uses DB-persisted re-call with a sentinel, not paused-coroutine resumption. The experiment can answer "does the SDK seam accommodate salesagent's shape" — it cannot answer "what's the right Protocol seam." Risk: typed ctx.resumption_token gets baked into the spec because it felt natural in one codebase, then the next adopter with TaskRegistry-style coroutine resumption finds it awkward. Decouple this — it's a Protocol design call informed by, not settled inside, the experiment.

6. _already_approved sentinel via setattr on a Pydantic model. With extra=\"forbid\" in dev (salesagent Pattern #7), worth confirming this still works under the SDK's request projection — the new runtime presumably re-validates, which would strip the sentinel. Test before relying on it.

7. Webhook compatibility test is shallow. Disabling protocol_webhook_service.py and turning on WebhookSender tells you whether something fires. It doesn't tell you whether existing buyers, who trust salesagent's HMAC scheme via webhook_authenticator.py, will accept signatures from SDK F12. Risks signature scheme drift being declared "validated" when it wasn't tested against a real subscribed buyer.

8. The "<300 LOC glue" target for GAMProposalManager is unfounded. dynamic_products.py is 505 LOC plus AI services in src/services/ai/. The wrapper must project request, hydrate Product rows, pass to assembly, project response with implementation_config. No basis for the 300 number; will feel like a moving goalpost when reality lands at 700.

9. Single exit criterion lets the experiment lie. "Storyboard passes" is satisfiable with wrappers that grew enormous. Need a co-equal budget criterion: no extra: dict[str, Any] escape hatches in recipe; combined glue under N LOC; zero structural-guard allowlist additions. Otherwise the architectural hypothesis is unfalsifiable even when the storyboard "passes."

10. Pin SHAs in step 0. adcp-client-python from main will rebase. Storyboard version isn't pinned. GAM sandbox network config isn't pinned. Step 1 should be "freeze: adcp-client-python@<sha>, storyboard <tag>, GAM Network ID <id> in experiment README."

Better way (smallest changes I'd push for)

  1. Wrap _impl not adapter. The seam already exists.
  2. Separate process + tenant-header routing at nginx, not in-process dispatch. Killable. Independently restartable. No model/import collisions.
  3. Add a budget exit criterion alongside "storyboard passes." Glue LOC, recipe escape hatches, guard-allowlist deltas.
  4. Lift HITL marker decision out of the experiment. Run it as a Protocol design RFC informed by the experiment, not settled by it.
  5. Step 0: pin SDK SHA, storyboard tag, GAM sandbox config; resolve structural-guard story for src/sdk_runtime/ (or move it).
  6. Webhook test against a real subscribed buyer, not just "did something fire." Otherwise §3.14 isn't validated.

The instincts are right (sidecar, wrap-don't-port, GAM-live, binary criterion). Execution risks: wrap target is too low in the stack, in-process routing is harder than billed, and the architectural hypothesis is over-coupled to one adopter's HITL idiom.

@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 3, 2026

Plan is well-structured and the salesagent file:line citations check out (spot-checked google_ad_manager.py:570-571, media_buy_create.py:458 + literal setattr(request, "_already_approved", True) at line 529, models.py:256 for implementation_config, admin/blueprints/workflows.py:155 for approve_workflow_step calling into execute_approved_media_buy). The "wrap don't port" discipline and the five forced-choice questions are the right framing.

The headline design risk: the experiment quietly conflates "does the SDK shape host salesagent's logic" with "is the SDK shape correct." A wrap that ports cleanly proves the seam is flexible enough, not that it's right — especially on the proposal side, where I control the wrap, the seam, and the success criterion.

Gotchas / what won't work cleanly

  1. _already_approved is one of three approval re-entry surfaces, not one. The plan focuses on create_media_buy, but salesagent gates add_creative_assets and update_media_buy too, and creative approval has its own re-entry path through order_approval_service.py and background_approval_service.py. If compose_method only models the create path cleanly, the experiment passes a happy-path storyboard and still misses ⅔ of the actual seam. Either exercise creative approval too, or scope it out explicitly.

  2. raw_request on MediaBuy is the resumption substrate, and it lives in the salesagent DB. The plan acknowledges salesagent's WorkflowStep table is its task registry, but raw_request is the payload — and the SDK has no concept of it. The "typed ctx.resumption_token" option only works if the SDK can carry the equivalent of raw_request through dispatch, or if the adopter rebuilds the request before calling back in. The plan is hand-wavy here ("attach a resumption marker") and this is exactly where shape decisions get made.

  3. implementation_config may be too late-bound for the recipe model in docs(proposals): product architecture — layered model + two-platform composition #502. PR docs(proposals): product architecture — layered model + two-platform composition #502 says the recipe "lives in framework session cache against proposal_id" during refine, then "framework persists it" on finalize. Salesagent's implementation_config is on the Product row (models.py:256), not on the proposal. Dynamic products generate signal-driven variant Product rows at brief time (dynamic_products.py) — those are new Product rows the framework doesn't know about. The recipe-as-framework-managed-state model probably has to allow proposal-time recipe assembly, not just lookup. Worth surfacing as Q1.5.

  4. F12 webhook signature parity is a real risk, not a small one. Salesagent has its own webhook_authenticator.py and webhook_verification.py. If signing semantics diverge (header names, canonicalization, key rotation), the experiment hits non-determinism that's not a SDK design issue — it'll burn cycles. Do the signing dry-run on day 1, before the storyboard. Surface it as workstream step 0.

  5. "<300 LOC of glue" for dynamic_products.py is a vibe number, not a constraint. That file already calls a SignalsAgentRegistry, hits Postgres, mutates Product rows with TTLs (expires_at), hashes for dedup. If the wrap hits 600 LOC, what does the experiment conclude? Set the threshold so it can actually fail.

  6. "Two writers, one DB — controlled" is fine if the experiment tenant is genuinely isolated. Salesagent's background services (background_sync_service, background_approval_service, delivery_webhook_scheduler, protocol_webhook_service) run cross-tenant. Disabling per-tenant is doable but easy to miss. Enumerate the disable list in the plan, not a footnote.

  7. The media_buy_seller storyboard is a moving target. Pin the commit SHA, not just the version — storyboards have changed materially across 6.7→6.9 and there's more in flight for 6.10.

  8. The "binary exit criterion" is too binary. A passing storyboard against a hand-wrapped adapter, where I control the wrap shape, the seam, and the success criterion, isn't falsifying much. More honest exit: storyboard passes AND at least one of the five questions has an answer that contradicts a docs(proposals): product architecture — layered model + two-platform composition #502 prior. Otherwise the experiment will always confirm.

What to be careful about

  • One author wearing three hats: SDK author, adopter (salesagent), and reviewer. The "force a choice between typed vs untyped resumption marker" framing is only honest if I commit upfront to which signal would tell me each shape is wrong. Otherwise it's tempting to pick whichever felt natural after the fact and call it "what the experiment showed."
  • Don't let the experiment ship the ProposalManager Protocol. Next-steps says factor it in a separate PR — hold that line. Running in a worktree against forked main is qualitatively different from declaring a Protocol shape other adopters inherit. The experiment is one data point, not the spec.
  • PR docs: salesagent → adcp Python SDK migration guide #489 §3.3 is already partially baked. It maps HITL to compose_method + ShortCircuit. If the experiment finds compose_method is the wrong primitive, docs: salesagent → adcp Python SDK migration guide #489 has to revise too — name that in the exit criteria so it doesn't get forgotten.

Is there a better way

Two cheaper falsifiers worth running first:

  1. Falsify on dynamic_products.py in isolation, before standing up the side-car. Port just that 505-LOC file to a ProposalManager.get_products wrapper as a unit — no GAM, no admin UI, no DB rewiring. ~1 day. If the recipe shape can't carry signal-driven variants without escape hatches, Q1+Q2 are answered without touching the runtime, GAM credentials, or webhook signing. This is the cheapest available falsification of docs(proposals): product architecture — layered model + two-platform composition #502, and it gates whether the side-car experiment is worth running at all.

  2. Skip the live runtime; run as a contract test. Drive ProposalManager + DecisioningPlatform against a recorded dynamic_products output and a recorded GAM SOAP transcript. Lose live-GAM realism, gain repeatability + CI integration + the ability to A/B multiple SDK shapes. The current plan is one-shot — once we wire it up we won't run it again as the SDK evolves.

(I considered "do Kevel first, not GAM" but Kevel isn't actually wired to any real tenants today, so it doesn't give us a falsifier with adopter-shape complexity. GAM is the only live target with the surface to be informative — which is also why it's the only one that can really break.)

Three concrete adds before merging the proposal doc

  1. Make the exit criterion non-tautological — require a contradicting finding on ≥1 of the five questions, not just a passing storyboard.
  2. Either scope creative-approval and update_media_buy re-entry in, or scope them out explicitly. Right now they're invisible.
  3. Run the dynamic_products falsification (option 1 above) first. It's the cheapest experiment that can actually kill docs(proposals): product architecture — layered model + two-platform composition #502's recipe model, and it gates whether the side-car experiment is worth doing.

@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 3, 2026

Thanks for the detailed self-review — the concerns are well-founded and several are blockers for the plan's validity. Since this is your branch I can't push fixup commits directly; flagging what should change before the plan is acted on.

Blockers (plan is misleading or wrong without addressing these):

#1 — Wrap target. Agreed. The plan says "wraps salesagent's existing google_ad_manager.py adapter" but the _impl seam you're pointing at (_create_media_buy_impl) is the right isolation point. Wrapping the full adapter without first landing the _impl split in salesagent effectively makes this a port, not a wrap — and the "<300 LOC of glue" target becomes meaningless. Workstream step 0 should be: identify (or introduce) the transport-agnostic _impl functions that become the actual wrap targets before writing any SDK runtime code.

#2 — Dual transport mounts. The plan doesn't address how routing happens before FastMCP and A2A transport layers claim the request. "Tiny dispatcher consults Principal.tenant_id" buries the hard part. The plan should either (a) specify the exact hook point (ASGI middleware before any framework router? a request-ID header the dispatcher reads at process boundary?), or (b) explicitly choose the separate-process option and justify skipping it. Right now it implicitly rules out separate-process without saying why.

#3 — Structural guards. A "step 0" item is missing: enumerate which of salesagent's AST guards fire on src/sdk_runtime/ code and whether allowlist entries or location changes are needed. This is a concrete prereq, not a risk to manage mid-experiment.

#4 — Cross-runtime execute_approved_media_buy contract. The plan correctly describes the HITL re-entry shape but underplays the shared-row dependency. When execute_approved_media_buy reconstructs the request from raw_request and calls back into the SDK runtime, it's reading a row written by the SDK's before hook — that's a cross-runtime contract on a shared schema row. The "controlled" risk framing is fine for the test tenant, but the plan should acknowledge this explicitly so reviewers don't assume it's isolated.

Should be updated before acting on the plan:

  • feat: complete Python AdCP SDK with typed API and auto-generated types #5 (HITL marker N=1): Add an explicit note to the findings-doc step: the experiment informs but does not settle the Protocol seam for ctx.resumption_token. The spec decision belongs in a separate doc after the experiment — not inherited from what felt natural in salesagent.

  • revert: restore correct PYPY_API_TOKEN secret name #7 (Webhook test shallowness): Workstream step 6 should add "verify against a subscribed test buyer that validates salesagent's existing HMAC scheme" — otherwise the test only confirms something fires, not that buyer signature verification passes.

  • fix: use correct PYPY_API_TOKEN secret for PyPI publishing #8 (<300 LOC target): Replace the absolute target with a ratio goal (e.g., "glue LOC ≤ 60% of dynamic_products.py body") or drop the number and just instrument it. The current target has no basis given the AI services in src/services/ai/.

Lower priority (valid, can be tracked as risks):

  • fix: correct PyPI API token secret name #6 (_already_approved with extra="forbid"): Worth adding to Risks as a concrete item: "Validate that setattr(request, '_already_approved', True) survives SDK re-validation under extra='forbid'; if not, prototype typed marker first."

  • chore(main): release 1.0.1 #9 (truncated): Didn't receive the full text — if the concern is that a binary storyboard-passes criterion conflates "architecture works" with "we worked around it," that's worth a sentence in the exit criteria section: "Passes counts only if the wraps stayed thin (no inline re-porting). Document any case where the wrap grew into a port as a failure mode even if the storyboard passed."

I'd suggest revising the plan to incorporate at least the four blockers above before starting workstream step 1.


Claude Code — cannot push to non-claude/* branches; flagging for your update.


Generated by Claude Code

@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 3, 2026

Good second take — it's complementary to the first review (execution risks) and adds the epistemological layer: can the experiment falsify anything it doesn't already control?

What's genuinely new vs. what was already flagged:

  • Q1.5 on implementation_config (point 3): This is new and worth adding as an explicit question. The prior response flagged the cross-runtime contract on shared rows; this sharpens a different edge — docs(proposals): product architecture — layered model + two-platform composition #502's recipe is assumed to live on the proposal (session cache → finalize → persist), but salesagent's implementation_config lives on the Product row, and signal-driven dynamic_products.py generates new Product rows at brief time that the framework doesn't track. If "framework persists the recipe" assumes lookup-by-proposal-id, it breaks at dynamic-product assembly time. That's worth naming as Q1.5 in "What we expect to learn."

  • raw_request as payload substrate (point 2): The plan's HITL section says "attach a resumption marker" without grappling with the fact that execute_approved_media_buy reconstructs the full request from raw_request — a DB-persisted JSON blob, not a token. A typed ctx.resumption_token doesn't carry that. Either the adopter rebuilds the request independently before calling back in (from what source?), or the SDK carries more than a marker. The plan is hand-wavy exactly where the shape decision lands.

The three concrete asks are right. The prior response covered the execution blockers; these are the epistemological ones. The "contradicting finding required" framing for the exit criterion is the sharpest version — a storyboard that passes without challenging any #502 prior is unfalsifying by construction, regardless of whether the wraps stayed thin.

I can't push to bokelley/salesagent-gam-experiment, so these are yours to apply. Together the two review threads make a complete revision checklist.


Claude Code — PR-feedback mode, non-claude/* branch.


Generated by Claude Code

bokelley and others added 6 commits May 3, 2026 19:55
Major changes addressing self-review on PR #506:

1. Two-phase structure. Phase 1 (~1 day): falsify recipe model on
   dynamic_products.py in isolation, no infrastructure. Phase 2
   (3-5 days, gated on Phase 1): full side-car runtime with GAM
   live + HITL + webhooks. The cheapest available falsifier of #502
   runs first; full experiment only if Phase 1 finds the recipe
   shape can carry signal-driven variants without escape hatches.

2. Wrap target corrected. Wrap salesagent's _impl seams
   (_create_media_buy_impl, _update_media_buy_impl,
   _add_creative_assets_impl, _get_products_impl), NOT adapter
   bodies. Self-review correctly flagged that wrapping
   google_ad_manager.py forces re-implementing principal
   resolution, tenant config, currency validation, signal lookup,
   audit logs — a port disguised as a wrap.

3. Routing strategy: separate process + nginx tenant-header
   routing, NOT in-process dispatch. Avoids dual MCP+A2A transport
   mounts, FastMCP registry collisions, ResolvedIdentity
   construction conflicts. Killable, independently restartable.

4. Step 0 with concrete prereqs: pin SHAs (SDK, both storyboards,
   GAM Network ID); identify _impl seams; enumerate AST guards
   that fire on src/sdk_runtime/; enumerate cross-tenant
   background services to disable; validate _already_approved
   sentinel under extra='forbid'; verify webhook signing parity
   against subscribed test buyer; pre-register falsification
   signals.

5. Multi-criteria exit (not binary): both storyboards pass,
   recipe carries implementation_config without extra: dict
   escape hatches, glue LOC under 60% ratio, zero structural-guard
   allowlist additions, at least one finding contradicts a #502
   prior, webhook signature verified by real subscribed buyer.

6. HITL marker decision decoupled. Salesagent's pattern is N=1
   and unrepresentative; experiment informs but doesn't settle
   the Protocol seam. Spec PR comes after as separate work.

7. Q1.5 added: does recipe model allow proposal-time assembly,
   not just lookup? implementation_config lives on Product row;
   dynamic products generate signal-driven variants at brief time.
   #502's session-cache model may be too restrictive.

8. Three HITL re-entry surfaces in scope (create/update/
   add_creative_assets). Creative-specific re-entry through
   order_approval_service.py out of scope for v1.

9. Both storyboards pinned: media_buy_seller for happy path +
   media_buy_guaranteed_approval for HITL. Test-controller hybrid
   mode for delivery simulation alongside real GAM mutations.

10. Auth shim sized correctly per schema audit (~150 LOC total —
    Principal is bearer-token only, Account is already AdCP-shaped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Salesagent's multi-adapter abstraction is vestigial: GAM is the only
deployed backend (~99% of clients per migration guide §"Migration
order"); Kevel/Broadstreet/Triton/Xandr are scaffolding with no
client traffic; mock is a fixture. Treating salesagent as a GAM
agent that ships dead code simplifies the experiment in three
concrete ways:

1. Wrap target is unconditional GAM. The if-adapter-class switches
   in _impl (e.g., media_buy_create.py:2431-2464) collapse to
   unconditional logic; no compatibility surface to preserve.
2. Single recipe type — salesagent contributes only the GAM shape
   to #502's typed-recipe model. Phase 1 falsification narrows.
3. MockAdServer (~1,800 LOC) deletion joins post-experiment cleanup.

Updates:
- New "Reframing" section after Two phases
- Out of scope clarified: other adapters slated for deletion, not
  preserved
- Next steps adds adapter deprecation roadmap (~3,500-4,000 LOC
  deletion across 4 sequenced PRs) and side-car-to-runtime
  promotion path
- Note that #489 §3.1 needs a "single-adapter adopters skip
  PlatformRouter" addition (tracked separately)

Doesn't change experiment shape: two-platform composition seam,
recipe falsification target, HITL/webhook/auth shim work all stay.
Reframing simplifies the success path; doesn't shrink the questions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three Step 0 investigations completed; results updated in the doc.

0.2 — _impl seams (CONFIRMED CLEAN)
- All four core _impl functions exist transport-agnostic in
  src/core/tools/. File:line refs:
  _create_media_buy_impl (media_buy_create.py:1270, async)
  _update_media_buy_impl (media_buy_update.py:117, sync)
  _get_products_impl (products.py:145, async)
  _get_media_buy_delivery_impl (media_buy_delivery.py:67, sync)
  _sync_creatives_impl (creatives/_sync.py:29, sync)
- _add_creative_assets_impl doesn't exist; HITL gate uses GAM-
  internal name mapping to sync_creatives. Before-hook needs a
  small operation-name table.
- Zero salesagent-side refactor required before Phase 2.

0.3 — Structural guards (MUCH SMALLER THAN FEARED)
- ~25 total guards via make quality + .pre-commit-hooks/
- Most are path-filtered out of src/sdk_runtime/ (tests-only,
  alembic-only, models.py-only, tools-only, schemas.py/main.py-only)
- Guards that apply to sdk_runtime/: ~5-7 hygiene-level
  (sqlalchemy 2.0, no hasattr root, no .fn(), import usage, type
  ignore count, code duplication). Trivially satisfiable.
- Zero allowlist additions likely needed; possibly one if
  check_parameter_alignment.py doesn't accept a third _impl caller.
- Recommendation: keep src/sdk_runtime/ inside src/.

0.4 — Cross-tenant background services (TWO, NOT FOUR)
- protocol_webhook_service, background_approval_service,
  order_approval_service, background_sync_service all fire per-
  request or per-order; do NOT need per-tenant disable.
- Two genuinely cross-tenant schedulers from core/main.py lifespan:
  - media_buy_status_scheduler.py (60s cadence, lifecycle
    transitions cross-tenant)
  - delivery_webhook_scheduler.py (daily, sends reporting_webhook
    cross-tenant)
- Both would race/duplicate against side-car runtime.
- Concrete prereq added: Tenant.skip_legacy_schedulers: bool
  alembic migration + 6-line consults in both schedulers.

Updates also flow through:
- HITL section: sync_creatives is the wire surface, not
  add_creative_assets; operation-name mapping called out
- Workstream Phase 2: wraps now include _sync_creatives_impl;
  webhook step notes per-tenant scheduler disable, not service
  disable
- Risks: three HITL surfaces with sync_creatives correction

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three more Step 0 investigations completed; doc updated.

(1) check_parameter_alignment.py analyzed
- Guard enumerates pairs of (mcp_wrapper, a2a_raw) from a hardcoded
  tools list (lines 36-78). Does NOT enumerate "all callers of
  _impl." Side-car's GAMDecisioningPlatform calling _impl directly
  is invisible to the guard.
- Confirmed: zero allowlist additions needed.

(2) _already_approved sentinel works as-is
- compose_method (compose.py:173-194) passes req through unchanged
  from before-hook to inner; no model_validate/model_copy/model_dump
  on the request side.
- Dispatcher only model_dumps on response side (dispatch.py:1234,
  1306-1307).
- Generated request models use extra='forbid' (validation-time only)
  without frozen=True or validate_assignment=True; setattr lands in
  __dict__ and persists through Python-level dispatch.
- No typed marker prototype needed for this experiment. Q4 design
  question stays open as a Protocol RFC.

(3) Webhook signing parity does NOT hold (real finding)
- Salesagent's webhook_authenticator.py emits X-Webhook-Signature +
  X-Webhook-Timestamp with payload f"{ts}.{json.dumps(payload,
  separators=(',',':'), sort_keys=True)}" canonicalization.
- SDK's from_adcp_legacy_hmac emits X-AdCP-Signature +
  X-AdCP-Timestamp + X-AdCP-Key-Id with different canonicalization.
- Different headers, different canonicalization, different scheme
  entirely.
- §3.14's "adopters delete their webhook plumbing wholesale" claim
  doesn't hold cleanly — production cutover requires buyer migration.
- Experiment validates SDK→SDK signing with adcp.WebhookReceiver as
  test buyer. Does not validate the migration claim against actual
  subscribed buyers.

Constraint added to Step 0:
- Local fork edits only. No upstream PRs to salesagent. The
  scheduler skips become hardcoded constants or env-var consults
  in our experiment fork; revert with `git checkout main`.

Updates flow through:
- Step 0.4: prereq is local fork patch, not alembic migration
- Step 0 Investigated section: three more ✅ items + one ⚠️
- Workstream 2.4: webhook test buyer is adcp.WebhookReceiver (not
  a real subscribed buyer; that's out of scope post-experiment)
- Risks: webhook signing parity bullet rewritten with the correct
  finding (incompatible, not "verify before run")
- Next steps adds 3a — correct §3.14 of #489 migration guide

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For each of the six learning questions (Q1, Q1.5, Q2, Q3, Q4, Q5),
write down the specific finding that would falsify the prior —
before the experiment runs. This is the enforcement mechanism for
self-review's "one author wearing three hats" warning: if I don't
commit upfront to what would tell me I'm wrong, I'll find what
I'm looking for.

Concrete, observable falsifiers per question:
- Q1: glue >303 LOC, monkey-patching needed, identity impedance
- Q1.5: recipe schema requires proposal_id, variant Products need
  forged rows, hash-dedup state crosses sessions
- Q2: any extra: dict, type: ignore on recipe construction, lossy
  round-trip
- Q3: none of the three hydration models work, OR adopter-owned
  hydration turns out to be the right primitive
- Q4: salesagent pattern is N=1; experiment informs but doesn't
  settle the Protocol seam
- Q5: SDK→SDK signing parity already partially falsified; remaining
  falsifiers are auto-emit doesn't fire, retry doesn't behave

Step 0.7 marked ✅ in workstream.

The experiment is now fully unblocked from a planning standpoint.
Phase 1 can start; Phase 2 prereqs are mechanical (pin SHAs, local
fork patch for two schedulers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Read dynamic_products.py end-to-end (505 LOC, src/services/
dynamic_products.py). Two of three pre-registered Q1.5 falsifiers
fire from reading alone, no harness needed.

Five structural facts:
1. Variants are persistent DB rows in the products table — not
   session-scoped state
2. Variants share the template's implementation_config verbatim
   (line 303); recipe shape doesn't differ
3. Variant identity is globally deterministic via md5 hash of
   activation_key — dedup crosses sessions/buyers
4. Signal-derived data lives on the Product row (signal_metadata,
   activation_key, parent_product_id, expires_at,
   is_dynamic_variant), NOT in implementation_config
5. Variants have an independent lifecycle (TTL + archival) with no
   relationship to proposal acceptance/finalization

Falsifiers fired (Q1.5):
- "Variant Products require new schema rows" — confirmed
- "Hash-dedup state crosses sessions" — confirmed
- "Recipe schema requires proposal_id lookup" — not strictly fired,
  but related: recipe is Product-scoped not proposal-scoped, so
  #502's framework-managed-recipe-state model is wrong shape

Implication for #502:
- Recipe is adopter-owned data on the Product row (or equivalent)
- Framework's job at the seam is TYPING the recipe contract, not
  CACHING it
- Proposal-time assembly (generating new Product rows that share a
  template's recipe) is adopter logic; framework shouldn't try to
  cache "proposal recipes" because proposals don't own them

Exit criterion (5) — at least one finding contradicts a #502 prior
— satisfied early, pre-registered, before 1B harness runs.

Q1 prediction (still pre-1B): wrapper is small (~50-80 LOC) for the
dynamic-products subset. Wrapper sketch included in doc.

Q2 still pending — needs 1B run projecting actual
implementation_config through a typed Pydantic recipe.

Phase 1B harness setup documented (worktree, fixtures,
signals_agent_registry mock, wrapper module, typed GAMRecipe model,
test harness). ~2 hours in a salesagent worktree. Next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bokelley added a commit that referenced this pull request May 4, 2026
Adds examples/recipe_falsification/ — the pytest harness for Q2 of
the salesagent side-car experiment (PR #506):

> Does the recipe shape carry GAM's implementation_config without
> escape hatches?

Files:
- gam_recipe.py — typed Pydantic GAMRecipe model with sub-models
  (CreativePlaceholder, FrequencyCap), Literal-typed enums for
  every documented GAM API value, extra="forbid" on every model
- fixtures/gam_impl_config_examples.json — five fixture shapes
  derived from salesagent's GAMProductConfigService:
    guaranteed_default, non_guaranteed_default,
    video_with_targeting, native_with_discount, minimal
- test_recipe_round_trip.py — runs the four pre-registered Q2
  falsifiers from PR #506:
    (a) any extra: dict[str, Any] field
    (b) any # type: ignore needed to construct
    (c) lossy round-trip dict → recipe → dict
    extra-forbid: smuggled fields rejected
- README.md — what's here, how to run, results, caveats

Result: 8 tests pass. All Q2 falsifiers refuse to fire.

- Round-trip is lossless across all 5 fixture shapes
- Zero Any-typed fields; only dict-typed field is custom_targeting_keys
  typed strictly as dict[str, str | list[str]] per GAM's API contract,
  NOT an escape hatch
- Direct construction with sub-models needs no # type: ignore
- Unknown fields are rejected by extra="forbid"

Q2 prior holds: a typed Pydantic recipe carries the full GAM
implementation_config shape without escape hatches.

Combined with Q1.5 (Phase 1A — recipe is adopter-owned, not
framework-managed; corrected in this PR's revision of #502), the
architecture story is now:
- Recipe is typed at framework boundary (Q2 confirmed)
- Recipe storage is adopter-owned (Q1.5 confirmed)
- Framework's job: type the contract, route transitions, dispatch

Caveats documented in README:
- Fixtures derived from service code paths, not production DB dumps;
  dev-DB validation pass would tighten the result
- custom_targeting_keys typing follows GAM's documented API; deeper-
  nested salesagent data would reject (correct against GAM, may
  surface migration edges)
- Literal[...] enums need versioning when GAM adds enum values

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bokelley added a commit that referenced this pull request May 4, 2026
Reference implementation for Phase 2 of the salesagent side-car
experiment (PR #506). Lives in adcp-client-python as a doc-and-
review-friendly skeleton; deploys into a salesagent fork worktree
under src/sdk_runtime/ where it imports salesagent's _impl
functions and models.

Files:
- account_store.py — SalesagentBuyerAgentRegistry +
  SalesagentAccountStore. Bearer-token (Principal.access_token)
  lookup; AgentAccountAccess scoping; sandbox→mode projection.
  fetch_gam_manual_approval_required helper for the HITL gate.
- gam_platform.py — GAMPlatform wraps salesagent's _impl
  functions: _get_products_impl, _create_media_buy_impl,
  _update_media_buy_impl, _get_media_buy_delivery_impl,
  _sync_creatives_impl. Builds ResolvedIdentity from SDK ctx.
- hitl_gate.py — compose_method before-hook. Wire→GAM-internal
  operation name mapping (create_media_buy, update_media_buy,
  sync_creatives→add_creative_assets). Writes WorkflowStep +
  MediaBuy(raw_request=...) rows via salesagent's existing
  WorkflowManager; short-circuits with status='pending_approval'.
- serve_sidecar.py — adcp.serve(...) entrypoint. Wires platform +
  HITL gates + auth shim + WebhookSender (SDK→SDK signing per
  Step 0.6, since salesagent's scheme is incompatible).
- README.md — deployment recipe (docker-compose addition, nginx
  routing config, scheduler patches, tenant configuration,
  storyboard run commands), exit criteria, open work in scaffold.
- __init__.py

The salesagent imports are wrapped in try/except so the files
lint and import in adcp-client-python's tree without salesagent
installed; SALESAGENT_AVAILABLE flag gates runtime behavior.

Status: structurally complete reference. Actual storyboard run
requires deploying into a salesagent fork worktree with sandbox
GAM credentials, configuring the experiment tenant, and running
nginx + docker-compose (recipe in README). Local-fork only — no
upstream PR to salesagent per the experiment scope constraint.

Open gaps documented in README:
- _build_resolved_identity may need more fields than the minimal
  projection here
- WorkflowManager constructor signature varies across salesagent
  versions; pin to a commit
- Webhook signing parity is SDK→SDK only (Step 0.6 finding)

Recipe shape already validated end-to-end in Phase 1B
(examples/recipe_falsification/, 8 tests pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley closed this May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant