Skip to content

AAO crawler/API: persist managerdomain discovery provenance and reverse index #4200

@bokelley

Description

@bokelley

Blocked on: #4173 (merging the managerdomain fallback in AdAgentsManager)
Refs: #4175 (RFC)

Context

PR #4173 adds an ads.txt managerdomain one-hop fallback to AdAgentsManager.validateDomain(). Functionally the AAO crawler picks up delegated publishers transparently — validation.raw_data is the manager's manifest, agents and properties get recorded under each publisher. So the bootstrap-6K-publishers value is captured.

But the fallback discards provenance at the seam between the validator and everything downstream. That creates concrete operational problems.

Problem

  1. No reverse index from manager → delegating publishers. When manager.example updates their adagents.json, we have no way to know which publishers delegated to them. Today the crawler is per-publisher; without a reverse index, change-propagation is "wait for the next full sweep" — which negates a chunk of the bootstrap value at scale.

  2. cacheAdagentsManifest(publisherDomain, raw_data) stores the manager's manifest under the publisher's row with no marker that it was discovered via managerdomain. We can't display "via Raptive" in admin/member surfaces, and we can't audit how authorization was discovered after the fact.

  3. Events (publisher.adagents_discovered, publisher.adagents_changed) don't carry discovery method or manager domain — downstream consumers (analytics, member dashboards, audit) can't distinguish direct from delegated.

  4. API surface (/api/adagents/validate, publisher resolution endpoint, the property source enum that's currently adagents_json | agent_claim) — external consumers will reasonably want to know when authorization is one-hop-via-manager. Same discovery_method ask raised on the upstream PR thread.

Proposed scope

  1. Once the upstream discovery_method field lands on AdAgentsValidationResult (pushed for in RFC 4175: Add ads.txt managerdomain fallback for adagents lookup #4173):
    • Persist discovery_method + manager_domain columns on the publisher / adagents cache table.
  2. Reverse index: when discovery_method = 'ads_txt_managerdomain', persist a manager_domain → publisher_domain mapping. Use it to fan out re-validation when the manager's adagents.json changes.
  3. Surface discovery_method in the /api/adagents/validate response and in publisher / property resolution responses. Consider extending the property source enum or adding a sub-discriminator.
  4. Include discovery_method and manager_domain in publisher.adagents_discovered and publisher.adagents_changed events.
  5. Optional: /api/registry/managers/:domain/recrawl endpoint that invalidates and re-validates all delegating publishers in one shot (rate-limited; mirrors the existing per-publisher recrawl).

Why this matters now

Without (2), the operational story for "Raptive rotates an agent" is "wait for the next full crawl of every delegated publisher." Without (3) and (4), buyers and members can't distinguish direct attestation from one-hop delegation, which is exactly the trust-tier distinction the upstream RFC depends on.

Triage should pick this up when #4173 merges.

Metadata

Metadata

Assignees

No one assigned

    Labels

    claude-triagedIssue has been triaged by the Claude Code triage routine. Remove to re-triage.enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions