AAO crawler/API: persist managerdomain discovery provenance and reverse index #4200

**Blocked on:** #4173 (merging the managerdomain fallback in `AdAgentsManager`)
**Refs:** #4175 (RFC)

## Context

PR #4173 adds an ads.txt `managerdomain` one-hop fallback to `AdAgentsManager.validateDomain()`. Functionally, the AAO crawler picks up delegated publishers transparently: `validation.raw_data` is the manager's manifest, and agents and properties are recorded under each publisher. So the value of bootstrapping 6K publishers is captured.

But the fallback **discards provenance** at the seam between the validator and everything downstream. That creates concrete operational problems.

## Problem

1. **No reverse index from manager → delegating publishers.** When `manager.example` updates its `adagents.json`, we have no way to know which publishers delegated to it. The crawler is per-publisher today, so without a reverse index, change propagation means "wait for the next full sweep", which negates much of the bootstrap value at scale.

2. **`cacheAdagentsManifest(publisherDomain, raw_data)` stores the manager's manifest under the publisher's row** with no marker that it was discovered via `managerdomain`. We can't display "via Raptive" in admin/member surfaces, and we can't audit how authorization was discovered after the fact.

3. **Events** (`publisher.adagents_discovered`, `publisher.adagents_changed`) don't carry the discovery method or manager domain, so downstream consumers (analytics, member dashboards, audit) can't distinguish direct from delegated authorization.

4. **API surface** (`/api/adagents/validate`, the publisher resolution endpoint, and the property `source` enum, currently `adagents_json | agent_claim`) gives external consumers no way to tell when authorization is one-hop-via-manager, which they will reasonably want to know. The same `discovery_method` ask was raised on the upstream PR thread.

## Proposed scope

1. Once the upstream `discovery_method` field lands on `AdAgentsValidationResult` (pushed for in #4173), persist `discovery_method` and `manager_domain` columns on the publisher / adagents cache table.
2. Reverse index: when `discovery_method = 'ads_txt_managerdomain'`, persist a `manager_domain → publisher_domain` mapping. Use it to fan out re-validation when the manager's `adagents.json` changes.
3. Surface `discovery_method` in the `/api/adagents/validate` response and in publisher / property resolution responses. Consider extending the property `source` enum or adding a sub-discriminator.
4. Include `discovery_method` and `manager_domain` in `publisher.adagents_discovered` and `publisher.adagents_changed` events.
5. Optional: `/api/registry/managers/:domain/recrawl` endpoint that invalidates and re-validates all delegating publishers in one shot (rate-limited; mirrors the existing per-publisher recrawl).
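
To make items 2 and 5 concrete, here is a minimal in-memory sketch of the reverse index and the fan-out it enables. Everything in it is an assumption for illustration: `ManagerReverseIndex`, `ValidationProvenance`, `fanOutRevalidation`, and the `"direct"` discovery value are hypothetical names, and a real implementation would presumably back this with a table rather than a `Map`. Only `ads_txt_managerdomain` comes from the proposal above.

```typescript
// Hypothetical values: only "ads_txt_managerdomain" appears in this issue;
// "direct" is an assumed name for the non-delegated case.
type DiscoveryMethod = "direct" | "ads_txt_managerdomain";

// Assumed shape of what the crawler knows after validating one publisher.
interface ValidationProvenance {
  publisherDomain: string;
  discoveryMethod: DiscoveryMethod;
  managerDomain?: string; // expected only for ads_txt_managerdomain
}

// In-memory stand-in for a manager_domain -> publisher_domain mapping.
class ManagerReverseIndex {
  private readonly byManager = new Map<string, Set<string>>();
  private readonly byPublisher = new Map<string, string>();

  // Record the latest result for a publisher, dropping any stale mapping first
  // so a publisher that stops delegating (or switches manager) is not fanned out to.
  record(p: ValidationProvenance): void {
    const publisher = p.publisherDomain.toLowerCase();
    const previous = this.byPublisher.get(publisher);
    if (previous !== undefined) {
      const stale = this.byManager.get(previous);
      if (stale !== undefined) {
        stale.delete(publisher);
        if (stale.size === 0) this.byManager.delete(previous);
      }
      this.byPublisher.delete(publisher);
    }
    if (p.discoveryMethod === "ads_txt_managerdomain" && p.managerDomain) {
      const manager = p.managerDomain.toLowerCase();
      let delegating = this.byManager.get(manager);
      if (delegating === undefined) {
        delegating = new Set<string>();
        this.byManager.set(manager, delegating);
      }
      delegating.add(publisher);
      this.byPublisher.set(publisher, manager);
    }
  }

  // Publishers currently known to delegate to the given manager, sorted for stable output.
  publishersFor(managerDomain: string): string[] {
    const delegating = this.byManager.get(managerDomain.toLowerCase());
    return delegating === undefined ? [] : Array.from(delegating).sort();
  }
}

// Fan out re-validation when a manager's adagents.json changes.
// `enqueue` is a hypothetical hook into whatever per-publisher recrawl queue exists.
function fanOutRevalidation(
  index: ManagerReverseIndex,
  managerDomain: string,
  enqueue: (publisherDomain: string) => void,
): number {
  const publishers = index.publishersFor(managerDomain);
  for (const publisher of publishers) enqueue(publisher);
  return publishers.length;
}
```

Replacing the mapping on every `record` call keeps the index from accumulating publishers that have since switched to direct attestation or to a different manager, which would otherwise cause spurious re-validation.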

## Why this matters now

Without (2), the operational story for "Raptive rotates an agent" is "wait for the next full crawl of every delegated publisher." Without (3) and (4), buyers and members can't distinguish direct attestation from one-hop delegation, which is exactly the trust-tier distinction the upstream RFC depends on.
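
As a rough illustration of what (3) and (4) would give consumers, the sketch below models provenance as a discriminated union on `discovery_method`, so that `manager_domain` is only present for delegated authorization. The type and function names (`Provenance`, `AdagentsEvent`, `provenanceLabel`) and the `"direct"` value are assumptions, not the shipped event schema; the event names and `ads_txt_managerdomain` are taken from this issue.

```typescript
// Assumed provenance shape: "direct" is a hypothetical value for the non-delegated case.
type Provenance =
  | { discovery_method: "direct" }
  | { discovery_method: "ads_txt_managerdomain"; manager_domain: string };

// Hypothetical event payload carrying the two proposed fields alongside the publisher.
type AdagentsEvent = {
  type: "publisher.adagents_discovered" | "publisher.adagents_changed";
  publisher_domain: string;
} & Provenance;

// A consumer (dashboard, audit log) can now label how authorization was discovered.
function provenanceLabel(event: AdagentsEvent): string {
  if (event.discovery_method === "ads_txt_managerdomain") {
    // manager_domain is only accessible once the discriminant has been checked.
    return `${event.publisher_domain} (via ${event.manager_domain})`;
  }
  return `${event.publisher_domain} (direct)`;
}
```

Tying `manager_domain` to the discriminant means a consumer cannot read a manager domain off a direct attestation by accident, which is one way to keep the trust-tier distinction explicit in downstream code.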

Triage should pick this up when #4173 merges.
