Add provider-normalized context views #16

Closed
fivetran-davefowler wants to merge 4 commits into main from agent_schema_views
fivetran-davefowler commented Jun 5, 2026

Summary

Introduces an INFORMATION_SCHEMA-compatible context-view convention for Agents Schema.

The core views are:

  • AGENTS.SCHEMATA
  • AGENTS.TABLES
  • AGENTS.COLUMNS

Each one uses the matching INFORMATION_SCHEMA view as its row spine via SELECT t.*, then joins provider-normalized relations by object identity. That makes these views familiar to agents and existing SQL snippets that already know INFORMATION_SCHEMA, while adding provider context in prefixed columns.
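
The shape described above can be sketched as a small SQL generator. This is an illustration only, not the PR's actual code: the function name `context_view_sql`, the `p_<provider>` aliases, and the assumption that provider relations carry the same identity columns as the matching INFORMATION_SCHEMA view are all hypothetical. The `UPPER(...)` comparison is one possible reading of name-based, case-folded matching. Deduplication of provider rows is left out here to keep the join shape visible.

```python
# Identity columns per grain, using the standard INFORMATION_SCHEMA column names.
IDENTITY_COLUMNS = {
    "SCHEMATA": ["catalog_name", "schema_name"],
    "TABLES": ["table_catalog", "table_schema", "table_name"],
    "COLUMNS": ["table_catalog", "table_schema", "table_name", "column_name"],
}


def context_view_sql(grain, provider_columns):
    """Return illustrative SQL text for AGENTS.<grain>.

    provider_columns maps a provider name (e.g. "dbt") to the non-identity
    columns its AGENTS.<PROVIDER>_<grain> relation is assumed to expose.
    """
    keys = IDENTITY_COLUMNS[grain]
    select_items = ["t.*"]  # the INFORMATION_SCHEMA row spine, untouched
    joins = []
    for provider, columns in sorted(provider_columns.items()):
        alias = f"p_{provider}"
        # Prefix provider columns so they cannot collide with t.*.
        select_items += [f"{alias}.{col} AS {provider}_{col}" for col in columns]
        # Join by object identity, compared case-insensitively (assumption).
        on_clause = " AND ".join(
            f"UPPER({alias}.{key}) = UPPER(t.{key})" for key in keys
        )
        joins.append(
            f"LEFT JOIN AGENTS.{provider.upper()}_{grain} {alias} ON {on_clause}"
        )
    lines = [
        f"CREATE OR REPLACE VIEW AGENTS.{grain} AS",
        "SELECT " + ", ".join(select_items),
        f"FROM INFORMATION_SCHEMA.{grain} t",
    ]
    return "\n".join(lines + joins)


sql = context_view_sql("TABLES", {"dbt": ["description"]})
print(sql)
```

Because the spine is `t.*`, the generated view needs no hardcoded list of native columns; only the provider columns are named explicitly.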

Why

This adds a small amount of convention to the spec: providers that want generic integration can expose schema-, table-, and column-grain relations with predictable suffixes. The tradeoff is worth it for three reasons:

  1. AI adoption: agents already have strong priors around INFORMATION_SCHEMA. Teaching them to use AGENTS.SCHEMATA, AGENTS.TABLES, and AGENTS.COLUMNS instead of INFORMATION_SCHEMA.SCHEMATA, INFORMATION_SCHEMA.TABLES, and INFORMATION_SCHEMA.COLUMNS when available is a small mapping, not a new navigation model. That should make Agents Schema easier for agents to use and faster for teams to adopt.

  2. Provider integration clarity: providers get a clear convention for joining the common surface without giving up their own source tables or implementation details. They can publish normalized relations at the schema, table, or column grain and let the generic views handle discovery, prefixing, and object-identity joins.

  3. Adopter utility: assuming the main use case is schema-based warehouse work, most questions start with schemas, tables, and columns: what exists, what matters, what a field means, and what context different tools have attached. These views make the common path useful immediately while still letting provider-specific detail live in provider-owned relations.

Provider convention

Providers can participate by publishing one or more normalized relations, as tables or views, named:

  • AGENTS.<PROVIDER>_SCHEMATA
  • AGENTS.<PROVIDER>_TABLES
  • AGENTS.<PROVIDER>_COLUMNS

The generic views discover those provider-normalized relations and append their fields under a <provider>_ prefix, such as dbt_description, lookml_ai_context, or osi_source_object_id.
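
A minimal sketch of how discovery and prefixing could work, assuming discovery is driven purely by relation names in the AGENTS schema. The helper names `discover_providers` and `prefix_columns` are hypothetical, and the rule that identity columns are not re-exported under a prefix is an assumption rather than something the PR states.

```python
GRAINS = ("SCHEMATA", "TABLES", "COLUMNS")


def discover_providers(relation_names):
    """Map each grain to the providers that publish AGENTS.<PROVIDER>_<grain>."""
    found = {grain: [] for grain in GRAINS}
    for name in relation_names:
        upper = name.upper()
        for grain in GRAINS:
            suffix = "_" + grain
            # The core views (TABLES, COLUMNS, ...) have no provider part, so
            # they never match: a non-empty provider name must precede the suffix.
            if upper.endswith(suffix) and len(upper) > len(suffix):
                found[grain].append(upper[: -len(suffix)].lower())
    return {grain: sorted(names) for grain, names in found.items()}


def prefix_columns(provider, columns, identity_columns):
    """Map each non-identity provider column to its <provider>_ output name."""
    return {
        col: f"{provider}_{col}" for col in columns if col not in identity_columns
    }


providers = discover_providers(
    ["TABLES", "DBT_TABLES", "lookml_columns", "OSI_COLUMNS", "DBT_MODELS"]
)
renamed = prefix_columns(
    "dbt", ["table_schema", "table_name", "description"], {"table_schema", "table_name"}
)
```

In this example `DBT_MODELS` is ignored because it does not end in one of the three suffixes, which is how a provider's own source tables stay out of the generic views.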

This gives providers a suggested integration convention without making the core views know about every provider source table. Benefits:

  • agents get stable SCHEMATA / TABLES / COLUMNS entrypoints
  • providers can publish either physical tables or views for their normalized relation layer
  • providers keep their own source tables and opinions
  • provider columns do not collide because they are prefixed
  • duplicate provider rows are aggregated before joining, avoiding fanout
  • future providers can integrate by following the suffix convention instead of changing core code
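
The fanout point can be demonstrated with a small, self-contained experiment. The sketch below uses Python's built-in `sqlite3` as a stand-in for the warehouse, with a plain `spine` table in place of `INFORMATION_SCHEMA.TABLES`. The table names, the sample rows, and the choice of `MAX` as the aggregate are assumptions for illustration; the PR does not say which aggregate the real views use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for INFORMATION_SCHEMA.TABLES (hypothetical sample data).
    CREATE TABLE spine (table_schema TEXT, table_name TEXT);
    -- Stand-in for a provider relation such as AGENTS.DBT_TABLES.
    CREATE TABLE dbt_tables (table_schema TEXT, table_name TEXT, description TEXT);
    INSERT INTO spine VALUES ('SALES', 'ORDERS'), ('SALES', 'CUSTOMERS');
    INSERT INTO dbt_tables VALUES
        ('sales', 'orders', 'One row per order'),
        ('sales', 'orders', 'One row per order');
""")

# Joining the provider relation directly repeats the spine row once per duplicate.
NAIVE = """
    SELECT t.*, p.description AS dbt_description
    FROM spine t
    LEFT JOIN dbt_tables p
      ON UPPER(p.table_schema) = UPPER(t.table_schema)
     AND UPPER(p.table_name) = UPPER(t.table_name)
"""

# Aggregating to one row per identity first keeps the spine's row count.
AGGREGATED = """
    SELECT t.*, p.description AS dbt_description
    FROM spine t
    LEFT JOIN (
        SELECT UPPER(table_schema) AS schema_key,
               UPPER(table_name) AS name_key,
               MAX(description) AS description
        FROM dbt_tables
        GROUP BY UPPER(table_schema), UPPER(table_name)
    ) p
      ON p.schema_key = UPPER(t.table_schema)
     AND p.name_key = UPPER(t.table_name)
    ORDER BY t.table_name
"""

naive_rows = conn.execute(NAIVE).fetchall()
aggregated_rows = conn.execute(AGGREGATED).fetchall()
print(len(naive_rows), len(aggregated_rows))
```

The naive join returns three rows for a two-row spine, while the aggregated join returns exactly two, with a NULL `dbt_description` for the table the provider does not describe.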

Scope

v1 deliberately sticks to the surfaces INFORMATION_SCHEMA already has: SCHEMATA, TABLES, and COLUMNS.

It does not add generic RELATIONSHIPS, METRICS, or ENTITIES views. Those are different object types, and adding them here would make this more of a semantic-model convention than an information-schema extension.

Known limitation

INFORMATION_SCHEMA is per-database, so AGENTS.SCHEMATA / AGENTS.TABLES / AGENTS.COLUMNS cover only the database that holds the AGENTS schema. Multi-database coverage via account-level metadata can be handled separately.

Verification

uv run python -m unittest discover -s tests
Ran 21 tests
OK

uv run python -m compileall -q src
git diff --check
clean

Note: tests assert generated SQL structure; they do not execute against Snowflake.

fivetran-davefowler and others added 3 commits May 30, 2026 20:38
- scope to the surfaces information_schema already has (TABLES, COLUMNS); drop
  RELATIONSHIPS/METRICS/ENTITIES and their provider views (deferred; relationships
  should later extend REFERENTIAL_CONSTRAINTS/KEY_COLUMN_USAGE, not a custom view)
- AGENTS.TABLES/COLUMNS use SELECT t.* over information_schema as the spine
  (no hardcoded native column list; inherits whatever the account exposes)
- generic identity merge: left join every discovered {provider}_tables/_columns
  view, enrichment columns appended under a <provider>_ prefix, aggregated to one
  row per identity to prevent fanout; no hardcoded providers
- remove hardcoded memories_count/warnings_count; memory participates later by
  publishing its own *_TABLES/*_COLUMNS view and is picked up automatically
- view creation is fail-soft (warn, never break ingestion)
- align osi_columns identity parsing with osi_tables; rename helper to _relation_identity_sql
- update SPEC, proposal (v1 scope + resolved decisions), root entries, and tests

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- comment that <provider>_ prefixing keeps t.* from colliding with enrichment
- is_time_dimension only true for dimension_group with type time (not duration)
- document name-based, case-folded identity matching in _merge_on
- README: note AGENTS.TABLES/COLUMNS are enriched information_schema and are
  per-database (point workflows at the data's database)
- proposal: reconcile Duplicate And Merge Policy with the v1 prefixed-merge model

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2 participants