Add GenieSpaceBuilder + authoring walkthrough to databricks-genie skill by KabeerThockchom · Pull Request #495 · databricks-solutions/ai-dev-kit

KabeerThockchom · 2026-04-24T21:35:35Z

Summary

Adds a typed authoring API for Genie Space serialized_space payloads plus a skill-level walkthrough covering the full authoring pipeline.

databricks-mcp-server/databricks_mcp_server/tools/genie_space_builder.py (new) — GenieSpaceBuilder class with path constants and add_* / replace_* / find_by_id / to_json / from_json helpers for every serialized_space slot: tables, metric_views, column_configs, sample_questions, text_instructions, example_question_sqls, join_specs, sql_snippets (filters/expressions/measures), and benchmarks. Handles 32-char UUID/hex IDs per API spec. Preserves unknown fields on round-trip. Pure Python, no network calls, no LLM dependencies.
databricks-skills/databricks-genie/spaces-authoring.md (new) — 7-step authoring walkthrough: scan metadata → metric views (dimensions + measures via UC semantic layer YAML) → joins → table/column descriptions → sample questions with certified SQL → reusable snippets → text instructions + benchmarks. Shows round-trip export/modify/import.
databricks-skills/databricks-genie/SKILL.md — adds a reference link to the new doc.
databricks-mcp-server/tests/test_genie_space_builder.py (new) — 25 unit tests covering round-trip fidelity, data-source management, column_configs replacement + sort order, all instructions slots, snippet field stripping, benchmark structure, and generic find/replace/remove_by_id operations.
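
To make the round-trip and ID behaviour described above concrete, here is a minimal sketch. `TinySpace` is a hypothetical stand-in, not the `GenieSpaceBuilder` class from this PR, and the flat `sample_questions` key and the `question` entry shape are assumptions made for illustration. It shows 32-character hex IDs generated with `uuid.uuid4().hex`, and how keeping the parsed payload as a plain dict lets unknown fields survive a `from_json` / `to_json` round-trip.

```python
import json
import uuid


def new_id() -> str:
    """Return a 32-character lowercase hex ID (a UUID4 without dashes)."""
    return uuid.uuid4().hex


class TinySpace:
    """Hypothetical stand-in for a serialized_space builder (not the PR's class)."""

    def __init__(self, payload=None):
        # Hold the whole payload, so keys this class does not know about survive.
        self._payload = dict(payload or {})

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))

    def to_json(self):
        return json.dumps(self._payload, sort_keys=True)

    def add_sample_question(self, question):
        # Assumed slot name and entry shape, for illustration only.
        entry = {"id": new_id(), "question": [question]}
        self._payload.setdefault("sample_questions", []).append(entry)
        return entry["id"]

    def find_by_id(self, slot, entry_id):
        # Linear scan over one slot; returns None when nothing matches.
        for entry in self._payload.get(slot, []):
            if entry.get("id") == entry_id:
                return entry
        return None
```

Because the sketch only ever mutates the slots it knows, a field added by a newer API version passes through untouched, which is the property the unit tests in this PR are described as checking.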

Context

The existing databricks-genie skill supports creating and migrating spaces well, but the only ergonomic path to populate the rich serialized_space slots today is hand-crafting JSON or round-tripping an exported space. Users building new spaces through the skill tend to ship thin spaces (tables + sample questions only) and fill the rest via the UI.

This PR makes the full schema authorable from code. It complements #473 — @sean-zhang-dbx documents the schema (references/schema.md, references/best-practices.md) and this PR makes the documented schema ergonomic to fill. Docs can cross-reference once both land.

Scope is deliberately minimal: authoring helper + walkthrough only. No changes to existing MCP tools. No LLM pipeline (the walkthrough references sunnysingh-db/ai-genie-space-generator as a reference implementation for anyone who wants to drive the builder from an LLM).

Cleared with @calreynolds in #ai-dev-kit before opening.

Test plan

Ruff lint passes (--select=E,F,B,PIE per CONTRIBUTING)
Ruff format passes
25/25 unit tests pass (pytest databricks-mcp-server/tests/test_genie_space_builder.py)
Round-trip fidelity verified: from_json(to_envelope()) reconstructs an equivalent builder
Unknown-field preservation verified
Doc examples verified to reference only existing MCP tool signatures (manage_genie, execute_sql, get_table_stats_and_schema)
End-to-end smoke test against a live workspace (reviewer to verify — builder-built envelope passed through manage_genie(action="import"))

This pull request and its description were written by Kabeer with Claude assistance.

…kill

- databricks-mcp-server/databricks_mcp_server/tools/genie_space_builder.py: typed authoring API over the serialized_space payload. Covers tables, metric_views, column_configs, sample_questions, text_instructions, example_question_sqls, join_specs, sql_snippets (filters/expressions/measures), and benchmarks. Preserves unknown fields on round-trip.
- databricks-skills/databricks-genie/spaces-authoring.md: 7-step walkthrough for building rich spaces via the builder, covering scan, metric views, joins, descriptions, sample questions with SQL, snippets, benchmarks.
- databricks-skills/databricks-genie/SKILL.md: link to the new doc.
- 25 unit tests (no network / workspace dependencies).

Complements PR databricks-solutions#473 (serialized_space schema documentation) by making the schema ergonomic to author from code.

Verified via end-to-end smoke test against a live workspace (/api/2.0/genie/spaces). Original schema was modeled after older implementations (sunnysingh-db/ai-genie-space-generator, fe-internal-tools:genie-rooms) and didn't match the current proto.

Wire-format corrections:

- join_specs: use {left, right}: {identifier, alias} objects (not flat left_table/right_table strings); store the join condition in a `sql` list with relationship_type encoded as `--rt=FROM_RELATIONSHIP_TYPE_X--`. Drop the `join_type` field (the proto does not accept it). Validate relationship_type against ONE_TO_ONE / ONE_TO_MANY / MANY_TO_ONE / MANY_TO_MANY.
- sql_snippets.measures: use `alias` (not `name`).
- sql_snippets.{filters,measures,expressions}: store `sql` and `comment` as [str] lists (the wire format).
- benchmarks.questions: use `answer: [{format, content}]` (not `answers: [{format, body}]`).

Also normalise at emit time: tables and metric_views must be sorted by identifier; column_configs must be sorted by column_name; id-keyed lists are sorted by id. Sorting is applied in to_dict() / to_json() so users can add entries in any order.

Tests now assert against the real wire format. 26/26 pass. Smoke test (not in PR) verifies a builder-built envelope is accepted by /api/2.0/genie/spaces and round-trips through GET.

KabeerThockchom · 2026-04-24T23:44:53Z

End-to-end smoke test complete against a live workspace — checked off the last item in the test plan.

Pushed commit 423b1b6 with schema corrections discovered during the smoke test. The original payload structure (modeled after sunnysingh-db/ai-genie-space-generator and the internal fe-internal-tools:genie-rooms builder) didn't match the current Genie API proto. Specific fixes:

join_specs: use {left, right}: {identifier, alias} objects (not flat left_table/right_table strings); join condition goes in a sql list with relationship_type encoded as --rt=FROM_RELATIONSHIP_TYPE_X--. Dropped the join_type field — the proto rejects it. Added validation against the four valid relationship types.
sql_snippets.measures: API field is alias, not name.
sql_snippets.{filters,measures,expressions}: sql and comment are [str] lists, not strings.
benchmarks.questions: uses answer: [{format, content}] (not answers: [{format, body}]).
Sort constraints: data_sources.tables and data_sources.metric_views must be sorted by identifier; column_configs by column_name; id-keyed lists by id. Now normalized at emit time so users can add entries in any order.
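
The join and sort rules listed above can be sketched roughly as follows. This is an illustration under assumptions, not the code from commit 423b1b6: the function names are hypothetical, placing the `--rt=...--` marker as its own element of the `sql` list is a guess at the layout (the list above only says it is encoded in that list), and nesting `column_configs` inside each table entry is likewise assumed.

```python
VALID_RELATIONSHIP_TYPES = {"ONE_TO_ONE", "ONE_TO_MANY", "MANY_TO_ONE", "MANY_TO_MANY"}


def make_join_spec(left_identifier, left_alias, right_identifier, right_alias,
                   condition, relationship_type):
    """Build one join_specs entry in the corrected wire shape (hypothetical helper)."""
    if relationship_type not in VALID_RELATIONSHIP_TYPES:
        raise ValueError(f"unknown relationship_type: {relationship_type!r}")
    return {
        "left": {"identifier": left_identifier, "alias": left_alias},
        "right": {"identifier": right_identifier, "alias": right_alias},
        # Assumption: the relationship marker is a separate element of the list.
        "sql": [condition, f"--rt=FROM_RELATIONSHIP_TYPE_{relationship_type}--"],
        # No join_type key: the comment above notes the proto rejects it.
    }


def make_measure(alias, sql, comment):
    """Build a sql_snippets.measures entry: `alias`, with sql/comment as [str]."""
    return {"alias": alias, "sql": [sql], "comment": [comment]}


def sort_for_emit(tables):
    """Return copies of the table entries sorted the way the API expects."""
    out = []
    for table in sorted(tables, key=lambda t: t["identifier"]):
        table = dict(table)  # shallow copy so the caller's entries are untouched
        if "column_configs" in table:
            table["column_configs"] = sorted(
                table["column_configs"], key=lambda c: c["column_name"]
            )
        out.append(table)
    return out
```

Sorting on a copy at emit time, rather than on insert, is what lets callers add tables and columns in any order while the emitted payload still satisfies the ordering constraints.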

Smoke test result: built a 3-table / 3-sample-Q / 1-example-SQL / 2-join / 3-snippet / 1-text-instruction / 1-benchmark payload via the builder, POSTed to /api/2.0/genie/spaces, fetched it back, and deleted it. Test is not in the PR (it's a one-off probe pinned to a specific catalog) but available locally.

Tests: 26/26 still passing. Lint + format clean.

KabeerThockchom added 2 commits April 24, 2026 14:34

KabeerThockchom wants to merge 2 commits into databricks-solutions:main from KabeerThockchom:feature/genie-space-builder
