Add GenieSpaceBuilder + authoring walkthrough to databricks-genie skill#495
Open
KabeerThockchom wants to merge 2 commits intodatabricks-solutions:mainfrom
…kill

- databricks-mcp-server/databricks_mcp_server/tools/genie_space_builder.py:
  typed authoring API over the serialized_space payload. Covers tables,
  metric_views, column_configs, sample_questions, text_instructions,
  example_question_sqls, join_specs, sql_snippets
  (filters/expressions/measures), and benchmarks. Preserves unknown
  fields on round-trip.
- databricks-skills/databricks-genie/spaces-authoring.md: 7-step
  walkthrough for building rich spaces via the builder, covering scan,
  metric views, joins, descriptions, sample questions with SQL,
  snippets, benchmarks.
- databricks-skills/databricks-genie/SKILL.md: link to the new doc.
- 25 unit tests (no network / workspace dependencies).

Complements PR databricks-solutions#473 (serialized_space schema
documentation) by making the schema ergonomic to author from code.
Verified via end-to-end smoke test against a live workspace
(/api/2.0/genie/spaces). Original schema was modeled after older
implementations (sunnysingh-db/ai-genie-space-generator,
fe-internal-tools:genie-rooms) and didn't match the current proto.
Wire-format corrections:
- join_specs: use {left, right}: {identifier, alias} objects (not
flat left_table/right_table strings); store the join condition in
a `sql` list with relationship_type encoded as
`--rt=FROM_RELATIONSHIP_TYPE_X--`. Drop the `join_type` field
(the proto does not accept it). Validate relationship_type against
ONE_TO_ONE / ONE_TO_MANY / MANY_TO_ONE / MANY_TO_MANY.
- sql_snippets.measures: use `alias` (not `name`).
- sql_snippets.{filters,measures,expressions}: store `sql` and
`comment` as [str] lists (the wire format).
- benchmarks.questions: use `answer: [{format, content}]` (not
`answers: [{format, body}]`).
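
The corrected shapes listed above can be sketched as plain dictionaries. This is only an illustration, not the PR's code: the helper names (`make_join_spec`, `make_measure_snippet`, `make_benchmark_answer`) are hypothetical, the example table names are made up, and placing the `--rt=...--` marker as its own entry after the join condition is an assumption, since the commit message does not say where in the `sql` list it goes.

```python
# Hypothetical helpers illustrating the wire-format corrections; not the PR's API.
RELATIONSHIP_TYPES = {"ONE_TO_ONE", "ONE_TO_MANY", "MANY_TO_ONE", "MANY_TO_MANY"}


def make_join_spec(left_identifier, left_alias, right_identifier, right_alias,
                   condition, relationship_type):
    """Build one join_specs entry using {left, right}: {identifier, alias}."""
    if relationship_type not in RELATIONSHIP_TYPES:
        raise ValueError(f"unknown relationship_type: {relationship_type!r}")
    return {
        "left": {"identifier": left_identifier, "alias": left_alias},
        "right": {"identifier": right_identifier, "alias": right_alias},
        # Assumption: the relationship marker is a separate entry in the list.
        "sql": [condition, f"--rt=FROM_RELATIONSHIP_TYPE_{relationship_type}--"],
        # No join_type key: the commit says the proto does not accept it.
    }


def make_measure_snippet(alias, sql, comment):
    """Measures use `alias` (not `name`); sql and comment are [str] lists."""
    return {"alias": alias, "sql": [sql], "comment": [comment]}


def make_benchmark_answer(fmt, content):
    """Benchmark questions carry `answer: [{format, content}]`."""
    return [{"format": fmt, "content": content}]


spec = make_join_spec(
    "main.sales.orders", "o",
    "main.sales.customers", "c",
    "o.customer_id = c.id",
    "MANY_TO_ONE",
)
```

Validating `relationship_type` before building the entry means an unsupported value fails locally rather than at the API.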
Also normalise ordering at emit time: tables and metric_views must be sorted by
identifier; column_configs must be sorted by column_name; id-keyed
lists are sorted by id. Sorting is applied in to_dict() / to_json()
so users can add entries in any order.
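
A minimal sketch of that emit-time sorting, under stated assumptions: the function name `normalise_for_emit` is hypothetical, the payload is flattened to top-level keys for brevity, `column_configs` is assumed to sit under each table or metric view, and the set of id-keyed lists (`ID_KEYED`) is illustrative rather than taken from the PR.

```python
import copy

# Assumption: which lists are id-keyed; the real builder may cover more slots.
ID_KEYED = ("sample_questions", "join_specs")


def normalise_for_emit(space):
    """Return a sorted deep copy so callers can add entries in any order."""
    out = copy.deepcopy(space)
    for key in ("tables", "metric_views"):
        sources = out.get(key, [])
        sources.sort(key=lambda s: s["identifier"])
        for source in sources:
            # Missing column_configs is fine: the empty default is discarded.
            source.get("column_configs", []).sort(key=lambda c: c["column_name"])
    for key in ID_KEYED:
        out.get(key, []).sort(key=lambda e: e["id"])
    return out


draft = {
    "tables": [
        {"identifier": "main.sales.orders",
         "column_configs": [{"column_name": "status"}, {"column_name": "amount"}]},
        {"identifier": "main.sales.customers"},
    ],
    "sample_questions": [{"id": "b" * 32}, {"id": "a" * 32}],
}
emitted = normalise_for_emit(draft)
```

Sorting a copy at emit time, rather than on every insert, keeps the input untouched and keeps the add methods independent of order.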
Tests now assert against the real wire format. 26/26 pass.
Smoke test (not in PR) verifies a builder-built envelope is accepted
by /api/2.0/genie/spaces and round-trips through GET.
End-to-end smoke test complete against a live workspace; checked off the last item in the test plan. Pushed commit
Smoke test result: built a 3-table / 3-sample-Q / 1-example-SQL / 2-join / 3-snippet / 1-text-instruction / 1-benchmark payload via the builder, POSTed to
Tests: 26/26 still passing. Lint + format clean.
Summary

Adds a typed authoring API for Genie Space `serialized_space` payloads plus a skill-level walkthrough covering the full authoring pipeline.

- `databricks-mcp-server/databricks_mcp_server/tools/genie_space_builder.py` (new): `GenieSpaceBuilder` class with path constants and `add_*` / `replace_*` / `find_by_id` / `to_json` / `from_json` helpers for every `serialized_space` slot: tables, metric_views, column_configs, sample_questions, text_instructions, example_question_sqls, join_specs, sql_snippets (filters/expressions/measures), and benchmarks. Handles 32-char UUID/hex IDs per API spec. Preserves unknown fields on round-trip. Pure Python, no network calls, no LLM dependencies.
- `databricks-skills/databricks-genie/spaces-authoring.md` (new): 7-step authoring walkthrough: scan metadata → metric views (dimensions + measures via UC semantic layer YAML) → joins → table/column descriptions → sample questions with certified SQL → reusable snippets → text instructions + benchmarks. Shows round-trip export/modify/import.
- `databricks-skills/databricks-genie/SKILL.md`: adds a reference link to the new doc.
- `databricks-mcp-server/tests/test_genie_space_builder.py` (new): 25 unit tests covering round-trip fidelity, data-source management, column_configs replacement + sort order, all instructions slots, snippet field stripping, benchmark structure, and generic `find/replace/remove_by_id` operations.

Context

The existing `databricks-genie` skill supports creating and migrating spaces well, but the only ergonomic path to populate the rich `serialized_space` slots today is hand-crafting JSON or round-tripping an exported space. Users building new spaces through the skill tend to ship thin spaces (tables + sample questions only) and fill the rest via the UI.

This PR makes the full schema authorable from code. It complements #473: @sean-zhang-dbx documents the schema (`references/schema.md`, `references/best-practices.md`) and this PR makes the documented schema ergonomic to fill. Docs can cross-reference once both land.

Scope is deliberately minimal: authoring helper + walkthrough only. No changes to existing MCP tools. No LLM pipeline (the walkthrough references `sunnysingh-db/ai-genie-space-generator` as a reference implementation for anyone who wants to drive the builder from an LLM).

Cleared with @calreynolds in #ai-dev-kit before opening.
Test plan

- `--select=E,F,B,PIE` per CONTRIBUTING)
- `pytest databricks-mcp-server/tests/test_genie_space_builder.py`)
- `from_json(to_envelope())` reconstructs an equivalent builder
- `manage_genie`, `execute_sql`, `get_table_stats_and_schema`)
- `manage_genie(action="import")`)

This pull request and its description were written by Kabeer with Claude assistance.
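
The round-trip property from the test plan, together with the "preserves unknown fields" and 32-char ID claims in the Summary, can be illustrated with a small stand-in. `TinySpaceBuilder` below is hypothetical and is not the PR's `GenieSpaceBuilder`; the `{"id", "question"}` entry shape and the `future_field` key are assumptions made only for the example. The one library fact relied on is that `uuid.uuid4().hex` yields a 32-character hexadecimal string.

```python
import json
import uuid


class TinySpaceBuilder:
    """Hypothetical stand-in showing round-trip fidelity; not the PR's class."""

    def __init__(self, payload=None):
        # Keep the whole payload so keys this class does not know survive.
        self._payload = dict(payload or {})

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))

    def add_sample_question(self, question):
        # Assumed entry shape; the id is a 32-char hex string.
        entry = {"id": uuid.uuid4().hex, "question": question}
        self._payload.setdefault("sample_questions", []).append(entry)
        return entry["id"]

    def to_json(self):
        return json.dumps(self._payload, sort_keys=True)


exported = json.dumps({"tables": [], "future_field": {"x": 1}})
builder = TinySpaceBuilder.from_json(exported)
question_id = builder.add_sample_question("What was revenue last month?")
round_tripped = TinySpaceBuilder.from_json(builder.to_json())
```

Holding the full parsed payload, instead of copying only known slots into typed attributes, is one simple way to keep fields added by a newer API version from being dropped on export.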