[PFR] Integrate Databricks AppKit SDK to replace custom React + FastAPI + Lakebase stack #1

@varunrao

Product Feature Request: AppKit Integration

Summary

The Vibe Coding Workshop app currently uses a custom-built stack (React + FastAPI + psycopg + manual OAuth + custom deployment scripts) for its Databricks App and Lakebase integration. Databricks AppKit is the official TypeScript SDK for building production-ready Databricks applications, and it provides plugin-based replacements for nearly every custom component in this repo — including Lakebase connectivity, Lakehouse SQL queries, conversational AI (Genie), and file management.

This PFR requests evaluating and integrating AppKit to replace the current hand-rolled infrastructure.


1. Current State

The app is deployed as a Databricks App using a custom multi-layer stack:

| Component | Current Implementation |
| --- | --- |
| Frontend | React 19 + Vite 7 + Tailwind CSS 4 (src/components/, src/hooks/) |
| Backend | FastAPI (Python) with monolithic route file (src/backend/api/routes.py) |
| Lakebase connection | Custom psycopg2/psycopg3 pool with manual OAuth token refresh (src/backend/services/lakebase.py) |
| Lakehouse queries | Manual DBSQL integration via DB_BACKEND=dbsql env var and hand-coded HTTP calls |
| LLM / AI | OpenAI-compatible client pointed at Foundation Model API (databricks-sdk + openai) |
| Deployment | Custom deploy.sh + setup-lakebase.sh + lakebase_manager.py + DAB templates (databricks.yml.template, app.yaml.template) |
| Type safety | Pydantic models (Python-side only); no frontend type generation from SQL |
| Config management | YAML fallback (prompts_config.yaml) + Lakebase tables + user-config.yaml template system |

Pain points with the current approach:

  • Manual OAuth token refresh logic for Lakebase (tokens expire roughly every 60 minutes)
  • No automatic type generation from SQL queries — frontend/backend contract is manual
  • Monolithic routes.py (dozens of endpoints) with no plugin separation
  • Complex multi-step deployment pipeline (deploy.sh is 300+ lines orchestrating discovery, permissions, schema setup, bundle deploy, and app resource linking)
  • Python backend limits use of AppKit's TypeScript-native ecosystem
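
To make the first pain point concrete, here is a minimal sketch of the kind of refresh logic the app has to maintain by hand today. It is not the code in src/backend/services/lakebase.py: TokenCache, Token, the fetch_token callable, and the 5-minute refresh margin are all hypothetical, and fetch_token stands in for whatever call actually issues the Lakebase credential.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # seconds, measured on the same clock as `now`


class TokenCache:
    """Caches a short-lived credential and refreshes it shortly before expiry.

    Hypothetical helper for illustration only; `fetch_token` is a stand-in
    for the real credential call.
    """

    def __init__(
        self,
        fetch_token: Callable[[], Token],
        refresh_margin_s: float = 300.0,  # assumed margin: refresh 5 min early
        now: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._refresh_margin_s = refresh_margin_s
        self._now = now
        self._lock = threading.Lock()
        self._token: Optional[Token] = None

    def get(self) -> str:
        """Return a token value that is valid for at least the refresh margin."""
        with self._lock:  # avoid duplicate refreshes from concurrent requests
            token = self._token
            if token is None or self._now() >= token.expires_at - self._refresh_margin_s:
                token = self._fetch_token()
                self._token = token
            return token.value
```

A connection pool would then call cache.get() for the password each time it opens a new connection. Connections that are already open when a token expires still need their own handling, which is part of why this logic tends to grow.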

2. Desired State: What AppKit Provides

AppKit (v0.21.0+) offers plugin-based replacements for each of the above:

| Component | AppKit Equivalent |
| --- | --- |
| Lakebase connection | Lakebase Plugin: automatic OAuth token management, connection pooling, OLTP operations out-of-the-box |
| Lakehouse queries | Analytics Plugin: type-safe SQL queries in config/queries/*.sql with automatic caching, parameterization, and npm run typegen for TypeScript types |
| Conversational AI | Genie Plugin: Databricks AI/BI Genie interface; could enhance the workshop's AI prompt generation flow |
| File management | Files Plugin: browse, upload, and manage files in Unity Catalog Volumes |
| Frontend | @databricks/appkit-ui/react components: BarChart, LineChart, DataTable, useAnalyticsQuery hook |
| Backend | Express + tRPC (TypeScript) with plugin lifecycle phases; modular by design |
| Deployment | databricks apps init → databricks apps deploy (single command) |
| Type safety | End-to-end TypeScript with auto-generated appKitTypes.d.ts from SQL files |
| Developer experience | Remote hot reload, file-based queries, AI-assisted development via Agent Skills |

AppKit core principles (from official docs):

  • Highly opinionated defaults with layered extensibility
  • Zero-trust security by default
  • Production-ready from day one (built-in caching, telemetry, retry logic, error handling)
  • Optimized for both human developers and AI agents

3. Gap Analysis

| Area | Current App | AppKit | Gap |
| --- | --- | --- | --- |
| Lakebase auth | Manual psycopg + WorkspaceClient.postgres.generate_database_credential() with periodic refresh | Lakebase plugin handles OAuth automatically | AppKit eliminates ~100 lines of custom connection code |
| SQL type safety | None; Pydantic models manually defined | npm run typegen → appKitTypes.d.ts | Full end-to-end type safety from SQL to React |
| Data visualization | Custom React components with manual data fetching | <BarChart queryKey="..." />, <DataTable /> with automatic query binding | Declarative, type-safe visualization components |
| API layer | FastAPI routes (Python) | tRPC (TypeScript) for mutations; SQL files for reads | Language shift from Python to TypeScript |
| Deployment | deploy.sh (300+ lines) + DAB templates + lakebase_manager.py | databricks apps deploy (single command) | Massive simplification |
| LLM integration | Custom OpenAI client + Foundation Model API | Not built-in (would still need custom tRPC route) | Partial gap: LLM integration stays custom |
| Workshop-specific logic | Session management, leaderboard, prompt generation in Lakebase tables | No equivalent; workshop domain logic is custom | AppKit provides the platform; domain logic stays |
| Config admin UI | Custom /config route for editing prompts/use cases | No equivalent admin scaffolding | Would need custom tRPC routes |

4. Proposed Migration Path

Phase 1: Scaffold and Evaluate

  • Run databricks apps init --features analytics alongside the existing app
  • Validate that the Lakebase plugin works with the existing autoscaling Lakebase instance
  • Test npm run typegen with the existing DDL (db/lakebase/ddl/)

Phase 2: Migrate Lakebase Layer

  • Replace src/backend/services/lakebase.py with AppKit Lakebase plugin
  • Port session CRUD, workshop parameters, and use case description queries
  • Validate OAuth token lifecycle is handled automatically

Phase 3: Migrate SQL Queries to Analytics Plugin

  • Move Lakehouse queries (DBSQL) into config/queries/*.sql
  • Run npm run typegen to generate TypeScript types
  • Replace manual DBSQL HTTP calls with useAnalyticsQuery hooks
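
The first step of this phase amounts to lifting inline SQL strings out of the Python backend into one file per query. A rough sketch of that move, assuming the queries can be collected into a name-to-SQL mapping first. INLINE_QUERIES, its table names, export_queries, and load_queries are all hypothetical, and nothing here reflects how AppKit itself reads the directory or handles parameters.

```python
from pathlib import Path

# Hypothetical examples of SQL currently embedded in the Python backend.
INLINE_QUERIES = {
    "session_count": "SELECT COUNT(*) AS total FROM workshop_sessions",
    "leaderboard": """
        SELECT user_name, score
        FROM workshop_leaderboard
        ORDER BY score DESC
    """,
}


def export_queries(queries: dict[str, str], target_dir: Path) -> list[Path]:
    """Write each named query to <target_dir>/<name>.sql and return the paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, sql in sorted(queries.items()):
        path = target_dir / f"{name}.sql"
        path.write_text(sql.strip() + "\n", encoding="utf-8")
        written.append(path)
    return written


def load_queries(source_dir: Path) -> dict[str, str]:
    """Read every *.sql file back into a name-to-SQL mapping (for verification)."""
    return {
        path.stem: path.read_text(encoding="utf-8").strip()
        for path in sorted(source_dir.glob("*.sql"))
    }
```

Running export_queries(INLINE_QUERIES, Path("config/queries")) once and diffing load_queries against the original mapping gives a quick check that nothing was lost before the Python call sites are deleted.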

Phase 4: Migrate Frontend

  • Replace custom React components with @databricks/appkit-ui/react where applicable
  • Use <DataTable>, <BarChart>, etc. for data display
  • Keep workshop-specific UI (workflow diagram, prompt editor, leaderboard) as custom components

Phase 5: Retire Custom Infrastructure

  • Replace deploy.sh / setup-lakebase.sh / lakebase_manager.py with databricks apps deploy
  • Remove DAB template generation (databricks.yml.template, app.yaml.template, vibe2value configure)
  • Archive src/backend/ (FastAPI) in favor of AppKit's Express + tRPC server

5. Benefits

  • Reduced maintenance: Eliminate an estimated 1,000+ lines of custom infrastructure code (Lakebase connection, deployment scripts, token refresh)
  • Official support: AppKit is the Databricks-recommended SDK — bug fixes and new features flow automatically
  • End-to-end type safety: SQL → TypeScript types → React components with zero manual model definitions
  • Plugin ecosystem: Future Databricks plugins (e.g., new data sources, auth methods) integrate with zero custom code
  • AI-optimized DX: AppKit is designed for AI-assisted development — better Agent Skills integration
  • Simpler onboarding: New contributors use databricks apps init instead of learning custom deploy scripts

6. Risks and Considerations

  • Migration effort: Significant rewrite from Python (FastAPI) backend to TypeScript (Express + tRPC)
  • LLM integration: AppKit has no built-in Foundation Model plugin — prompt generation logic stays custom
  • Workshop domain logic: Session management, leaderboard, and config admin are app-specific and won't benefit from AppKit plugins directly
  • AppKit maturity: At v0.21.0, some plugins may have gaps vs. the battle-tested custom implementation
  • Python ecosystem: Any Python-specific libraries (PyMuPDF for PDF processing) would need TypeScript alternatives or a separate service

Labels: enhancement
