
docs(lakebase-autoscale): canonical psycopg_pool + OAuthConnection pattern#488

Open
dgokeeffe wants to merge 3 commits into databricks-solutions:main from dgokeeffe:feat/lakebase-canonical-pattern-clean

Conversation

@dgokeeffe

Summary

Restructures the databricks-lakebase-autoscale skill to lead with the canonical connection pattern from the official Databricks Apps + Lakebase tutorial, and adds an explicit framing of how the Python ecosystem fits together.

What changed

connection-patterns.md — reordered and expanded:

  • Pattern 1 (new, canonical): psycopg_pool.ConnectionPool + OAuthConnection subclass + max_lifetime=2700. Matches the official tutorial, the external app SDK guide, and databricks-ai-bridge. Zero background threads — rotation happens transparently via pool recycling.
  • Pattern 2 (demoted): the previous SQLAlchemy do_connect + asyncio.Task refresh pattern is now marked "alternative for apps already using SQLAlchemy async", with a note that it adds unnecessary operational complexity for the common case.
  • Patterns 3–4: direct psycopg.connect (scripts only) and static URL (local dev only) — unchanged in spirit, trimmed.
  • Added FastAPI variant (open=False + explicit lifespan).
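For orientation, the canonical Pattern 1 described above looks roughly like the sketch below. This is untested and cannot run without a live Lakebase instance; the exact argument names for generate_database_credential() and the use of ENDPOINT_NAME are assumptions inferred from this PR's description — the official tutorial is the authoritative version.

```python
# Sketch of Pattern 1: psycopg_pool.ConnectionPool + OAuthConnection subclass
# + max_lifetime=2700. Untested; credential-API argument names are assumptions.
import os
import uuid

import psycopg
from psycopg_pool import ConnectionPool
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

class OAuthConnection(psycopg.Connection):
    """Mints a fresh Lakebase-scoped credential for each new physical connection."""

    @classmethod
    def connect(cls, conninfo="", **kwargs):
        # NOT w.config.token (workspace-scoped, rejected at Postgres login) —
        # a database credential minted per-connection.
        cred = w.postgres.generate_database_credential(
            request_id=str(uuid.uuid4()),              # assumed parameter names
            endpoint_name=os.environ["ENDPOINT_NAME"],
        )
        kwargs["password"] = cred.token
        return super().connect(conninfo, **kwargs)

pool = ConnectionPool(
    conninfo=(
        f"host={os.environ['PGHOST']} port={os.environ['PGPORT']} "
        f"dbname={os.environ['PGDATABASE']} user={os.environ['PGUSER']} "
        f"sslmode={os.environ['PGSSLMODE']}"
    ),
    connection_class=OAuthConnection,   # the hook psycopg2 lacks
    min_size=1,
    max_size=10,
    max_lifetime=2700,  # recycle 15 min before the 1-hour token expiry
)
```

No background refresh task is needed: the pool retires each connection at 2700 s, and OAuthConnection.connect mints a fresh token for its replacement.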

SKILL.md — new up-front overview section:

  • Explicit "There is no separate Lakebase SDK for Python" framing — readers repeatedly ask this.
  • Cross-language table (Python / Node-TS / Java-Go) showing which SDK and DB driver to use.
  • Mention of @databricks/lakebase as the Node/TS convenience wrapper (Autoscaling-only).
  • "What NOT to do" list — most importantly flagging that WorkspaceClient().config.token is workspace-scoped and will fail at Postgres login. Must use generate_database_credential() for a Lakebase-scoped token.
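The last "What NOT to do" item is worth making concrete. A short sketch (untested; method names follow this PR's description of the skill doc):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# WRONG: workspace-scoped OAuth token — accepted by workspace APIs,
# rejected at Postgres login.
password = w.config.token

# RIGHT: a short-lived, Lakebase-scoped database credential.
cred = w.postgres.generate_database_credential(...)  # arguments elided — see the skill doc
password = cred.token
```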

Why

  • The old connection-patterns.md led with a SQLAlchemy + background-refresh loop, which works but is not what the official tutorial or reference implementations use.
  • The config.token vs generate_database_credential() distinction was buried; it's the #1 cause of "my connection works locally but fails in prod" bugs.
  • max_lifetime=2700 vs the 3600 default was implicit; the new doc explains why the default creates a race condition.
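The race condition is simple arithmetic: a connection handed out just before the pool's recycle deadline must still outlive its token. A minimal stdlib illustration, assuming the 1-hour token TTL and recycle values discussed above:

```python
TOKEN_TTL = 3600  # Lakebase OAuth token lifetime, seconds (per this PR)

def worst_case_headroom(max_lifetime: int) -> int:
    """Seconds of token validity remaining for a connection checked out
    at the last instant before the pool recycles it."""
    return TOKEN_TTL - max_lifetime

# Default max_lifetime=3600: a connection can be handed out with 0 seconds
# of token left — any in-flight query races the expiry.
assert worst_case_headroom(3600) == 0

# max_lifetime=2700: 15 minutes (900 s) of buffer before the token expires.
assert worst_case_headroom(2700) == 900
```
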

Test plan

  • Read through both files end-to-end for accuracy
  • Verified the canonical pattern against the official tutorial URL
  • Verified max_lifetime=2700 rationale (15-min buffer before 1-hour expiry)
  • Cross-checked cross-language table with @databricks/lakebase README

This pull request and its description were written by Isaac.

David O'Keeffe added 2 commits April 23, 2026 19:26
…nection pattern

Restructure connection-patterns.md to match the official Databricks tutorial
and databricks-ai-bridge reference implementation:

- Pattern 1 (canonical, new): psycopg_pool.ConnectionPool + OAuthConnection
  subclass + max_lifetime=2700. Zero background threads, rotation via pool
  recycling. This is what docs.databricks.com's Lakebase Apps tutorial uses.
- Pattern 2: SQLAlchemy do_connect event (was previously presented as the
  production pattern — now demoted to "alternative for apps already using
  SQLAlchemy async", with an explicit note it adds unnecessary complexity).
- Pattern 3: Direct psycopg.connect for scripts/notebooks.
- Pattern 4: Static URL for local dev.

New explicit warnings:
- config.token / oauth_token().access_token is WORKSPACE-scoped and will fail
  at Postgres login. Must use w.postgres.generate_database_credential().
- max_lifetime=3600 (the default) creates a race condition; use 2700 so the
  pool recycles 15 min before the 1-hour token expiry.
- ENDPOINT_NAME env var must be set manually — Databricks auto-injects
  PGHOST/PGPORT/PGDATABASE/PGUSER/PGSSLMODE but NOT the endpoint path.
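How the auto-injected variables and the manually set ENDPOINT_NAME combine can be sketched with the stdlib alone. The values below are made up for illustration; in a deployed app they come from the Databricks Apps runtime and app.yaml:

```python
import os

# Simulated environment — in a real app these are injected by Databricks
# (PG*) or set manually in app.yaml (ENDPOINT_NAME). Values are fake.
os.environ.update({
    "PGHOST": "instance.example.cloud.databricks.com",
    "PGPORT": "5432",
    "PGDATABASE": "databricks_postgres",
    "PGUSER": "app-service-principal",
    "PGSSLMODE": "require",
    "ENDPOINT_NAME": "my-endpoint",
})

def build_conninfo() -> str:
    """Assemble a libpq conninfo string from the injected variables.
    No password is included — in Pattern 1 the OAuthConnection subclass
    supplies a fresh credential per connection."""
    required = ["PGHOST", "PGPORT", "PGDATABASE", "PGUSER", "PGSSLMODE",
                "ENDPOINT_NAME"]
    missing = [k for k in required if k not in os.environ]
    if missing:
        # ENDPOINT_NAME is the one people forget — it is NOT auto-injected.
        raise RuntimeError(f"missing env vars: {missing}")
    return (
        f"host={os.environ['PGHOST']} port={os.environ['PGPORT']} "
        f"dbname={os.environ['PGDATABASE']} user={os.environ['PGUSER']} "
        f"sslmode={os.environ['PGSSLMODE']}"
    )

print(build_conninfo())
```
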

Canonical sources cited:
- docs.databricks.com/aws/en/oltp/projects/tutorial-databricks-apps-autoscaling
- docs.databricks.com/aws/en/oltp/projects/external-apps-connect
- github.com/databricks/databricks-ai-bridge (src/databricks_ai_bridge/lakebase.py)

Co-authored-by: Isaac
…oss-language table

The existing overview jumped straight into features. Readers arriving from
"how do I use Lakebase from Python?" needed two things made explicit:

1. There is no separate Lakebase SDK for Python. You use databricks-sdk
   only for minting OAuth credentials; a standard Postgres driver does the
   actual queries. (This was implicit in the connection patterns doc but
   not called out up-front.)
2. Node/TypeScript has a convenience wrapper: @databricks/lakebase
   (re-exported by @databricks/appkit). Autoscaling-only, not Provisioned.
   Worth mentioning so JS/TS readers know it exists.

Also added a cross-language summary table and an explicit "What NOT to do"
list — most importantly flagging that WorkspaceClient().config.token is
workspace-scoped and will be rejected at Postgres login. This is a trap
several of us have fallen into.

Co-authored-by: Isaac

@cankoklu-db left a comment


Sorry, have to fix this.

@cankoklu-db

Correcting my earlier comment — I cited /aws/en/oltp/instances/authentication (which is labeled Lakebase Provisioned in its breadcrumb), but this skill is for Lakebase Autoscaling specifically. What follows is a narrower set of observations that hold up against the Autoscaling tutorial, the external apps Autoscaling guide, and the Apps Lakebase resource doc:

1. Auto-injected env vars: 6, not 5
The Apps Lakebase resource doc explicitly lists PGAPPNAME, PGDATABASE, PGHOST, PGPORT, PGSSLMODE, PGUSER for the first database resource. The PR's app.yaml comment omits PGAPPNAME. The doc also notes only the first database resource gets auto-injected — multi-Lakebase apps need valueFrom: {resource: <key>, key: ...} for resource #2+.

2. @databricks/lakebase Autoscaling-only scope should be in the table cell, not just the prose
The README states explicitly: "NOT compatible with the Databricks Lakebase Provisioned". Source code calls /api/2.0/postgres/credentials (Autoscaling endpoint-resource API). Readers scanning the cross-language table for Provisioned guidance miss the caveat above the table. Either add a Scope column or change the cell to @databricks/lakebase (Autoscaling only).

3. Sharpen the open=True vs open=False rationale
The primary driver for FastAPI's open=False isn't just "fail-fast on startup" — open=True is deprecated for AsyncConnectionPool and becomes an error in psycopg 4.0 (still the sync default). databricks-ai-bridge follows the same sync-True / async-False split. One sentence in the doc would make the asymmetry principled rather than stylistic.
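Concretely, the sync-True / async-False split described here looks roughly like the following. This is an untested sketch (the conninfo is elided) that follows psycopg_pool's documented open=False plus explicit open()/close() usage:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from psycopg_pool import AsyncConnectionPool

# open=True is deprecated for AsyncConnectionPool (an error in psycopg 4.0),
# so construct the pool closed and open it explicitly in the app lifespan.
pool = AsyncConnectionPool(conninfo="...", open=False)

@asynccontextmanager
async def lifespan(app: FastAPI):
    await pool.open(wait=True)  # fail fast if the database is unreachable
    try:
        yield
    finally:
        await pool.close()

app = FastAPI(lifespan=lifespan)
```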

4. One-line comment on the psycopg3 pin

psycopg[binary,pool]>=3.1.0  # psycopg3 required — psycopg2.pool has no connection_class hook for the OAuthConnection pattern

The Autoscaling tutorial uses psycopg3 throughout. Calling out why helps readers with psycopg2 muscle memory who'd otherwise try to swap drivers.

5. Pattern 2 pool_recycle — optional alignment, not a fix
databricks-ai-bridge uses 2700 (PR #316) with an explicit 45-min-before-60-min comment. Pattern 2's 3600 isn't wrong — the Autoscaling tutorial itself doesn't set max_lifetime at all (relies on psycopg_pool's defaults), which undercuts Pattern 1's "Always use 2700" strength. Up to you whether to align Pattern 2 with databricks-ai-bridge for internal consistency with Pattern 1, or leave both as-is and soften Pattern 1's "Always use 2700" to "prefer 2700".

Retracted

  • My earlier claim that Pattern 1's "minute 59 / minute 60 will fail" sentence is "provably false" — that was based on the Provisioned auth doc, which is not the right authority for an Autoscaling skill. The Autoscaling tutorial doesn't explicitly describe post-expiry connection behavior, so there's no public Autoscaling source that contradicts the PR's framing. No change needed there.
  • My earlier framing that the env-var list needed "fixing" was a tone error — it's a minor completeness gap, not a factual error in how the pattern works.

Apologies for the noise in the first pass.

- Fix PGAPPNAME omission: 6 env vars auto-injected, not 5; note multi-resource caveat
- Add psycopg3 pin comment explaining why psycopg2 won't work (no connection_class hook)
- Strengthen open=False rationale: deprecated for AsyncConnectionPool, errors in psycopg 4.0
- Clarify @databricks/lakebase scope in cross-language table (Autoscaling only)

Co-authored-by: Isaac
