From 702079416e7b2e97ca8490891c19f9f3e58b9152 Mon Sep 17 00:00:00 2001
From: David O'Keeffe
Date: Thu, 23 Apr 2026 19:19:22 +1000
Subject: [PATCH 1/3] docs(lakebase-autoscale): lead with canonical
 psycopg_pool + OAuthConnection pattern
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Restructure connection-patterns.md to match the official Databricks tutorial
and databricks-ai-bridge reference implementation:

- Pattern 1 (canonical, new): psycopg_pool.ConnectionPool + OAuthConnection
  subclass + max_lifetime=2700. Zero background threads, rotation via pool
  recycling. This is what docs.databricks.com's Lakebase Apps tutorial uses.
- Pattern 2: SQLAlchemy do_connect event (was previously presented as the
  production pattern — now demoted to "alternative for apps already using
  SQLAlchemy async", with an explicit note it adds unnecessary complexity).
- Pattern 3: Direct psycopg.connect for scripts/notebooks.
- Pattern 4: Static URL for local dev.

New explicit warnings:

- config.token / oauth_token().access_token is WORKSPACE-scoped and will
  fail at Postgres login. Must use w.postgres.generate_database_credential().
- max_lifetime=3600 (the default) creates a race condition; use 2700 so the
  pool recycles 15 min before the 1-hour token expiry.
- ENDPOINT_NAME env var must be set manually — Databricks auto-injects
  PGHOST/PGPORT/PGDATABASE/PGUSER/PGSSLMODE but NOT the endpoint path.

Canonical sources cited:

- docs.databricks.com/aws/en/oltp/projects/tutorial-databricks-apps-autoscaling
- docs.databricks.com/aws/en/oltp/projects/external-apps-connect
- github.com/databricks/databricks-ai-bridge (src/databricks_ai_bridge/lakebase.py)

Co-authored-by: Isaac
---
 .../connection-patterns.md                    | 350 ++++++++++++------
 1 file changed, 230 insertions(+), 120 deletions(-)

diff --git a/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md b/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md
index 398862b3..bd13c2b3 100644
--- a/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md
+++ b/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md
@@ -2,81 +2,187 @@
 
 ## Overview
 
-This document covers different connection patterns for Lakebase Autoscaling, from simple scripts to production applications with token refresh.
+This document covers the canonical connection patterns for Lakebase Autoscaling, ordered by recommendation:
 
-## Authentication Methods
+1. **`psycopg_pool.ConnectionPool` + `OAuthConnection`** — canonical for production Databricks Apps. Used by the [official tutorial](https://docs.databricks.com/aws/en/oltp/projects/tutorial-databricks-apps-autoscaling), the [external app SDK guide](https://docs.databricks.com/aws/en/oltp/projects/external-apps-connect), and [`databricks-ai-bridge`](https://github.com/databricks/databricks-ai-bridge/blob/main/src/databricks_ai_bridge/lakebase.py). Zero background threads — rotation is handled by pool recycling.
+2. **SQLAlchemy `do_connect` event + background refresh** — alternative for apps already using SQLAlchemy async. Works but adds a background `asyncio.Task` you don't need.
+3. **Direct `psycopg.connect`** — only for one-off scripts / notebooks where the session lives < 1 hour.
+4. **Static URL** — local development only.
+
+## Authentication
 
 Lakebase Autoscaling supports two authentication methods:
 
 | Method | Token Lifetime | Best For |
 |--------|---------------|----------|
-| **OAuth tokens** | 1 hour (must refresh) | Interactive sessions, workspace-integrated apps |
+| **OAuth tokens** (`generate_database_credential`) | 1 hour, enforced at login only | Apps — rotate via pool recycling |
 | **Native Postgres passwords** | No expiry | Long-running processes, tools without token rotation |
 
+**Critical distinction:** The workspace OAuth token (`w.config.oauth_token().access_token`) is *workspace-scoped* — it will **fail at PG login**. You must call `w.postgres.generate_database_credential(endpoint=...)` to mint a separate *Lakebase-scoped* JWT:
+
+```python
+# ✅ CORRECT — Lakebase-scoped database credential
+cred = w.postgres.generate_database_credential(endpoint=endpoint_name)
+password = cred.token
+
+# ❌ WRONG — workspace-scoped token
+password = w.config.oauth_token().access_token
+```
+
 **Connection timeouts (both methods):**
 
 - **24-hour idle timeout**: Connections with no activity for 24 hours are automatically closed
 - **3-day maximum connection life**: Connections alive for more than 3 days may be closed
 
 Design your applications to handle connection timeouts with retry logic.
 
-## Connection Methods
+## 1. `psycopg_pool.ConnectionPool` + `OAuthConnection` (CANONICAL)
+
+This is the pattern from the official Databricks tutorial, external app guide, and `databricks-ai-bridge`. **Use this for any production Databricks App.**
+
+### How it works
 
-### 1. Direct psycopg Connection (Simple Scripts)
+1. `OAuthConnection.connect()` mints a fresh Lakebase credential every time the pool opens a new physical connection.
+2. Lakebase tokens expire at 1 hour, but expiration is enforced **only at login** — already-open connections stay valid.
+3. `max_lifetime=2700` (45 min) tells the pool to recycle connections before tokens expire. When the pool opens a replacement connection, `OAuthConnection.connect()` fires and gets a fresh token.
+4. The 15-minute buffer (60 min token − 45 min recycle) means you never race against expiry.
 
-For one-off scripts or notebooks:
+**Result:** Fully transparent token rotation with zero background tasks, zero timers, zero manual refresh logic.
+
+> **Why not `max_lifetime=3600` (the default)?** You'd hand out connections with nearly-expired tokens: recycling would happen at exactly the moment the token expires, so a connection checked out near the end of its lifetime has no safety margin left. The 15-minute buffer removes that race. Always use 2700.
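+
+Whatever pattern you pick, keep the *Connection timeouts* guidance above in mind: even a recycled pool can hand you a connection the server already closed (24-hour idle timeout, 3-day cap, scale-to-zero wake-up). A minimal retry sketch (illustrative: the helper name and backoff values are not from the tutorial; it assumes the `pool` built in `app.py` below):
+
+```python
+import time
+
+import psycopg
+
+
+def run_with_retry(pool, query, params=None, attempts=3, backoff=1.0):
+    """Retry a pooled SELECT when the server side has dropped the connection."""
+    for attempt in range(attempts):
+        try:
+            with pool.connection() as conn:
+                with conn.cursor() as cur:
+                    cur.execute(query, params)
+                    return cur.fetchall()
+        except psycopg.OperationalError:
+            if attempt == attempts - 1:
+                raise
+            # 1 s, 2 s, 4 s, ... also rides out the 2-5 s scale-to-zero wake-up
+            time.sleep(backoff * 2 ** attempt)
+```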
+
+### `app.yaml`
+
+```yaml
+command: ['flask', '--app', 'app.py', 'run', '--host', '0.0.0.0', '--port', '8000']
+env:
+  # These 5 are auto-injected when you add a Lakebase (postgres) resource in the UI:
+  # PGHOST, PGPORT, PGDATABASE, PGUSER, PGSSLMODE
+  # You MUST manually add ENDPOINT_NAME — it's needed by generate_database_credential():
+  - name: ENDPOINT_NAME
+    value: 'projects/<project_id>/branches/<branch_id>/endpoints/<endpoint_id>'
+```
+
+### `requirements.txt`
+
+```
+flask
+psycopg[binary,pool]>=3.1.0
+databricks-sdk>=0.81.0
+```
+
+### `app.py` (Flask)
 
 ```python
-import psycopg
+import os
 from databricks.sdk import WorkspaceClient
+import psycopg
+from psycopg_pool import ConnectionPool
+from flask import Flask
 
-def get_connection(project_id: str, branch_id: str = "production",
-                   endpoint_id: str = None, database_name: str = "databricks_postgres"):
-    """Get a database connection with fresh OAuth token."""
-    w = WorkspaceClient()
+app = Flask(__name__)
 
-    # Get endpoint details to find the host
-    if endpoint_id:
-        ep_name = f"projects/{project_id}/branches/{branch_id}/endpoints/{endpoint_id}"
-    else:
-        # List endpoints and pick the primary R/W one
-        endpoints = list(w.postgres.list_endpoints(
-            parent=f"projects/{project_id}/branches/{branch_id}"
-        ))
-        ep_name = endpoints[0].name
+# Inside Databricks Apps, WorkspaceClient() auto-authenticates via SP credentials.
+w = WorkspaceClient()
 
-    endpoint = w.postgres.get_endpoint(name=ep_name)
-    host = endpoint.status.hosts.host
 
-    # Generate OAuth token (valid for 1 hour)
-    cred = w.postgres.generate_database_credential(endpoint=ep_name)
+class OAuthConnection(psycopg.Connection):
+    """Inject a fresh Lakebase OAuth token on every pool-opened connection.
 
-    # Build connection string
-    conn_string = (
-        f"host={host} "
-        f"dbname={database_name} "
-        f"user={w.current_user.me().user_name} "
-        f"password={cred.token} "
-        f"sslmode=require"
-    )
+    The pool calls OAuthConnection.connect() when:
+    - Filling min_size on startup
+    - Recycling a connection (max_lifetime exceeded)
+    - Creating a new connection under load
+    - Replacing a connection that failed health-check
 
-    return psycopg.connect(conn_string)
+    No background refresh thread is needed: tokens are always fresh at login
+    time, and login is where Lakebase enforces expiration.
+    """
 
-# Usage
-with get_connection("my-app") as conn:
-    with conn.cursor() as cur:
-        cur.execute("SELECT NOW()")
-        print(cur.fetchone())
+    @classmethod
+    def connect(cls, conninfo='', **kwargs):
+        endpoint_name = os.environ["ENDPOINT_NAME"]
+        cred = w.postgres.generate_database_credential(endpoint=endpoint_name)
+        kwargs['password'] = cred.token
+        return super().connect(conninfo, **kwargs)
+
+
+username = os.environ["PGUSER"]      # SP client ID — auto-injected
+host = os.environ["PGHOST"]          # e.g. ep-restless-pond-e4wvk0yn... — auto-injected
+port = os.environ.get("PGPORT", "5432")
+database = os.environ["PGDATABASE"]  # typically "databricks_postgres" — auto-injected
+sslmode = os.environ.get("PGSSLMODE", "require")
+
+pool = ConnectionPool(
+    conninfo=(
+        f"dbname={database} user={username} "
+        f"host={host} port={port} sslmode={sslmode}"
+    ),
+    connection_class=OAuthConnection,
+    min_size=1,
+    max_size=10,
+    # CRITICAL: 2700 (45 min), not the 3600 default.
+    # Recycles connections 15 min before the 1-hour token expiry.
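+    # Worked timeline (illustrative): a connection opened at t=0 holds a token
+    # valid until t=60 min; the pool retires the connection at ~t=45 min, so no
+    # token in use is ever older than 45 min plus the duration of one checkout.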
+    max_lifetime=2700,
+    open=True,
+)
+
+
+@app.route('/')
+def index():
+    with pool.connection() as conn:
+        with conn.cursor() as cur:
+            cur.execute("SELECT current_user, current_database()")
+            row = cur.fetchone()
+    return f"Connected as {row[0]} to {row[1]}"
+
+
+if __name__ == '__main__':
+    app.run(host="0.0.0.0", port=8000)
 ```
 
-### 2. Connection Pool with Token Refresh (Production)
+### FastAPI variant
 
-For long-running applications that need connection pooling:
+Identical pattern, but use `open=False` with an explicit lifespan so startup failures surface immediately:
 
 ```python
-import asyncio
-import uuid
 from contextlib import asynccontextmanager
+from fastapi import FastAPI
+
+pool = ConnectionPool(
+    conninfo=...,
+    connection_class=OAuthConnection,
+    min_size=1, max_size=10,
+    max_lifetime=2700,
+    open=False,  # Opened explicitly in lifespan
+)
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    pool.open(wait=True, timeout=30.0)  # Fail fast if DB unreachable
+    yield
+    pool.close()
+
+
+app = FastAPI(lifespan=lifespan)
+
+
+@app.get("/api/data")
+def get_data():  # sync def — FastAPI runs in threadpool automatically
+    with pool.connection() as conn:
+        with conn.cursor() as cur:
+            cur.execute("SELECT ...")
+            return cur.fetchall()
+```
+
+## 2. SQLAlchemy `do_connect` Event (Alternative)
+
+**Use only if your app is already SQLAlchemy-async.** Otherwise prefer pattern 1 — this adds a background refresh task you don't need.
+
+```python
+import asyncio
 from typing import AsyncGenerator, Optional
+from contextlib import asynccontextmanager
 
 from sqlalchemy import event
 from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
@@ -84,7 +190,11 @@ from databricks.sdk import WorkspaceClient
 
 
 class LakebaseAutoscaleConnectionManager:
-    """Manages Lakebase Autoscaling connections with automatic token refresh."""
+    """Manages Lakebase Autoscaling connections with background token refresh.
+
+    This pattern works but adds operational complexity (a background asyncio.Task)
+    that isn't necessary. Prefer psycopg_pool + OAuthConnection (pattern 1).
+ """ def __init__( self, @@ -93,7 +203,7 @@ class LakebaseAutoscaleConnectionManager: database_name: str = "databricks_postgres", pool_size: int = 5, max_overflow: int = 10, - token_refresh_seconds: int = 3000 # 50 minutes + token_refresh_seconds: int = 3000, # 50 minutes ): self.project_id = project_id self.branch_id = branch_id @@ -107,32 +217,28 @@ class LakebaseAutoscaleConnectionManager: self._engine = None self._session_maker = None - def _generate_token(self) -> str: - """Generate fresh OAuth token.""" + def _endpoint_name(self) -> str: w = WorkspaceClient() - # Get primary endpoint name for token scoping endpoints = list(w.postgres.list_endpoints( parent=f"projects/{self.project_id}/branches/{self.branch_id}" )) - endpoint_name = endpoints[0].name if endpoints else None - cred = w.postgres.generate_database_credential(endpoint=endpoint_name) + if not endpoints: + raise RuntimeError( + f"No endpoints for projects/{self.project_id}/branches/{self.branch_id}" + ) + return endpoints[0].name + + def _generate_token(self) -> str: + w = WorkspaceClient() + cred = w.postgres.generate_database_credential(endpoint=self._endpoint_name()) return cred.token def _get_host(self) -> str: - """Get the connection host from the primary endpoint.""" w = WorkspaceClient() - endpoints = list(w.postgres.list_endpoints( - parent=f"projects/{self.project_id}/branches/{self.branch_id}" - )) - if not endpoints: - raise RuntimeError( - f"No endpoints found for projects/{self.project_id}/branches/{self.branch_id}" - ) - endpoint = w.postgres.get_endpoint(name=endpoints[0].name) - return endpoint.status.hosts.host + ep = w.postgres.get_endpoint(name=self._endpoint_name()) + return ep.status.hosts.host async def _refresh_loop(self): - """Background task to refresh token periodically.""" while True: await asyncio.sleep(self.token_refresh_seconds) try: @@ -141,48 +247,34 @@ class LakebaseAutoscaleConnectionManager: print(f"Token refresh failed: {e}") def initialize(self): - """Initialize database engine and start token refresh.""" w = WorkspaceClient() - - # Get host info host = self._get_host() username = w.current_user.me().user_name - # Generate initial token self._current_token = self._generate_token() - # Create engine (password injected via event) - url = ( - f"postgresql+psycopg://{username}@" - f"{host}:5432/{self.database_name}" - ) - + url = f"postgresql+psycopg://{username}@{host}:5432/{self.database_name}" self._engine = create_async_engine( url, pool_size=self.pool_size, max_overflow=self.max_overflow, pool_recycle=3600, - connect_args={"sslmode": "require"} + connect_args={"sslmode": "require"}, ) - # Inject token on connect @event.listens_for(self._engine.sync_engine, "do_connect") def inject_token(dialect, conn_rec, cargs, cparams): cparams["password"] = self._current_token self._session_maker = async_sessionmaker( - self._engine, - class_=AsyncSession, - expire_on_commit=False + self._engine, class_=AsyncSession, expire_on_commit=False ) def start_refresh(self): - """Start background token refresh task.""" if not self._refresh_task: self._refresh_task = asyncio.create_task(self._refresh_loop()) async def stop_refresh(self): - """Stop token refresh task.""" if self._refresh_task: self._refresh_task.cancel() try: @@ -193,73 +285,90 @@ class LakebaseAutoscaleConnectionManager: @asynccontextmanager async def session(self) -> AsyncGenerator[AsyncSession, None]: - """Get a database session.""" async with self._session_maker() as session: yield session async def close(self): - """Close all 
connections.""" await self.stop_refresh() if self._engine: await self._engine.dispose() +``` +## 3. Direct `psycopg.connect` (Scripts / Notebooks Only) -# Usage in FastAPI -from fastapi import FastAPI +For one-off scripts or notebooks where the process lives well under an hour: + +```python +import psycopg +from databricks.sdk import WorkspaceClient -app = FastAPI() -db_manager = LakebaseAutoscaleConnectionManager("my-app", "production", "my_database") -@app.on_event("startup") -async def startup(): - db_manager.initialize() - db_manager.start_refresh() +def get_connection(project_id: str, branch_id: str = "production", + endpoint_id: str = None, database_name: str = "databricks_postgres"): + """Get a one-shot database connection with a fresh OAuth token.""" + w = WorkspaceClient() -@app.on_event("shutdown") -async def shutdown(): - await db_manager.close() + if endpoint_id: + ep_name = f"projects/{project_id}/branches/{branch_id}/endpoints/{endpoint_id}" + else: + # Pick the first endpoint under the branch + endpoints = list(w.postgres.list_endpoints( + parent=f"projects/{project_id}/branches/{branch_id}" + )) + ep_name = endpoints[0].name -@app.get("/data") -async def get_data(): - async with db_manager.session() as session: - result = await session.execute("SELECT * FROM my_table") - return result.fetchall() -``` + endpoint = w.postgres.get_endpoint(name=ep_name) + host = endpoint.status.hosts.host -### 3. Static URL Mode (Local Development) + cred = w.postgres.generate_database_credential(endpoint=ep_name) + + return psycopg.connect( + host=host, + dbname=database_name, + user=w.current_user.me().user_name, + password=cred.token, + sslmode="require", + ) + + +# Usage +with get_connection("my-app") as conn: + with conn.cursor() as cur: + cur.execute("SELECT NOW()") + print(cur.fetchone()) +``` -For local development, use a static connection URL: +## 4. Static URL (Local Development Only) ```python import os from sqlalchemy.ext.asyncio import create_async_engine -# Set environment variable with full connection URL # LAKEBASE_PG_URL=postgresql://user:password@host:5432/database def get_database_url() -> str: - """Get database URL from environment.""" - url = os.environ.get("LAKEBASE_PG_URL") - if url and url.startswith("postgresql://"): - # Convert to psycopg3 async driver + url = os.environ.get("LAKEBASE_PG_URL", "") + if url.startswith("postgresql://"): url = url.replace("postgresql://", "postgresql+psycopg://", 1) return url + engine = create_async_engine( get_database_url(), pool_size=5, - connect_args={"sslmode": "require"} + connect_args={"sslmode": "require"}, ) ``` -### 4. DNS Resolution Workaround (macOS) +## DNS Resolution Workaround (macOS) -Python's `socket.getaddrinfo()` fails with long hostnames on macOS. Use `dig` as fallback: +Python's `socket.getaddrinfo()` can fail with long hostnames on macOS. 
 
 ```python
 import subprocess
 import socket
 
+
 def resolve_hostname(hostname: str) -> str:
     """Resolve hostname using dig command (macOS workaround)."""
     try:
@@ -270,10 +379,9 @@ def resolve_hostname(hostname: str) -> str:
     try:
         result = subprocess.run(
             ["dig", "+short", hostname],
-            capture_output=True, text=True, timeout=5
+            capture_output=True, text=True, timeout=5,
         )
-        ips = result.stdout.strip().split('\n')
-        for ip in ips:
+        for ip in result.stdout.strip().split('\n'):
             if ip and not ip.startswith(';'):
                 return ip
     except Exception:
         pass
 
     raise RuntimeError(f"Could not resolve hostname: {hostname}")
 
-# Use with psycopg
+
+# Use with psycopg: set `host` for TLS SNI and `hostaddr` for the actual connection
 conn_params = {
-    "host": hostname,  # For TLS SNI
-    "hostaddr": resolve_hostname(hostname),  # Actual IP
+    "host": hostname,
+    "hostaddr": resolve_hostname(hostname),
     "dbname": database_name,
     "user": username,
     "password": token,
-    "sslmode": "require"
+    "sslmode": "require",
 }
 conn = psycopg.connect(**conn_params)
 ```
 
 ## Best Practices
 
-1. **Always use SSL**: Set `sslmode=require` in all connections
-2. **Implement token refresh**: Tokens expire after 1 hour; refresh at 50 minutes
-3. **Use connection pooling**: Avoid creating new connections per request
-4. **Handle DNS issues on macOS**: Use the `hostaddr` workaround if needed
-5. **Close connections properly**: Use context managers or explicit cleanup
-6. **Handle scale-to-zero wake-up**: First connection after idle may take 2-5 seconds
-7. **Log token refresh events**: Helps debug authentication issues
+1. **Default to pattern 1** (`psycopg_pool.ConnectionPool` + `OAuthConnection`). It's the canonical Databricks App pattern: it works out of the box, with no background tasks.
+2. **Use `max_lifetime=2700`, not 3600.** The default creates a race condition where connections are handed out with nearly-expired tokens.
+3. **Always `sslmode=require`** on every connection (it's auto-injected as `PGSSLMODE` in Databricks Apps).
+4. **Never use `config.token` / `oauth_token().access_token` as the PG password** — that's a workspace-scoped token. Use `generate_database_credential()` to mint a Lakebase-scoped one.
+5. **Handle DNS issues on macOS** using the `hostaddr` workaround if your dev machine can't resolve Lakebase hostnames.
+6. **Use context managers** (`with pool.connection() as conn:`) so connections are always returned to the pool.
+7. **Expect 2-5 second wake-up latency** on the first query after scale-to-zero — retry with backoff.
+8. **Log credential refresh events** in `OAuthConnection.connect()` during early development — it makes token-related failures easy to spot.

From d40d226857d7883dab9242abbca4cec1562c347f Mon Sep 17 00:00:00 2001
From: David O'Keeffe
Date: Thu, 23 Apr 2026 19:20:16 +1000
Subject: [PATCH 2/3] docs(lakebase-autoscale): add "no separate Lakebase SDK"
 framing + cross-language table
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The existing overview jumped straight into features. Readers arriving from
"how do I use Lakebase from Python?" needed two things made explicit:

1. There is no separate Lakebase SDK for Python. You use databricks-sdk only
   for minting OAuth credentials; a standard Postgres driver does the actual
   queries. (This was implicit in the connection patterns doc but not called
   out up-front.)
2. Node/TypeScript has a convenience wrapper: @databricks/lakebase
   (re-exported by @databricks/appkit). Autoscaling-only, not Provisioned.
   Worth mentioning so JS/TS readers know it exists.

Also added a cross-language summary table and an explicit "What NOT to do"
list — most importantly flagging that WorkspaceClient().config.token is
workspace-scoped and will be rejected at Postgres login. This is a trap
several of us have fallen into.

Co-authored-by: Isaac
---
 .../databricks-lakebase-autoscale/SKILL.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/databricks-skills/databricks-lakebase-autoscale/SKILL.md b/databricks-skills/databricks-lakebase-autoscale/SKILL.md
index f471765c..98a46181 100644
--- a/databricks-skills/databricks-lakebase-autoscale/SKILL.md
+++ b/databricks-skills/databricks-lakebase-autoscale/SKILL.md
@@ -20,6 +20,25 @@ Use this skill when:
 
 Lakebase Autoscaling is Databricks' next-generation managed PostgreSQL service for OLTP workloads. It provides autoscaling compute, Git-like branching, scale-to-zero, and instant point-in-time restore.
 
+> **There is no separate "Lakebase SDK" for Python.** You use the Databricks SDK (`databricks-sdk`) **only** to mint short-lived OAuth credentials via `WorkspaceClient().postgres.generate_database_credential(...)`, then connect with a standard Postgres driver (`psycopg`, `SQLAlchemy`, JDBC, etc.). For Node/TypeScript, the convenience wrapper [`@databricks/lakebase`](https://github.com/databricks/appkit/blob/main/packages/lakebase/README.md) exists (Autoscaling only — not Provisioned).
+
+### Cross-language summary
+
+| Language | Credential SDK | DB Driver |
+|----------|----------------|-----------|
+| **Python** | `databricks-sdk` (`WorkspaceClient`) | `psycopg[binary,pool]` (canonical) or `SQLAlchemy` |
+| **Node/TS** | `@databricks/lakebase` (handles both) | `@databricks/lakebase` wraps `pg` pool |
+| **Java/Go** | Databricks SDK for Java/Go | Standard JDBC / `pgx` |
+
+### What NOT to do
+
+- ❌ Hardcode a static Postgres password
+- ❌ Manually manage long-lived DB credentials
+- ❌ Use `WorkspaceClient().config.token` as the Postgres password — that's a **workspace-scoped** token and will fail at Postgres login. You need the Lakebase-scoped token from `generate_database_credential()`.
+- ❌ Treat Lakebase like a Databricks SQL warehouse connection (it's Postgres, not DBSQL)
+- ❌ Bypass the app resource model when running inside a Databricks App
+
+
 | Feature | Description |
 |---------|-------------|
 | **Autoscaling Compute** | 0.5-112 CU with 2 GB RAM per CU; scales dynamically based on load |

From dc23fb554d1743f48fbd9573263d53f74b3adcd9 Mon Sep 17 00:00:00 2001
From: David O'Keeffe
Date: Mon, 27 Apr 2026 22:56:35 +1000
Subject: [PATCH 3/3] docs(lakebase-autoscale): address PR review feedback

- Fix PGAPPNAME omission: 6 env vars auto-injected, not 5; note
  multi-resource caveat
- Add psycopg3 pin comment explaining why psycopg2 won't work (no
  connection_class hook)
- Strengthen open=False rationale: deprecated for AsyncConnectionPool,
  errors in psycopg 4.0
- Clarify @databricks/lakebase scope in cross-language table (Autoscaling only)

Co-authored-by: Isaac
---
 databricks-skills/databricks-lakebase-autoscale/SKILL.md  | 2 +-
 .../databricks-lakebase-autoscale/connection-patterns.md  | 9 +++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/databricks-skills/databricks-lakebase-autoscale/SKILL.md b/databricks-skills/databricks-lakebase-autoscale/SKILL.md
index 98a46181..05d46d3b 100644
--- a/databricks-skills/databricks-lakebase-autoscale/SKILL.md
+++ b/databricks-skills/databricks-lakebase-autoscale/SKILL.md
@@ -27,7 +27,7 @@ Lakebase Autoscaling is Databricks' next-generation managed PostgreSQL service f
 | Language | Credential SDK | DB Driver |
 |----------|----------------|-----------|
 | **Python** | `databricks-sdk` (`WorkspaceClient`) | `psycopg[binary,pool]` (canonical) or `SQLAlchemy` |
-| **Node/TS** | `@databricks/lakebase` (handles both) | `@databricks/lakebase` wraps `pg` pool |
+| **Node/TS** | `@databricks/lakebase` (Autoscaling only) | `@databricks/lakebase` wraps `pg` pool |
 | **Java/Go** | Databricks SDK for Java/Go | Standard JDBC / `pgx` |
 
 ### What NOT to do

diff --git a/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md b/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md
index bd13c2b3..7c54be8a 100644
--- a/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md
+++ b/databricks-skills/databricks-lakebase-autoscale/connection-patterns.md
@@ -55,8 +55,9 @@ This is the pattern from the official Databricks tutorial, external app guide, a
 ```yaml
 command: ['flask', '--app', 'app.py', 'run', '--host', '0.0.0.0', '--port', '8000']
 env:
-  # These 5 are auto-injected when you add a Lakebase (postgres) resource in the UI:
-  # PGHOST, PGPORT, PGDATABASE, PGUSER, PGSSLMODE
+  # These 6 are auto-injected when you add a Lakebase (postgres) resource in the UI:
+  # PGAPPNAME, PGHOST, PGPORT, PGDATABASE, PGUSER, PGSSLMODE
+  # Only the *first* database resource gets auto-injected; extra resources need explicit valueFrom.
   # You MUST manually add ENDPOINT_NAME — it's needed by generate_database_credential():
   - name: ENDPOINT_NAME
     value: 'projects/<project_id>/branches/<branch_id>/endpoints/<endpoint_id>'
@@ -66,7 +67,7 @@ env:
 ```
 flask
-psycopg[binary,pool]>=3.1.0
+psycopg[binary,pool]>=3.1.0  # psycopg3 required — psycopg2.pool has no connection_class hook for OAuthConnection
 databricks-sdk>=0.81.0
 ```
 
@@ -142,7 +143,7 @@ if __name__ == '__main__':
 
 ### FastAPI variant
 
-Identical pattern, but use `open=False` with an explicit lifespan so startup failures surface immediately:
+Identical pattern, but use `open=False` with an explicit lifespan. Two reasons: (1) startup failures surface immediately via `pool.open(wait=True)`; (2) `open=True` is deprecated for `AsyncConnectionPool` and will raise an error in psycopg 4.0 — using `open=False` + lifespan is the forward-compatible pattern for any FastAPI app regardless of sync/async pool.
 
 ```python
 from contextlib import asynccontextmanager