Extend the Databricks connector (flytekitplugins-spark) with OAuth Machine-to-Machine (M2M) and OIDC Workload Identity Federation authentication, the two non-legacy authentication paths Databricks now recommends for service-to-service traffic. The connector currently only supports Personal Access Tokens (PATs).
Motivation
Databricks has marked PAT a legacy auth method
Databricks now classifies workspace-level PATs as legacy in their official documentation:
90-day auto-revocation: "Databricks automatically revokes PATs that haven't been used for 90 days."
Per-workspace cap: "A user can create up to 600 PATs per workspace."
Operational implications of staying on PAT:
No central rotation: PATs can't be rotated through an identity provider; every secret is a separate operational chore.
Weaker auditability: PATs are tied to a single workspace user/SP, with no IdP-level lineage on each call.
Goal of this issue
Bring the Databricks connector into the modern Databricks auth posture, namely OAuth M2M (client credentials) and OIDC Workload Identity Federation (token exchange), while preserving everything that the prior multi-tenant PAT work in flyteorg/flytekit#3394 delivered:
Per-namespace identity isolation (each Flyte workflow project federates as a different Databricks Service Principal, enabling per-project Unity Catalog access controls).
Backwards-compatible fallback to existing PAT setups.
Zero workflow-code changes for adoption: operators flip auth modes in connector config; workflow authors don't change a line.
Related prior work
flyteorg/flyte#6911: Databricks Serverless Compute support (separate track, also in flight).
flyteorg/flytekit#3394: Multi-tenant Databricks PAT via cross-namespace K8s secrets (merged Mar 10, 2026). This PR is the multi-tenancy baseline that the new auth modes preserve.
flyteorg/flytekit#3392: Databricks Serverless support (merged earlier, related task-config surface).
Proposed Changes
1. Auth strategy abstraction
Introduce a small strategy module (databricks_auth.py) inside the spark plugin that owns:
Resolution of the active auth type from connector env vars or per-task config (with auto-detection when unset).
Token acquisition for each strategy (PAT, OAuth M2M, OIDC Model 1, OIDC Model 2).
Async-safe in-memory token cache with TTL and a pre-expiry refresh buffer.
Exponential backoff with jitter on token-endpoint calls.
The connector continues to call a single get_header(...) boundary; the strategy underneath is replaceable.
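A minimal sketch of the cache and backoff pieces described above, assuming an asyncio-based connector. The names `TokenCache`, `CachedToken`, and `backoff_delay`, the 60-second refresh buffer, and the backoff constants are all hypothetical and are not taken from `databricks_auth.py`.

```python
import asyncio
import random
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Tuple


@dataclass
class CachedToken:
    value: str
    expires_at: float  # absolute deadline on the time.monotonic() clock


class TokenCache:
    """In-memory cache that refetches a token shortly before it expires."""

    def __init__(self, refresh_buffer_s: float = 60.0) -> None:
        self._refresh_buffer_s = refresh_buffer_s  # assumed default
        self._tokens: Dict[str, CachedToken] = {}
        self._locks: Dict[str, asyncio.Lock] = {}

    async def get(self, key: str, fetch: Callable[[], Awaitable[Tuple[str, float]]]) -> str:
        # One lock per key, so concurrent callers for the same identity
        # trigger a single token-endpoint call instead of a stampede.
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            cached = self._tokens.get(key)
            if cached and time.monotonic() < cached.expires_at - self._refresh_buffer_s:
                return cached.value
            value, ttl_s = await fetch()  # fetch returns (token, ttl in seconds)
            self._tokens[key] = CachedToken(value, time.monotonic() + ttl_s)
            return value


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap_s, base_s * 2**attempt)]."""
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))
```

Keying the cache by namespace (or by Service Principal) would keep tenants from sharing tokens; that keying choice is likewise an assumption here.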
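2. Auth modes
PAT: Secret databricks-token in the workflow namespace, or the FLYTE_DATABRICKS_ACCESS_TOKEN env var.
OAuth M2M: Secret databricks-oauth (client_id + client_secret) in the workflow namespace, or FLYTE_DATABRICKS_CLIENT_ID / FLYTE_DATABRICKS_CLIENT_SECRET.
OIDC Model 1 (connector-pod identity, shared): DATABRICKS_CLIENT_ID.
OIDC Model 2: K8s TokenRequest minted for an annotated SA in the workflow namespace; the SA carries the flyte.org/databricks-client-id annotation. Tenancy is per-namespace (each namespace can federate to a distinct Databricks SP). Falls back to OIDC Model 1 if no annotated SA is found and Model 1 is configured; otherwise fails loudly.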
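For the OAuth M2M mode, the token call is a standard client-credentials request. The sketch below only builds the request pieces; it assumes the workspace-level `/oidc/v1/token` endpoint and the `all-apis` scope, and `build_m2m_token_request` is a hypothetical helper name rather than the connector's API.

```python
import base64
from typing import Dict, Tuple
from urllib.parse import urlencode


def build_m2m_token_request(
    workspace_host: str, client_id: str, client_secret: str
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for a client-credentials token request."""
    # Assumption: workspace-level token endpoint path.
    url = f"https://{workspace_host}/oidc/v1/token"
    # Client credentials travel as HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Assumption: the "all-apis" scope.
    body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"})
    return url, headers, body
```

The returned tuple can be passed to any HTTP client; sending the request and parsing `access_token` / `expires_in` from the response is left out of the sketch.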
3. Auto-detection order
If FLYTE_DATABRICKS_AUTH_TYPE is unset, the connector picks the strongest reachable mode at submit time, in this order:
OIDC if Model 2 SA discoverable in workflow namespace, OR Model 1 prerequisites are met.
M2M if client_id/client_secret reachable (namespace secret or connector env).
PAT as the final fallback for backwards compatibility.
If FLYTE_DATABRICKS_AUTH_TYPE is set explicitly, the connector uses that mode and errors loudly when its prerequisites are missing; no silent identity downgrade.
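The resolution rules above can be sketched as a pure function. This is an illustration under stated assumptions: `Reachable` and `resolve_auth_type` are hypothetical names, and `"pat"` is an assumed mode string (only `oauth_m2m` and `oidc_federation` appear in this issue).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reachable:
    """Which prerequisites were found at submit time (hypothetical shape)."""
    oidc_model2_sa: bool = False     # annotated SA found in the workflow namespace
    oidc_model1_ready: bool = False  # Model 1 prerequisites are met
    m2m_credentials: bool = False    # client_id/client_secret reachable
    pat: bool = False                # token secret or env var reachable


def resolve_auth_type(explicit: Optional[str], r: Reachable) -> str:
    available = {
        "oidc_federation": r.oidc_model2_sa or r.oidc_model1_ready,
        "oauth_m2m": r.m2m_credentials,
        "pat": r.pat,  # assumed mode name
    }
    if explicit is not None:
        if explicit not in available:
            raise ValueError(f"unknown auth type: {explicit!r}")
        if not available[explicit]:
            # An explicit mode never downgrades silently to a weaker identity.
            raise RuntimeError(f"prerequisites missing for auth type {explicit!r}")
        return explicit
    # Auto-detection: strongest reachable mode first.
    for mode in ("oidc_federation", "oauth_m2m"):
        if available[mode]:
            return mode
    return "pat"  # final fallback for backwards compatibility
```

For example, `resolve_auth_type(None, Reachable(m2m_credentials=True, pat=True))` selects `oauth_m2m`, while requesting `oidc_federation` explicitly with only a PAT reachable raises instead of falling back.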
4. OIDC Model 2 discovery (per-namespace tenancy without per-task config)
Operators label/annotate ServiceAccounts in workflow namespaces:
At submit time, the connector lists SAs in the workflow's namespace by label, picks the one carrying the flyte.org/databricks-client-id annotation, and mints a JWT for it via the Kubernetes TokenRequest API. The result is exchanged at the Databricks token endpoint for a workspace access token. Workflow authors write no extra config. Different namespaces can federate to different Databricks SPs in the same connector deployment.
Connector RBAC for Model 2 (get/list on serviceaccounts, plus create on serviceaccounts/token) is documented in the README.
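The discovery and exchange steps can be sketched as two small helpers. This assumes ServiceAccounts arrive as plain dicts shaped like Kubernetes API objects, and that the exchange follows the RFC 8693 token-exchange form with an `all-apis` scope. `pick_annotated_sa` and `build_token_exchange_form` are hypothetical names; the TokenRequest call itself and the label selector are omitted because the issue does not spell them out.

```python
from typing import Any, Dict, List, Optional

CLIENT_ID_ANNOTATION = "flyte.org/databricks-client-id"  # annotation named in this issue


def pick_annotated_sa(service_accounts: List[Dict[str, Any]]) -> Optional[Dict[str, str]]:
    """Return the first SA carrying the client-id annotation, or None."""
    for sa in service_accounts:
        meta = sa.get("metadata", {})
        client_id = (meta.get("annotations") or {}).get(CLIENT_ID_ANNOTATION)
        if client_id:
            return {"name": meta["name"], "client_id": client_id}
    return None


def build_token_exchange_form(sa_jwt: str, client_id: str) -> Dict[str, str]:
    """Form fields for exchanging a Kubernetes SA JWT for a workspace token."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": sa_jwt,
        "client_id": client_id,
        "scope": "all-apis",  # assumption
    }
```

Returning `None` when no SA is annotated leaves the caller free to apply the Model 1 fallback or fail loudly, as described in the auth modes section.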
5. Backwards compatibility
DatabricksJobMetadata and DatabricksV2 task config get additive fields only.
Existing PAT deployments continue to work unchanged.
Older DatabricksJobMetadata payloads (without the new fields) are still consumed correctly by upgraded connectors.
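One way to keep older payloads readable is to give every new field a default and ignore keys the class does not know. The sketch below uses a stand-in class, `JobMetadataSketch`, rather than the real `DatabricksJobMetadata`; the `auth_type` and `client_id` fields are hypothetical examples of additive fields.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class JobMetadataSketch:
    # Pre-existing fields (assumed for illustration).
    databricks_instance: str
    run_id: str
    # Additive fields: optional with defaults, so old payloads still load.
    auth_type: Optional[str] = None
    client_id: Optional[str] = None

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "JobMetadataSketch":
        # Drop unknown keys so payloads from newer writers also load.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})
```

With this shape, a payload written before the upgrade deserializes with `auth_type=None`, which the connector could treat as the legacy PAT path.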
API Examples
Operators flip auth modes via connector env vars (workflow code unchanged)
# PAT (legacy, default; unchanged behaviour from #3394)
# (no extra env vars needed; works as today)
# OAuth M2M
FLYTE_DATABRICKS_AUTH_TYPE=oauth_m2m
FLYTE_DATABRICKS_CLIENT_ID=<sp-application-id>  # connector-level fallback
FLYTE_DATABRICKS_CLIENT_SECRET=<sp-secret>  # connector-level fallback
# Plus per-namespace K8s secret `databricks-oauth` for tenant overrides.
# OIDC Model 1 (connector-pod identity, shared)
FLYTE_DATABRICKS_AUTH_TYPE=oidc_federation
FLYTE_DATABRICKS_CLIENT_ID=<connector-pod-sp-id>
FLYTE_DATABRICKS_OIDC_AUDIENCE=https://...
# OIDC Model 2 (per-namespace, annotation-driven)
FLYTE_DATABRICKS_AUTH_TYPE=oidc_federation
# No connector-level client_id needed; discovered from each namespace's annotated SA.
Workflow code stays identical across all four modes
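Files Changed
plugins/flytekit-spark/flytekitplugins/spark/databricks_auth.py
plugins/flytekit-spark/flytekitplugins/spark/connector.py: list_serviceaccounts_in_k8s helper; persist discovered config in DatabricksJobMetadata
plugins/flytekit-spark/flytekitplugins/spark/task.py
plugins/flytekit-spark/tests/test_databricks_auth.py
plugins/flytekit-spark/tests/test_databricks_token.py: Adjusted for new auth-resolution boundary; PAT regression coverage retained
plugins/flytekit-spark/tests/test_connector.py: Updated for the additive DatabricksJobMetadata fields
plugins/flytekit-spark/README.md: New "Databricks Connector Authentication" section: env var table, four-mode walkthrough, RBAC manifests, migration guide
Testing
Unit tests
100+ test cases covering:
Auth resolution / auto-detection: each mode selected correctly when its prerequisites are met; explicit errors when the requested mode is misconfigured (no silent downgrade).
kubernetes and aiohttp import handling, so that pyflyte run works without K8s present.
End-to-end (internal dev EKS cluster)
PAT, M2M, and OIDC Model 2 confirmed; Model 1 pending.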
Goal: What should the final outcome look like, ideally?
The full feature has been implemented and tested end-to-end on an internal dev EKS cluster (PAT, M2M, OIDC Model 2 confirmed; Model 1 pending). I'm preparing a PR against flyteorg/flytekit:master that delivers the feature in a single coherent change. This issue is the tracking issue for that PR.
Describe alternatives you've considered
Adding only OAuth M2M: would address the legacy-PAT problem but not the auditability gap (still long-lived secrets).
A per-task databricks_oidc_service_account field on DatabricksV2: was prototyped, but it required workflow authors to know operator-level identity details and violated the zero-workflow-code-changes constraint. Replaced with the annotation-driven discovery design above.
Propose: Link/Inline OR Additional context
Inline above. PR will land shortly with the full diff, RBAC manifests, and migration guide in the spark plugin README.
Migration Path
Existing setups that use FLYTE_DATABRICKS_ACCESS_TOKEN can move to OAuth M2M via client_id/client_secret, OR to OIDC federation via per-namespace SA annotations.
Migration is opt-in and additive: existing deployments keep working until operators flip FLYTE_DATABRICKS_AUTH_TYPE.
References
Kubernetes TokenRequest API: https://kubernetes.io/docs/concepts/configuration/configmap/#mounted-configmaps-are-updated-automatically
FYI: @kumare3 @pingsutw @machichima