feat(secrets): add secret handling to dataproc and google batch #184
Open
project-defiant wants to merge 6 commits into
Pull request overview
Adds a new secret-management utility to reference GCP Secret Manager secrets and integrates secret injection into Dataproc init actions and Google Batch task specs.
Changes:
- Introduces `Secret`, `Secrets`, and `SecretInitAction` models for validated secret references and init-action script generation.
- Adds support for Batch secret variables and Dataproc cluster secret init actions driven by `secret_map` config.
- Updates pipeline/cluster configs to wire in secret paths and secret mappings.
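The PR text does not show the model code. As a rough illustration of what "validated secret references" and init-action script generation could look like, here is a minimal stdlib-only sketch. The names `SecretRef` and `render_init_action` and the `secret_map` layout (destination file path to secret version) are hypothetical, not the PR's actual `Secret`/`SecretInitAction` API. The sketch assumes secrets are addressed by Secret Manager resource names of the form `projects/{project}/secrets/{secret}/versions/{version}` and fetched on each node with `gcloud secrets versions access`.

```python
import re
import shlex
from dataclasses import dataclass

# Secret Manager secret-version resource name, e.g.
# projects/my-project/secrets/hf-token/versions/latest
_RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/]+)/secrets/(?P<secret>[A-Za-z0-9_-]+)"
    r"/versions/(?P<version>latest|[0-9]+)"
)


@dataclass(frozen=True)
class SecretRef:
    """Hypothetical validated reference to one secret version."""

    project: str
    secret: str
    version: str = "latest"

    @classmethod
    def parse(cls, resource_name: str) -> "SecretRef":
        """Validate a full resource name and split it into its parts."""
        match = _RESOURCE_RE.fullmatch(resource_name)
        if match is None:
            raise ValueError(f"not a secret version resource name: {resource_name!r}")
        return cls(**match.groupdict())

    @property
    def resource_name(self) -> str:
        return f"projects/{self.project}/secrets/{self.secret}/versions/{self.version}"


def render_init_action(secret_map: dict[str, str]) -> str:
    """Render a bash init action that writes each secret to a local file.

    Hypothetical layout: secret_map maps a destination path on the node
    to a secret version resource name.
    """
    # umask 077 makes every file and directory created below owner-only.
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", "umask 077"]
    for path, resource in secret_map.items():
        ref = SecretRef.parse(resource)  # fail fast on malformed references
        target = shlex.quote(path)
        lines.append(f'mkdir -p "$(dirname {target})"')
        lines.append(
            f"gcloud secrets versions access {shlex.quote(ref.version)}"
            f" --secret={shlex.quote(ref.secret)}"
            f" --project={shlex.quote(ref.project)} > {target}"
        )
    return "\n".join(lines) + "\n"
```

The rendered text would then be uploaded to Cloud Storage (for example with google-cloud-storage's `Blob.upload_from_string`) so its `gs://` URI can be referenced as a Dataproc init action. The `umask 077` line is a hardening choice; it would need loosening if the consuming job runs as a different user than the init action.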
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/test_secret.py | Adds unit tests for secret model validation and init-action script/GCS upload behavior. |
| src/orchestration/utils/secret.py | Implements Secret/Secrets models and Dataproc init-action script generation + GCS upload. |
| src/orchestration/utils/batch.py | Adds Batch secret_vars support to task specs; includes debug print()s. |
| src/orchestration/types.py | Extends GoogleBatchSpecs with secret_map. |
| src/orchestration/operators/dataproc.py | Adds Dataproc secret init-action preparation and cluster config patching. |
| src/orchestration/operators/batch/manifest_generators/harmonisation.py | Minor formatting change. |
| src/orchestration/operators/batch/generic.py | Passes secret_map into Batch task spec creation. |
| src/orchestration/operators/batch/finemapping.py | Attempts to pass secrets into task env creation (currently broken). |
| src/orchestration/dags/config/unified_pipeline.yaml | Bumps release/run identifiers and versions. |
| src/orchestration/dags/config/gentropy.yaml | Adds secret file paths + Batch secret_map for HF token. |
| src/orchestration/dags/config/clusters.yaml | Adds Dataproc secret_map and secret_init_action_uri for gentropy cluster. |
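Several rows above wire a `secret_map` into Batch task specs and Dataproc cluster configs. A hedged sketch of both patches, assuming dict-shaped payloads: for Batch it fills the `secretVariables` field of a task's `environment` (the Batch API's map of environment variable names to Secret Manager secret versions); for Dataproc it appends a Cloud Storage URI to the cluster config's `initialization_actions`. The helper names `batch_environment` and `add_init_action` are hypothetical and are not the functions in `batch.py` or `dataproc.py`.

```python
import copy
from typing import Any


def batch_environment(env: dict[str, str], secret_map: dict[str, str]) -> dict[str, Any]:
    """Build a Google Batch task `environment` block (REST/JSON field names).

    secret_map maps an env var name to a Secret Manager secret version
    resource name; Batch resolves these on the VM via `secretVariables`.
    """
    overlap = env.keys() & secret_map.keys()
    if overlap:
        raise ValueError(f"defined as both plain and secret: {sorted(overlap)}")
    environment: dict[str, Any] = {"variables": dict(env)}
    if secret_map:
        environment["secretVariables"] = dict(secret_map)
    return environment


def add_init_action(cluster_config: dict[str, Any], script_uri: str) -> dict[str, Any]:
    """Return a copy of a Dataproc cluster config with one extra init action."""
    if not script_uri.startswith("gs://"):
        raise ValueError("init action scripts must be Cloud Storage URIs")
    patched = copy.deepcopy(cluster_config)  # leave the shared config untouched
    actions = patched.setdefault("initialization_actions", [])
    # Skip the URI if it is already registered, so repeated calls are idempotent.
    if all(action.get("executable_file") != script_uri for action in actions):
        actions.append({"executable_file": script_uri})
    return patched
```

Deep-copying keeps the defaults loaded from `clusters.yaml` unchanged for other clusters, and rejecting a name that is both a plain and a secret variable avoids silently shadowing one with the other.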
DSuveges (Contributor) approved these changes on Apr 29, 2026:
Very sophisticated handling of secrets! I have learnt a lot from reading through this PR. I couldn't find anything off.
project-defiant (Collaborator, Author):
@DSuveges, thanks for the review. I will merge it once the gentropy PR is merged.
This PR attempts to fix opentargets/issues#3731
This is linked to the changes in gentropy (opentargets/gentropy#1209), where I remove google-secret-manager and instead read secrets from a file path or an environment variable.
See the Copilot pull request overview for the exact changes.
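Because gentropy now reads secrets from a file path or an environment variable rather than calling Secret Manager itself, the consuming side can stay very small. The sketch below is an assumption about that behaviour, not gentropy's code: `resolve_secret` and its precedence (an explicit file path first, then the environment variable) are hypothetical. On Dataproc the file would presumably be one written by the init action; on Batch the variable would be populated from the task's secret variables.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_secret(name: str, path: Optional[str] = None) -> str:
    """Hypothetical resolver: read from `path` if given, else from env var `name`."""
    if path is not None:
        # Strip the trailing newline that secret files often carry.
        value = Path(path).read_text(encoding="utf-8").strip()
    else:
        value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"secret {name!r} is not set")
    return value
```

Failing loudly on an empty value surfaces a missing secret at step start-up rather than as an authentication error deep inside a W&B or HF call.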
Tests
I tested the two L2G steps that require credentials with Orchestration and a gentropy tag built from opentargets/gentropy#1209. Both steps ran successfully:
- L2G training step (Dataproc): requires W&B and HF access ([successful run](https://console.cloud.google.com/dataproc/jobs/up-gentropy-70ae5-gentropy_l2g_training-xnmyt/monitoring?region=europe-west1&project=open-targets-eu-dev))
- L2G predictions (Batch): requires HF access ([successful run](https://console.cloud.google.com/batch/jobsDetail/regions/europe-west1/jobs/up-l2g-prediction-70ae5-job-1-20260424-124415/details?project=open-targets-eu-dev))
Follow up
Next steps would be to: