feat(secrets): add secret handling to dataproc and google batch #184

Open
project-defiant wants to merge 6 commits into dev from gentropy-google-secret-manager

@project-defiant (Collaborator) commented Apr 24, 2026

This PR attempts to fix opentargets/issues#3731

This is linked to the changes in gentropy (opentargets/gentropy#1209), where I remove google-secret-manager and read the secrets from a file path or an environment variable instead.
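
As a rough illustration of the gentropy-side change, a step that no longer talks to Secret Manager only has to resolve the secret from a file path or an environment variable. This is a hypothetical sketch: `read_secret`, its parameters and the path-before-env precedence are assumptions, not the code in opentargets/gentropy#1209.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(path: Optional[str] = None, env_var: Optional[str] = None) -> str:
    """Resolve a secret value from a file path or an environment variable.

    Hypothetical helper; the precedence (file first, then env) is an assumption.
    """
    # A secret file written on the node (e.g. by an init action) takes precedence.
    if path is not None and Path(path).is_file():
        return Path(path).read_text().strip()
    # Fall back to an environment variable (e.g. one injected by Google Batch).
    if env_var is not None:
        value = os.environ.get(env_var, "").strip()
        if value:
            return value
    raise LookupError(f"Secret not found at path={path!r} or in env_var={env_var!r}")
```

Keeping the step agnostic of where the value came from means the same code path can serve both the Dataproc (file) and Batch (environment) cases.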

See the Copilot description below for the exact changes.

Tests

I have tested the two L2G steps that require credentials, using Orchestration together with a gentropy tag built from opentargets/gentropy#1209. Both steps ran successfully:

  • L2G train step (Dataproc) -> requires W&B and HF access: [successful run](https://console.cloud.google.com/dataproc/jobs/up-gentropy-70ae5-gentropy_l2g_training-xnmyt/monitoring?region=europe-west1&project=open-targets-eu-dev)

  • L2G predictions (Batch) -> requires HF access: [successful run](https://console.cloud.google.com/batch/jobsDetail/regions/europe-west1/jobs/up-l2g-prediction-70ae5-job-1-20260424-124415/details?project=open-targets-eu-dev)

Follow up

Next steps would be to:

  1. refactor the PTS step to use the (open-api) key secret in JSON blob format;
  2. remove the secret handling from the cluster init actions;
  3. dynamically push assets (init actions) to a location specific to each Spark cluster, so we no longer need static assets.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new secret-management utility to reference GCP Secret Manager secrets and integrates secret injection into Dataproc init actions and Google Batch task specs.

Changes:

  • Introduces Secret, Secrets, and SecretInitAction models for validated secret references and init-action script generation.
  • Adds support for Batch secret variables and Dataproc cluster secret init actions driven by secret_map config.
  • Updates pipeline/cluster configs to wire in secret paths and secret mappings.
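
A minimal sketch of a validated secret reference and a Batch `secret_map`, assuming secrets are referenced by their Secret Manager version resource name (`projects/<project>/secrets/<secret>/versions/<version>`). `SecretRef` and `batch_secret_variables` are hypothetical names, not the PR's `Secret`/`Secrets` models. The resulting dict (environment variable name to secret version name) has the shape used by the `secret_variables` field of a Google Batch `Environment`.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Secret Manager version resource name, e.g.
# projects/my-project/secrets/my-secret/versions/latest
_SECRET_VERSION_RE = re.compile(
    r"projects/(?P<project>[a-z0-9-]+)"
    r"/secrets/(?P<secret_id>[A-Za-z0-9_-]+)"
    r"/versions/(?P<version>latest|[0-9]+)"
)


@dataclass(frozen=True)
class SecretRef:
    """Hypothetical stand-in for a validated secret reference (field names assumed)."""

    project: str
    secret_id: str
    version: str = "latest"

    @classmethod
    def parse(cls, resource_name: str) -> "SecretRef":
        # fullmatch rejects partial names and trailing characters.
        match = _SECRET_VERSION_RE.fullmatch(resource_name)
        if match is None:
            raise ValueError(f"Not a Secret Manager version name: {resource_name!r}")
        return cls(**match.groupdict())

    @property
    def resource_name(self) -> str:
        return f"projects/{self.project}/secrets/{self.secret_id}/versions/{self.version}"


def batch_secret_variables(secret_map: Dict[str, str]) -> Dict[str, str]:
    """Validate a config secret_map (ENV_VAR -> version name) for Batch secret_variables."""
    return {env_var: SecretRef.parse(name).resource_name for env_var, name in secret_map.items()}
```

Validating when the config is loaded would make a malformed entry in `gentropy.yaml` fail at DAG parse time rather than when the Batch job starts.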

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| tests/test_secret.py | Adds unit tests for secret model validation and init-action script/GCS upload behavior. |
| src/orchestration/utils/secret.py | Implements Secret/Secrets models and Dataproc init-action script generation + GCS upload. |
| src/orchestration/utils/batch.py | Adds Batch secret_vars support to task specs; includes debug print()s. |
| src/orchestration/types.py | Extends GoogleBatchSpecs with secret_map. |
| src/orchestration/operators/dataproc.py | Adds Dataproc secret init-action preparation and cluster config patching. |
| src/orchestration/operators/batch/manifest_generators/harmonisation.py | Minor formatting change. |
| src/orchestration/operators/batch/generic.py | Passes secret_map into Batch task spec creation. |
| src/orchestration/operators/batch/finemapping.py | Attempts to pass secrets into task env creation (currently broken). |
| src/orchestration/dags/config/unified_pipeline.yaml | Bumps release/run identifiers and versions. |
| src/orchestration/dags/config/gentropy.yaml | Adds secret file paths + Batch secret_map for HF token. |
| src/orchestration/dags/config/clusters.yaml | Adds Dataproc secret_map and secret_init_action_uri for gentropy cluster. |
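
The Dataproc side could be sketched as two steps: render an init-action script that fetches each secret with `gcloud secrets versions access` and writes it to a file on the node, then add the script's GCS URI to the cluster's `initialization_actions`. This is a hypothetical sketch, not the PR's `SecretInitAction`: the function names, the path-to-secret mapping shape, the `chmod 600` choice and prepending the action are assumptions, and the GCS upload is omitted.

```python
import copy
import shlex
from typing import Any, Dict, Mapping


def render_secret_init_action(secret_map: Mapping[str, str]) -> str:
    """Render a bash init action that writes each secret to a local file.

    Hypothetical: secret_map maps a destination path on the node to a Secret
    Manager version name, projects/<project>/secrets/<secret>/versions/<version>.
    """
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    for dest, resource_name in secret_map.items():
        parts = resource_name.split("/")
        if len(parts) != 6 or parts[0::2] != ["projects", "secrets", "versions"]:
            raise ValueError(f"Not a Secret Manager version name: {resource_name!r}")
        project, secret_id, version = parts[1::2]
        path = shlex.quote(dest)
        lines += [
            f'mkdir -p "$(dirname {path})"',
            # Fetch the secret payload and redirect it into the destination file.
            f"gcloud secrets versions access {shlex.quote(version)}"
            f" --secret={shlex.quote(secret_id)} --project={shlex.quote(project)} > {path}",
            # Assumption: jobs read the file as the same user (root) that runs init actions.
            f"chmod 600 {path}",
        ]
    return "\n".join(lines) + "\n"


def with_secret_init_action(cluster_config: Dict[str, Any], uri: str) -> Dict[str, Any]:
    """Return a copy of a Dataproc ClusterConfig dict with the secret init action first."""
    patched = copy.deepcopy(cluster_config)
    actions = patched.setdefault("initialization_actions", [])
    # Idempotent: patching the same config twice does not duplicate the action.
    if not any(action.get("executable_file") == uri for action in actions):
        # Prepended so later init actions could rely on the secret files (assumption).
        actions.insert(0, {"executable_file": uri})
    return patched
```

Returning a patched copy rather than mutating the input keeps the shared cluster config from `clusters.yaml` unchanged between DAG runs.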


@project-defiant project-defiant marked this pull request as ready for review April 24, 2026 13:57

@DSuveges (Contributor) left a comment

Very sophisticated handling of secrets! I have learnt a lot from reading through this PR. I couldn't find anything off.

@project-defiant (Collaborator, Author)

@DSuveges, thanks for the review! I will merge it once the gentropy PR (opentargets/gentropy#1209) is merged.

Development

Successfully merging this pull request may close these issues.

Create a step that allows for reading hugging face token from google cloud secret manager in orchestration

3 participants