OP dataset publication#388

Merged
comcon1 merged 9 commits into
NMRLipids:mainfrom
comcon1:add-dataset-workflow
May 20, 2026

Conversation

@comcon1
Member

@comcon1 comcon1 commented Apr 19, 2026

Automatic dataset publication to the NMRlipids Kaggle account.

It requires the dataset-creating script to be in place; that is covered by this PR: NMRLipids/FAIRMD_lipids#491
However, you can test it even before that PR is merged, because the workflow currently refers to its branch. I tested it on my branch, but you need a KAGGLE secret to test it.

You can check my repo: https://github.com/comcon1/BilayerData/actions/runs/24932009452 (this is a link to a successful run).

Also, the dataset with the defined slug must already exist: the workflow only pushes a new version. We must publish the first version of the dataset ourselves. I think that makes sense, because a lot of fields need to be filled in when doing so.
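
The version-only push described above could be guarded roughly as follows. This is a minimal sketch, not the workflow's actual code: the function name `build_version_command` and the folder layout are hypothetical, and it assumes the usual `dataset-metadata.json` file with an `id` of the form `owner/slug` next to the data, pushed with `kaggle datasets version`.

```python
import json
from pathlib import Path


def build_version_command(dataset_dir, message):
    """Return the CLI call that pushes a new version of an existing dataset.

    Raises ValueError if dataset-metadata.json has no "owner/slug" id,
    because a version push cannot create the dataset itself.
    """
    meta_path = Path(dataset_dir) / "dataset-metadata.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    dataset_id = meta.get("id", "")
    owner, _, slug = dataset_id.partition("/")
    if not owner or not slug:
        raise ValueError(f"expected 'owner/slug' in {meta_path}, got {dataset_id!r}")
    # 'kaggle datasets version' adds a version to a dataset that already exists.
    return ["kaggle", "datasets", "version", "-p", str(dataset_dir), "-m", message]
```

The returned list can be passed to `subprocess.run(..., check=True)`, so a missing or malformed slug fails before anything is uploaded.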

@comcon1 comcon1 force-pushed the add-dataset-workflow branch from 538f29a to 91a4ad9 Compare April 25, 2026 13:04
@comcon1 comcon1 force-pushed the add-dataset-workflow branch from 33a5c74 to 5fa152e Compare April 25, 2026 13:11
@comcon1 comcon1 changed the title from "Dataset publication" to "OP dataset publication" Apr 25, 2026
@comcon1 comcon1 marked this pull request as ready for review April 25, 2026 14:23
@comcon1 comcon1 added the enhancement New feature or request label Apr 25, 2026
@comcon1 comcon1 self-assigned this Apr 25, 2026
@comcon1 comcon1 added the github_actions Pull requests that update GitHub Actions code label Apr 26, 2026
@comcon1 comcon1 requested a review from batukav May 14, 2026 13:34
Collaborator

@batukav batukav left a comment

  1. The actual script call ignores the dropdown input — likely a bug.

The job-level SCRIPT_ARGS env is computed (--sims for op-sim, --exps for op-exp), but the final step hardcodes --sims:
python ${DATABANK_ROOT}/developer/gen-op-dataset.py --sims
This means selecting op-exp will still generate the sim dataset (and then push it under the experiments slug — so you'd publish sim data into the exp dataset). Should be python ${DATABANK_ROOT}/developer/gen-op-dataset.py ${SCRIPT_ARGS}. This probably explains why the test run worked: only the --sims branch was ever exercised.
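
The intended behaviour can be sketched as below, purely as an illustration. The names `FLAG_BY_DATASET` and `build_script_command` are hypothetical and not part of the workflow; only the dropdown values (op-sim, op-exp), the flags (--sims, --exps), and the script path come from the discussion above.

```python
# Hypothetical mapping from the workflow dropdown value to the script flag.
FLAG_BY_DATASET = {"op-sim": "--sims", "op-exp": "--exps"}


def build_script_command(dataset, databank_root):
    """Build the gen-op-dataset.py call for the selected dataset type."""
    try:
        flag = FLAG_BY_DATASET[dataset]
    except KeyError:
        # Fail loudly instead of silently falling back to one dataset type.
        raise ValueError(f"unknown dataset type: {dataset!r}") from None
    script = f"{databank_root}/developer/gen-op-dataset.py"
    return ["python", script, flag]
```

Deriving the flag from the selection in one place means the generated data and the target slug cannot drift apart the way a hardcoded --sims allows.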

  2. Kaggle authentication uses a non-standard env var

The workflow sets:
KAGGLE_API_TOKEN: ${{ secrets.KAGGLE_API_KEY }}
But the Kaggle Python client only reads KAGGLE_USERNAME + KAGGLE_KEY from env (or ~/.kaggle/kaggle.json). KAGGLE_API_TOKEN isn't recognized. Worth asking the author how auth actually resolved on the successful test run — maybe the runner had a kaggle.json from a prior step, or the secret name needs aligning. If this is meant to work from a clean upstream secret, the env vars (or a written kaggle.json file) should match what the CLI expects.

@comcon1
Member Author

comcon1 commented May 19, 2026

  1. The actual script call ignores the dropdown input — likely a bug.

The job-level SCRIPT_ARGS env is computed (--sims for op-sim, --exps for op-exp), but the final step hardcodes --sims: python ${DATABANK_ROOT}/developer/gen-op-dataset.py --sims This means selecting op-exp will still generate the sim dataset (and then push it under the experiments slug — so you'd publish sim data into the exp dataset). Should be python ${DATABANK_ROOT}/developer/gen-op-dataset.py ${SCRIPT_ARGS}. This probably explains why the test run worked: only the --sims branch was ever exercised.

Fixed in 1ccb169

  2. Kaggle authentication uses a non-standard env var

The workflow sets: KAGGLE_API_TOKEN: ${{ secrets.KAGGLE_API_KEY }} But the Kaggle Python client only reads KAGGLE_USERNAME + KAGGLE_KEY from env (or ~/.kaggle/kaggle.json). KAGGLE_API_TOKEN isn't recognized. Worth asking the author how auth actually resolved on the successful test run — maybe the runner had a kaggle.json from a prior step, or the secret name needs aligning. If this is meant to work from a clean upstream secret, the env vars (or a written kaggle.json file) should match what the CLI expects.

That's not true. I don't have a kaggle.json; I only use this environment variable, and it seems to be enough. I ran the pipeline from my fork and it worked, so it is actually one of the legitimate ways of using AUTH_TOKEN.
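
One way to settle how authentication resolved on a given runner is a small diagnostic step that reports which credential sources are present, without printing any secret. The sketch below is only an illustration: the function name `credential_sources` is hypothetical, and it does not say which sources a particular version of the Kaggle client honours; it only checks the three candidates mentioned in this thread.

```python
import os
from pathlib import Path


def credential_sources(env=None, home=None):
    """List which Kaggle credential sources are present (values are never shown)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = []
    if env.get("KAGGLE_API_TOKEN"):
        found.append("KAGGLE_API_TOKEN")
    # The username/key pair only counts when both variables are set.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        found.append("KAGGLE_USERNAME+KAGGLE_KEY")
    if (home / ".kaggle" / "kaggle.json").is_file():
        found.append("kaggle.json")
    return found
```

Running `print(credential_sources())` in a step before the push shows, for example, whether the token variable is the only source available on a clean runner.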

@comcon1 comcon1 requested a review from batukav May 19, 2026 17:43
@comcon1 comcon1 merged commit 0506215 into NMRLipids:main May 20, 2026
@comcon1 comcon1 deleted the add-dataset-workflow branch May 24, 2026 15:14