OP dataset publication by comcon1 · Pull Request #388 · NMRLipids/BilayerData

comcon1 · 2026-04-19T08:15:54Z

Automatic dataset publication to the NMRlipids Kaggle account.

It requires the dataset-creating script to be in place. That script is added in this PR: NMRLipids/FAIRMD_lipids#491
However, you can test this workflow before that PR is merged, because it currently points to that PR's branch. I tested it on my branch. Note that you need a Kaggle secret to test it.

You can check my fork; this is a link to a successful run: https://github.com/comcon1/BilayerData/actions/runs/24932009452

Also, a dataset with the configured slug must already exist: the workflow only pushes new versions. The first version of each dataset has to be published manually. I think this is reasonable, because many metadata fields have to be filled in at that point.
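A minimal sketch of what a version-only push involves, assuming the standard Kaggle CLI command "kaggle datasets version -p <folder> -m <message>" and a dataset-metadata.json whose "id" is owner/slug. The helper name prepare_version_push and the example owner and slug are hypothetical, not taken from the workflow. Because "datasets version" adds a version to an existing dataset rather than creating one, the first version still has to be created by hand:

```python
import json
from pathlib import Path


def prepare_version_push(folder, owner, slug, title, message):
    """Write dataset-metadata.json and return the Kaggle CLI argv for a new version.

    Hypothetical helper. It assumes the dataset owner/slug already exists on
    Kaggle: "kaggle datasets version" only adds a version, it does not create
    the dataset.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    # The "id" field tells the CLI which existing dataset to update.
    metadata = {"id": f"{owner}/{slug}", "title": title}
    (folder / "dataset-metadata.json").write_text(json.dumps(metadata, indent=2))
    # Build the command but do not run it; the caller decides when to push.
    return ["kaggle", "datasets", "version", "-p", str(folder), "-m", message]
```

In a workflow step the returned list could be passed to subprocess.run(argv, check=True), with credentials supplied from the repository secret.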

batukav

The actual script call ignores the dropdown input. This is likely a bug.

The job-level SCRIPT_ARGS env is computed (--sims for op-sim, --exps for op-exp), but the final step hardcodes --sims:
python ${DATABANK_ROOT}/developer/gen-op-dataset.py --sims
This means that selecting op-exp will still generate the sim dataset and then push it under the experiments slug, so sim data would be published into the exp dataset. It should be python ${DATABANK_ROOT}/developer/gen-op-dataset.py ${SCRIPT_ARGS}. This probably also explains why the test run worked: only the --sims branch was ever exercised.
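A minimal sketch of the suggested fix, assuming the dropdown values op-sim and op-exp and the flags named in the review. The function name build_generator_command and the error handling are hypothetical, not the workflow's actual code. The flag comes from a single lookup table, and an unknown choice fails loudly instead of silently falling back to --sims:

```python
import os

# Dropdown choice -> gen-op-dataset.py flag, as described in the review.
SCRIPT_ARGS_BY_CHOICE = {"op-sim": "--sims", "op-exp": "--exps"}


def build_generator_command(choice, databank_root):
    """Return the argv for the dataset generator for the selected choice (hypothetical helper)."""
    try:
        script_args = SCRIPT_ARGS_BY_CHOICE[choice]
    except KeyError:
        # Refuse to guess: a silent default is exactly what caused the --sims bug.
        raise ValueError(f"unknown dataset choice: {choice!r}") from None
    script = os.path.join(databank_root, "developer", "gen-op-dataset.py")
    return ["python", script, script_args]
```

Keeping the target dataset slug in the same table as the flag would also make it impossible for the generated data and the slug it is pushed to to disagree.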

Kaggle authentication uses a non-standard env var

The workflow sets:
KAGGLE_API_TOKEN: ${{ secrets.KAGGLE_API_KEY }}
However, the Kaggle Python client only reads KAGGLE_USERNAME + KAGGLE_KEY from the environment (or ~/.kaggle/kaggle.json); KAGGLE_API_TOKEN isn't recognized. It is worth asking the author how authentication actually resolved on the successful test run: maybe the runner had a kaggle.json from a prior step, or the secret name needs aligning. If this is meant to work from a clean upstream secret, the env vars (or a written kaggle.json file) should match what the CLI expects.
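A small diagnostic sketch that could settle the question above by listing which credential sources a runner can see, without printing any secret values. The function name kaggle_credential_sources is hypothetical, and it only checks for the variables and the file named in this thread; it does not reproduce the Kaggle client's own lookup order:

```python
import os
from pathlib import Path


def kaggle_credential_sources(env=None, home=None):
    """List the Kaggle credential sources visible to this process (diagnostic only).

    Reports names, never values, so it is safe to log in CI.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    # Username/key pair: both must be set to count.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        sources.append("KAGGLE_USERNAME/KAGGLE_KEY")
    # The single-token variable set by this workflow.
    if env.get("KAGGLE_API_TOKEN"):
        sources.append("KAGGLE_API_TOKEN")
    # Credentials file left on the runner, e.g. by an earlier step.
    if (home / ".kaggle" / "kaggle.json").is_file():
        sources.append("~/.kaggle/kaggle.json")
    return sources
```

Running it in a step just before the push would show whether the token variable was the only credential present on the successful run.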

comcon1 · 2026-05-19T17:43:36Z

The actual script call ignores the dropdown input — likely a bug.


Fixed in 1ccb169

Kaggle authentication uses a non-standard env var


That's not true. I don't have a kaggle.json; I only use this environment variable, and it seems to be enough. I ran the pipeline from my fork and it worked. So this is actually one of the legitimate ways of using AUTH_TOKEN.

comcon1 added 3 commits April 19, 2026 10:13

add first (non-working) pipeline larva

eb88cc3

testing version of DS workflow

59d5479

add PR trigger to test the WF

91a4ad9

comcon1 force-pushed the add-dataset-workflow branch from 538f29a to 91a4ad9 April 25, 2026 13:04

point to dev repo

5fa152e

comcon1 force-pushed the add-dataset-workflow branch from 33a5c74 to 5fa152e April 25, 2026 13:11

comcon1 added 3 commits April 25, 2026 15:19

fix name of secrets var

159c48d

fix databank root

29d6428

Finalize PublishDatasets action for OP datasets

e3d16c4

comcon1 requested review from MagnusSletten and korbinib April 25, 2026 14:07

comcon1 mentioned this pull request Apr 25, 2026

OP dataset generator (for use in Kaggle Dataset publishing) NMRLipids/FAIRMD_lipids#491 (Merged)

comcon1 changed the title from "Dataset publication" to "OP dataset publication" Apr 25, 2026

comcon1 marked this pull request as ready for review April 25, 2026 14:23

comcon1 added the enhancement New feature or request label Apr 25, 2026

comcon1 self-assigned this Apr 25, 2026

comcon1 added the github_actions Pull requests that update GitHub Actions code label Apr 26, 2026

comcon1 requested a review from batukav May 14, 2026 13:34

batukav reviewed May 18, 2026


comcon1 added 2 commits May 19, 2026 18:36

PublishDatasets: propagate keywords

01bd6c7

CI: fix using SCRIPT_ARGS in PublishDatasets

1ccb169

comcon1 requested a review from batukav May 19, 2026 17:43

batukav approved these changes May 20, 2026


comcon1 merged commit 0506215 into NMRLipids:main May 20, 2026

comcon1 deleted the add-dataset-workflow branch May 24, 2026 15:14

comcon1 merged 9 commits into NMRLipids:main from comcon1:add-dataset-workflow
