Add current_version_number to dataset status CLI output #965

Open
kaggle-agent wants to merge 1 commit into main from agent/bovard-20260410182517-e17b1b2b

@kaggle-agent
Collaborator

Users polling `kaggle datasets status` after uploading large datasets
could only see the processing status but not which version was active.
Now the command also fetches dataset metadata via `get_dataset()` and
displays the version number alongside the status, e.g.,
"ready (version 3)".

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>


Task: bovard-20260410182517-e17b1b2b
Context: https://chat.kaggle.net/kaggle/pl/jh5x1ft3hjnntgtfwaco11nfmh

dataset_response = kaggle.datasets.dataset_api_client.get_dataset(dataset_request)
current_version_number = dataset_response.current_version_number

return status, current_version_number
Contributor

@rosbo This breaks backward-compatibility, doesn't it? Not for the CLI usage, but for notebooks that import the package. When I checked, there were tens of thousands of such notebooks, but I don't know how often they are used. And I don't know if any of them call dataset_status().

In general, breaking back-compat is not advised.
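
To make the concern concrete, here is a minimal sketch. It assumes `dataset_status()` returned only the status string before this PR and returns a `(status, current_version_number)` tuple after it, as the diff above suggests. Every function name below is a hypothetical stand-in, not the package's real API, and the hard-coded values are placeholders.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins for illustration only; not the real kaggle package API.


def dataset_status_old(dataset: str) -> str:
    """Assumed pre-PR behavior: return only the status string."""
    return "ready"


def dataset_status_new(dataset: str) -> Tuple[str, int]:
    """Behavior implied by the diff: return (status, current_version_number)."""
    return "ready", 3


def notebook_check(status_fn: Callable, dataset: str) -> bool:
    """Caller code of the kind a notebook might contain, written for the old return type."""
    return status_fn(dataset) == "ready"


def dataset_status_with_version(dataset: str) -> Tuple[str, int]:
    """One backward-compatible option: keep the old function and add a new one."""
    return dataset_status_old(dataset), 3
```

With the old function, `notebook_check` passes; with the tuple-returning version, the same comparison silently evaluates to `False` rather than raising, which is why this kind of break can go unnoticed in existing notebooks.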

Member

I had marvin try to one-shot these, but I lack enough context on how this code works to know whether it was a good solution. Based on my initial review, it didn't look good.

Contributor

Agree with Steve. To avoid breaking scripts that parse this output, we could add a `--format` flag like gcloud: https://screenshot.googleplex.com/ganNtm4RhbYHWkS

By default, we would keep the text output with only the status.

But users could specify `--format=json` and get the status AND the current version number as JSON output.

Optionally, we could support field selection like gcloud in the format flag:

# If you only want a subset of fields
--format='json(current_version_number)'
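
A rough sketch of how the rendering side of such a flag could work, to make the proposal easier to evaluate. The `--format` flag does not exist in the CLI today, and `render_status` is a hypothetical helper; the `json(field, ...)` selection syntax is only loosely modeled on gcloud, whose real projection syntax is richer.

```python
import json
import re


def render_status(status: str, current_version_number: int, fmt: str = "text") -> str:
    """Render dataset status for a hypothetical --format flag.

    Supported values (all assumptions for this sketch):
      text               -> status only, unchanged from today's output
      json               -> all fields as a JSON object
      json(field, ...)   -> only the named fields as a JSON object
    """
    if fmt == "text":
        # Default stays byte-for-byte compatible with existing scripts.
        return status

    match = re.fullmatch(r"json(?:\(([^)]*)\))?", fmt)
    if match is None:
        raise ValueError(f"unsupported format: {fmt!r}")

    record = {"status": status, "current_version_number": current_version_number}

    if match.group(1):
        # Field selection: keep only the requested keys, in the requested order.
        wanted = [name.strip() for name in match.group(1).split(",") if name.strip()]
        unknown = [name for name in wanted if name not in record]
        if unknown:
            raise ValueError(f"unknown field(s): {', '.join(unknown)}")
        record = {name: record[name] for name in wanted}

    return json.dumps(record)
```

For example, `render_status("ready", 3)` returns the plain status as before, while `render_status("ready", 3, "json(current_version_number)")` returns a JSON object containing only the version number.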

4 participants