Add `--format` flag to `datasets status` for JSON output by kaggle-agent · Pull Request #972 · Kaggle/kaggle-cli

kaggle-agent · 2026-04-22T00:00:14Z

Reviewers flagged that returning a tuple from dataset_status() breaks
back-compat for the tens of thousands of notebooks that import the
package. Per @rosbo's suggestion, keep the library function returning
just the status string and expose the version number through a new
gcloud-style --format flag (json, json(current_version_number))
on the CLI instead.

Co-authored-by: kaggle-agent kaggle-agent@users.noreply.github.com

Task: bovard-20260421235151-526d73b2
Context: https://chat.kaggle.net/kaggle/pl/t8x7spahciyymqygpicoeoq69c

stevemessick · 2026-04-22T14:52:30Z

@bovard How is this related to #965?

bovard · 2026-04-24T14:46:27Z

This should replace #965

kaggle-agent · 2026-04-24T14:55:08Z

Re: @bovard's review:

Closed PR #965 with a comment pointing reviewers to #972 as the replacement. #972 addresses the back-compat concern from @stevemessick (keeps dataset_status() returning str) and implements the gcloud-style --format flag suggested by @rosbo.

rosbo · 2026-04-24T19:59:46Z

        dataset = dataset or dataset_opt
-        return self.dataset_status(dataset)
+        if format is None:
+            return self.dataset_status(dataset)


Refactor to pass the format to dataset_status to avoid duplicating the logic calling dataset_api_client.get_dataset_status.

Also, do the field filterering logic before deciding which API you should call. It is a waste to call dataset_api_client.get_dataset if you don't need the current_version_number for instance.

Refactored. Moved the format handling into dataset_status() itself so the get_dataset_status call lives in one place, and now parse --format before contacting the API so we only call the endpoint(s) needed for the requested fields:

--format=json → both get_dataset_status and get_dataset.

--format='json(status)' → only get_dataset_status.

--format='json(current_version_number)' → only get_dataset.

dataset_status_cli is now a thin wrapper that just resolves dataset_opt and forwards. Added two tests asserting we skip the unrelated API call when only one field is selected.

Already addressed in the latest revision (committed before this review pass was re-issued). dataset_status() now accepts the format parameter and owns all the API call logic — dataset_status_cli is a thin wrapper that just resolves dataset_opt and forwards. Field parsing happens before any API call, so:

--format=json calls both get_dataset_status and get_dataset.

--format='json(status)' calls only get_dataset_status.

--format='json(current_version_number)' calls only get_dataset.

Two new tests assert the unrelated API call is skipped when only one field is selected. All 14 tests pass.

@rosbo

Reviewers flagged that returning a tuple from `dataset_status()` breaks back-compat for the tens of thousands of notebooks that import the package. Per @rosbo's suggestion, keep the library function returning just the status string and expose the version number through a new gcloud-style `--format` flag (`json`, `json(current_version_number)`) on the CLI instead. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

@rosbo

Per @rosbo's review: collapse the duplicated `get_dataset_status` call by having `dataset_status()` itself accept `format`, and parse the requested fields before hitting the network so `--format='json(status)'` no longer pays for `get_dataset` and vice versa. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

rosbo · 2026-04-24T20:05:04Z


        Args:
            dataset_opt: an alternative to dataset
+            format: see :meth:`dataset_status`. ``None`` (default) keeps the


What does :meth: mean here?

Sorry, that's Sphinx/reStructuredText syntax for cross-referencing a method (:meth:\dataset_status`would render as a link to that method's docs in Sphinx). It's not used elsewhere in this file, so I've replaced it with plain prose:forwarded to dataset_status`.

Already addressed in the latest revision. :meth: is Sphinx/reStructuredText syntax for cross-referencing a method, but since this file doesn't use Sphinx markup elsewhere it just looks like noise. I've replaced the docstring with plain prose: optional output format forwarded to dataset_status.

rosbo · 2026-04-24T20:06:04Z

+                request.owner_slug = owner_slug
+                request.dataset_slug = dataset_slug
+                response = kaggle.datasets.dataset_api_client.get_dataset_status(request)
+                return response.status.name.lower()


if format is json and no fields are selected, you should return all the fields in json format.

Yes — that's already what happens: --format=json (no projection) returns both status and current_version_number in JSON. The if fields: branch only triggers when an explicit projection like json(status) is provided; otherwise we fall through to selected_fields = list(self._DATASET_STATUS_FIELDS) (all fields).

The line you pointed at (return response.status.name.lower()) only runs when format is None (i.e. the back-compat plain-text default). The nested branching made that hard to see, so I restructured the function — the format is None path is now a clear early return separate from the JSON path. Behavior is unchanged but the code reads top-to-bottom now.

Test test_dataset_status_cli_format_json already covers --format=json returning both fields.

Already handled in the latest revision. --format=json with no projection falls through to selected_fields = list(self._DATASET_STATUS_FIELDS) and returns both status and current_version_number in JSON. The plain-text return on line 2558 only ran when format is None (back-compat default); the nested branching obscured that, so I restructured the function to split the two paths into a clear early return for format is None and a separate JSON path below. test_dataset_status_cli_format_json already exercises bare --format=json returning both fields.

Reviewer found the reST markup confusing in a file that otherwise uses plain-prose docstrings, so swap it for a normal sentence pointing at `dataset_status`. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

rosbo · 2026-04-24T20:08:45Z

Make sure to run the linter: hatch run lint:fmt.

Reviewer read the nested branching as if `--format=json` could fall through to the plain-text return. The behavior was already correct, but restructuring into a clear early return for `format is None` and a separate JSON path below makes that obvious without changing what the code does. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

kaggle-agent · 2026-04-24T20:11:00Z

Ran black (24.10.0) over src/kaggle/api/kaggle_api_extended.py, src/kaggle/cli.py, and tests/test_dataset_status.py. Only the test file needed reformatting (collapsed a couple of wrapped format= kwarg lines onto one line at the project's 120-col limit). black --check is now clean for the files I touched.

Per `hatch run lint:fmt`. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

kaggle-agent · 2026-04-24T20:11:15Z

Re: @rosbo's review:

Ran black 24.10.0 (the version pinned in pyproject.toml's lint env) over src/kaggle/api/kaggle_api_extended.py, src/kaggle/cli.py, and tests/test_dataset_status.py. Only the test file needed reformatting — black collapsed a couple of wrapped format= kwarg lines onto one line at the project's 120-col limit. black --check is now clean for the files I touched.

rosbo · 2026-04-24T20:14:57Z

-            response = kaggle.datasets.dataset_api_client.get_dataset_status(request)
-            return response.status.name.lower()
+        # Default behavior: return only the status string for backward compat.
+        if format is None:


Do not duplicate the logic calling get_dataset_status, instead you should:

set selected_fields to only status when format is None.

at line 2575 add the logic to decide whether to json.dumps if format is json or return payload['status'] as a string when format is None.

Done — collapsed into one path. format=None now sets selected_fields = ["status"] and the same loop calls get_dataset_status once. Only the return choice differs at the end:

if format is None: return payload["status"] return json.dumps({field: payload[field] for field in selected_fields})

So get_dataset_status is only spelled out once, the JSON / field-selection logic is unchanged, and back-compat still skips get_dataset since current_version_number isn't in selected_fields when format=None.

Done as suggested. format=None now sets selected_fields = ["status"], the same loop calls get_dataset_status exactly once, and the format/return decision happens after payload is populated:

if format is None: return payload["status"] return json.dumps({field: payload[field] for field in selected_fields})

So get_dataset_status is no longer duplicated, JSON / field-selection logic is unchanged, and back-compat still skips get_dataset since current_version_number isn't in selected_fields when format=None.

@rosbo

Per @rosbo: instead of two parallel paths, treat `format=None` as a projection of `["status"]`, run the same fetch loop, and only branch at the return — plain string vs. `json.dumps(...)`. Same behavior, one copy of each API call. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

kaggle-agent assigned bovard Apr 22, 2026

bovard mentioned this pull request Apr 24, 2026

Add current_version_number to dataset status CLI output #965

Closed

stevemessick approved these changes Apr 24, 2026

View reviewed changes

bovard requested a review from rosbo April 24, 2026 17:06

rosbo reviewed Apr 24, 2026

View reviewed changes

bovard and others added 2 commits April 24, 2026 20:02

kaggle-agent force-pushed the agent/bovard-20260421235151-526d73b2 branch from f3a81b8 to a9a003d Compare April 24, 2026 20:02

rosbo reviewed Apr 24, 2026

View reviewed changes

Apply black formatting to test_dataset_status.py

5b079f6

Per `hatch run lint:fmt`. Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>

rosbo reviewed Apr 24, 2026

View reviewed changes

rosbo approved these changes Apr 24, 2026

View reviewed changes

bovard merged commit a8e3758 into main Apr 24, 2026
10 checks passed

bovard deleted the agent/bovard-20260421235151-526d73b2 branch April 24, 2026 22:19

AngryAngel666 approved these changes Apr 25, 2026

View reviewed changes

Conversation

kaggle-agent commented Apr 22, 2026

Uh oh!

stevemessick commented Apr 22, 2026

Uh oh!

bovard commented Apr 24, 2026

Uh oh!

kaggle-agent commented Apr 24, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rosbo commented Apr 24, 2026

Uh oh!

kaggle-agent commented Apr 24, 2026

Uh oh!

kaggle-agent commented Apr 24, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants