feat: add export() method and --output-format CLI flag to DatasetCreationResults #566

@przemekboruta

Summary

Currently, the only way to persist a generated dataset is through the Parquet batch files that the engine writes internally. Users who want a single consolidated file in another format (JSONL, CSV, or standard Parquet) have no first-class API for producing one.

Proposed solution

  • Add DatasetCreationResults.export(path, format=) supporting jsonl, csv, and parquet formats
  • Add --output-format / -f flag to the data-designer create CLI command; writes dataset.<format> alongside the parquet batch files
  • Default format is jsonl; the parameter is optional in both the Python API and CLI
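To make the proposal concrete, here is a minimal sketch of the dispatch logic such an export could use. It is not the project's implementation: `export_records` and `SUPPORTED_FORMATS` are hypothetical names, the sketch works on a plain list of dicts rather than on DatasetCreationResults, and the parquet branch assumes pandas with a Parquet engine such as pyarrow is installed.

```python
import csv
import json
from pathlib import Path

# Hypothetical constant: the three formats named in the proposal.
SUPPORTED_FORMATS = ("jsonl", "csv", "parquet")


def export_records(records, path, format="jsonl"):
    """Write a list of dict records to one consolidated file (illustrative helper)."""
    if format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    path = Path(path)
    if format == "jsonl":
        # One JSON object per line.
        with path.open("w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    elif format == "csv":
        # Header is the union of all keys, in first-seen order.
        fieldnames = list(dict.fromkeys(key for r in records for key in r))
        with path.open("w", encoding="utf-8", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    else:
        # Assumption: pandas plus a Parquet engine is available for this branch.
        import pandas as pd

        pd.DataFrame(records).to_parquet(path, index=False)
    return path
```

Validating the format before opening the file means an unsupported value fails without leaving an empty output file behind.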

Usage

Python API:

results = data_designer.create(config, num_records=1000)
results.export("output.jsonl")                    # default: jsonl
results.export("output.csv", format="csv")
results.export("output.parquet", format="parquet")

CLI:

data-designer create config.yaml --output-format jsonl
data-designer create config.yaml -n 500 -f csv
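The flag parsing for the commands above could look roughly like the following argparse sketch. This is illustrative only: the actual CLI may be built on a different framework, `build_parser` is a hypothetical helper, the long option name `--num-records` is an assumption (the issue only shows `-n`), and the sketch assumes that omitting the flag falls back to jsonl.

```python
import argparse


def build_parser():
    """Build a stand-in parser for `data-designer create` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="data-designer")
    subparsers = parser.add_subparsers(dest="command", required=True)

    create = subparsers.add_parser("create", help="generate a dataset from a config")
    create.add_argument("config", help="path to the config file")
    # Assumption: the long form of -n is not given in the issue.
    create.add_argument("-n", "--num-records", type=int, default=None)
    create.add_argument(
        "-f",
        "--output-format",
        choices=["jsonl", "csv", "parquet"],
        default="jsonl",  # assumed fallback when the flag is omitted
        help="format of the consolidated dataset.<format> file",
    )
    return parser


args = build_parser().parse_args(["create", "config.yaml", "-n", "500", "-f", "csv"])
print(args.output_format, args.num_records)
```

Restricting the flag with `choices` lets the parser reject unsupported formats before any generation work starts.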

Labels: triaged (Issue reviewed and approved by a maintainer)
