Support system-level RunConfig defaults in ~/.data-designer #559

@eric-tramel

Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

Data Designer already supports user-level default model/provider configuration under ~/.data-designer, but RunConfig defaults are still ephemeral. Every workflow that wants non-default runtime behavior has to repeat set_run_config(...) or wrap DataDesigner() in local helper code.

That is awkward for settings that are really user- or machine-level operational defaults rather than dataset-specific configuration, for example:

  • buffer_size
  • disable_early_shutdown
  • progress_bar / progress_interval
  • max_conversation_restarts
  • nested ThrottleConfig tuning

Today, DataDesigner.__init__ always starts from RunConfig(), so there is no system-level way to say "for this machine / this user, use these runtime defaults unless I override them explicitly for a run."

Describe the solution you'd like

Add support for persisted system-level RunConfig defaults in the existing config directory (DATA_DESIGNER_HOME, which defaults to ~/.data-designer).

Desired behavior:

  • Add a new config file at ~/.data-designer/run_config.yaml
  • When the file is present, DataDesigner() should load it and use it as the baseline runtime config
  • When the file is absent, behavior should remain exactly as it is today (RunConfig() in code)
  • Explicit per-run overrides should still win over the persisted defaults:
    1. explicit set_run_config(...) or future equivalent per-run API/CLI override
    2. persisted run_config.yaml
    3. built-in RunConfig() defaults
  • CLI preview / create / validate should inherit these persisted defaults automatically because they already instantiate DataDesigner()
  • data-designer config list and data-designer config reset should include the runtime config file
  • A future or same-issue CLI flow such as data-designer config runtime / run-config would be a natural way to create and edit it interactively
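
The precedence order above can be sketched as a small resolution function. This is only an illustration: the `RunConfig` class here is a two-field stand-in, not the real model, its default values are copied from the suggested file shape rather than confirmed against the code, and `resolve_run_config` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunConfig:
    """Stand-in for the real RunConfig; fields and values are illustrative."""

    buffer_size: int = 1000
    progress_bar: bool = False


def resolve_run_config(
    explicit: Optional[RunConfig],
    persisted: Optional[RunConfig],
) -> RunConfig:
    """Pick the effective config: explicit > persisted > built-in."""
    if explicit is not None:
        # 1. An explicit set_run_config(...) call always wins.
        return explicit
    if persisted is not None:
        # 2. Otherwise use the baseline loaded from run_config.yaml.
        return persisted
    # 3. No file and no override: behave exactly as today.
    return RunConfig()
```

With this shape, a missing file simply means `persisted` is `None`, so the no-file path stays identical to current behavior.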

Proposed strategy for structuring runtime settings in the config directory:

  • Keep the config directory flat, with one file per concern:
    • model_providers.yaml
    • model_configs.yaml
    • mcp_providers.yaml
    • tool_configs.yaml
    • run_config.yaml
  • Do not fold runtime settings into model_configs.yaml or model_providers.yaml; that would blur the boundary between model selection and execution/runtime behavior
  • Use a nested root key rather than dumping raw fields at the top level, so the file remains extensible if metadata needs to be added later

Suggested file shape:

run_config:
  buffer_size: 1000
  disable_early_shutdown: false
  shutdown_error_rate: 0.5
  shutdown_error_window: 10
  non_inference_max_parallel_workers: 4
  max_conversation_restarts: 5
  max_conversation_correction_steps: 0
  async_trace: false
  progress_bar: false
  progress_interval: 5.0
  throttle:
    reduce_factor: 0.75
    additive_increase: 1
    success_window: 25
    cooldown_seconds: 2.0
    ceiling_overshoot: 0.10

A few design details seem important:

  • Preserve the nested throttle: object instead of flattening those fields
  • Prefer partial persistence semantics when loading, so users can store only the fields they want to override and still inherit newer built-in defaults for everything else:
    run_config:
      buffer_size: 256
      progress_bar: true
      throttle:
        success_window: 50
  • Unlike the default model/provider files, run_config.yaml should probably not be auto-seeded on first use. If the file does not exist, Data Designer should keep using the code defaults. That avoids freezing old defaults into a generated file for users who never asked for persisted runtime overrides.
  • If run_config.yaml exists but is invalid, the error should be surfaced clearly rather than silently ignored, since silent fallback to RunConfig() would hide misconfiguration.
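
One possible reading of the partial-persistence and fail-loudly points, sketched with plain dataclasses. The `RunConfig` and `ThrottleConfig` classes below are reduced stand-ins with illustrative values, and `apply_overrides` and `run_config_from_document` are hypothetical names. The real models would presumably also validate field types, which this sketch does not attempt.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field, fields, is_dataclass, replace
from typing import Any


@dataclass(frozen=True)
class ThrottleConfig:
    """Stand-in for the nested throttle settings (illustrative values)."""

    reduce_factor: float = 0.75
    success_window: int = 25


@dataclass(frozen=True)
class RunConfig:
    """Stand-in for the real RunConfig (illustrative values)."""

    buffer_size: int = 1000
    progress_bar: bool = False
    throttle: ThrottleConfig = field(default_factory=ThrottleConfig)


def apply_overrides(base: Any, overrides: Mapping) -> Any:
    """Return a copy of `base` with only the given fields replaced."""
    known = {f.name for f in fields(base)}
    unknown = sorted(str(key) for key in overrides if key not in known)
    if unknown:
        # Surface typos instead of silently falling back to defaults.
        raise ValueError(
            f"unknown {type(base).__name__} field(s): {', '.join(unknown)}"
        )
    updates = {}
    for name, value in overrides.items():
        current = getattr(base, name)
        if is_dataclass(current):
            if not isinstance(value, Mapping):
                raise ValueError(f"'{name}' must be a mapping")
            # Recurse so a partial `throttle:` block keeps its other defaults.
            updates[name] = apply_overrides(current, value)
        else:
            updates[name] = value
    return replace(base, **updates)


def run_config_from_document(document: Mapping) -> RunConfig:
    """Build a RunConfig from a parsed file with a `run_config` root key."""
    section = document.get("run_config")
    if not isinstance(section, Mapping):
        raise ValueError("expected a top-level 'run_config' mapping")
    return apply_overrides(RunConfig(), section)
```

Because the merge starts from `RunConfig()`, any field the user did not persist picks up whatever the built-in default is at load time, including defaults added in later releases.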

Implementation direction:

  1. Add RUN_CONFIG_FILE_NAME / RUN_CONFIG_FILE_PATH alongside the existing config-directory constants.
  2. Add a small wrapper model + repository for the new file, matching the current CLI repository pattern instead of special-casing ad hoc YAML reads.
  3. Initialize DataDesigner._run_config from that persisted file when present.
  4. Extend config list / config reset to include the runtime defaults file.
  5. Optionally add an interactive CLI workflow for editing runtime defaults.
  6. Add tests for absent file, partial file, malformed file, and precedence between persisted defaults and explicit overrides.
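
Steps 1 and 3 might look roughly like the following. This is a sketch under assumptions, not the existing code: `data_designer_home` and `load_persisted_run_config` are hypothetical helpers, and treating DATA_DESIGNER_HOME as an environment variable is an assumption (the issue only says it defaults to ~/.data-designer). The parser is passed in so the sketch stays standard-library only; in practice it would likely be PyYAML's `yaml.safe_load`.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Any, Callable, Optional

RUN_CONFIG_FILE_NAME = "run_config.yaml"


def data_designer_home() -> Path:
    """Hypothetical helper: resolve the config directory.

    Assumes DATA_DESIGNER_HOME may be set in the environment and
    otherwise falls back to ~/.data-designer.
    """
    override = os.environ.get("DATA_DESIGNER_HOME")
    return Path(override) if override else Path.home() / ".data-designer"


def load_persisted_run_config(
    parse: Callable[[str], Any],
    home: Optional[Path] = None,
) -> Optional[Mapping]:
    """Return the persisted `run_config` mapping, or None if no file exists.

    `parse` turns file text into Python objects. Parse errors are left to
    propagate so a malformed file is reported rather than ignored.
    """
    path = (home or data_designer_home()) / RUN_CONFIG_FILE_NAME
    if not path.is_file():
        # Absent file: the caller keeps using the built-in RunConfig().
        return None
    document = parse(path.read_text(encoding="utf-8"))
    if not isinstance(document, Mapping) or not isinstance(
        document.get("run_config"), Mapping
    ):
        raise ValueError(f"{path}: expected a top-level 'run_config' mapping")
    return document["run_config"]
```

The `None` return for a missing file and the raised error for a malformed one correspond to the absent-file and malformed-file cases listed in step 6.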

Describe alternatives you've considered

  • Keep requiring explicit set_run_config(...) everywhere
    • This works today, but it forces repeated boilerplate across scripts, wrappers, notebooks, and CLI entry points.
  • Add environment variables for each RunConfig field
    • This scales poorly, is hard to discover, and gets especially awkward for nested throttle settings.
  • Store runtime defaults inside model_configs.yaml or model_providers.yaml
    • This mixes separate concerns and would make those files harder to reason about.
  • Add only CLI flags without persisted defaults
    • Helpful for one-off invocations, but it does not solve the repeated per-user baseline configuration problem.

Agent Investigation

  • I searched the open issues for RunConfig, ~/.data-designer, global/system defaults, run_config.yaml, and related CLI config terms. I did not find an existing open issue asking for persisted system-level RunConfig defaults.
  • The config directory is already centralized around DATA_DESIGNER_HOME in packages/data-designer-config/src/data_designer/config/utils/constants.py and currently includes flat files such as model_configs.yaml, model_providers.yaml, mcp_providers.yaml, and tool_configs.yaml.
  • RunConfig and nested ThrottleConfig live in packages/data-designer-config/src/data_designer/config/run_config.py.
  • DataDesigner.__init__ currently sets self._run_config = RunConfig() directly, so there is no persisted runtime-default lookup today.
  • The CLI already has a reusable repository pattern (ConfigRepository, ModelRepository, ProviderRepository) that looks like the right place for a RunConfigRepository.
  • CLI preview/create/validate instantiate DataDesigner() directly, so loading persisted runtime defaults in the interface would automatically cover those commands without changing the engine call sites.
  • The engine already consumes resource_provider.run_config; this looks like a persistence/bootstrap gap in the config/interface/CLI layers rather than an engine design issue.

Additional context

This would preserve the existing separation of concerns:

  • dataset config remains per-workflow and declarative
  • RunConfig remains operational/runtime tuning
  • ~/.data-designer/run_config.yaml becomes the user-level baseline for runtime behavior, analogous to the existing default model/provider files

Non-goal:

This proposal is not asking to embed RunConfig into dataset configs or to replace explicit per-run overrides.

Checklist

  • I've reviewed existing issues and the documentation
  • This is a design proposal, not a "please build this" request

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests