Priority Level
Medium (Nice to have)
Is your feature request related to a problem? Please describe.
Data Designer already supports user-level default model/provider configuration under ~/.data-designer, but RunConfig defaults are still ephemeral. Every workflow that wants non-default runtime behavior has to repeat set_run_config(...) or wrap DataDesigner() in local helper code.
That is awkward for settings that are really user- or machine-level operational defaults rather than dataset-specific configuration, for example:
- buffer_size
- disable_early_shutdown
- progress_bar / progress_interval
- max_conversation_restarts
- nested ThrottleConfig tuning
Today, DataDesigner.__init__ always starts from RunConfig(), so there is no system-level way to say "for this machine / this user, use these runtime defaults unless I override them explicitly for a run."
Describe the solution you'd like
Add support for persisted system-level RunConfig defaults in the existing config directory (DATA_DESIGNER_HOME, which defaults to ~/.data-designer).
Desired behavior:
- Add a new config file at ~/.data-designer/run_config.yaml
- When the file is present, DataDesigner() should load it and use it as the baseline runtime config
- When the file is absent, behavior should remain exactly as it is today (RunConfig() in code)
- Explicit per-run overrides should still win over the persisted defaults. Precedence, highest first:
  - explicit set_run_config(...) or future equivalent per-run API/CLI override
  - persisted run_config.yaml
  - built-in RunConfig() defaults
- CLI preview / create / validate should inherit these persisted defaults automatically because they already instantiate DataDesigner()
- data-designer config list and data-designer config reset should include the runtime config file
- A future or same-issue CLI flow such as data-designer config runtime / run-config would be a natural way to create and edit it interactively
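The precedence order above can be sketched in a few lines. This is only an illustration under stated assumptions: RunConfigSketch is a hypothetical stand-in for RunConfig with two of its fields, the default values are copied from the suggested file shape in this issue, and resolve_run_config is a made-up helper name. It also assumes an explicit override replaces the whole config object rather than merging field by field.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class RunConfigSketch:
    """Hypothetical stand-in for RunConfig; not the real class."""

    buffer_size: int = 1000
    progress_bar: bool = False


def resolve_run_config(
    persisted: Optional[dict[str, Any]],
    explicit: Optional[RunConfigSketch],
) -> RunConfigSketch:
    """Pick the effective config: explicit > persisted file > built-in defaults."""
    if explicit is not None:
        # An explicit per-run override always wins (assumed whole-object semantics).
        return explicit
    if persisted is None:
        # No run_config.yaml: behave exactly as today.
        return RunConfigSketch()
    # Persisted values are layered over the built-in defaults.
    return replace(RunConfigSketch(), **persisted)
```

With this shape, a persisted {"buffer_size": 256} changes only buffer_size, and passing an explicit config ignores the persisted values entirely.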
Proposed strategy for structuring runtime settings in the config directory:
- Keep the config directory flat, with one file per concern:
  - model_providers.yaml
  - model_configs.yaml
  - mcp_providers.yaml
  - tool_configs.yaml
  - run_config.yaml
- Do not fold runtime settings into model_configs.yaml or model_providers.yaml; that would blur the boundary between model selection and execution/runtime behavior
- Use a nested root key rather than dumping raw fields at the top level, so the file remains extensible if metadata needs to be added later
Suggested file shape:
run_config:
  buffer_size: 1000
  disable_early_shutdown: false
  shutdown_error_rate: 0.5
  shutdown_error_window: 10
  non_inference_max_parallel_workers: 4
  max_conversation_restarts: 5
  max_conversation_correction_steps: 0
  async_trace: false
  progress_bar: false
  progress_interval: 5.0
  throttle:
    reduce_factor: 0.75
    additive_increase: 1
    success_window: 25
    cooldown_seconds: 2.0
    ceiling_overshoot: 0.10
A few design details seem important:
- Preserve the nested throttle: object instead of flattening those fields
- Prefer partial persistence semantics when loading, so users can store only the fields they want to override and still inherit newer built-in defaults for everything else:
    run_config:
      buffer_size: 256
      progress_bar: true
      throttle:
        success_window: 50
- Unlike the default model/provider files, run_config.yaml should probably not be auto-seeded on first use. If the file does not exist, Data Designer should keep using code defaults. That avoids freezing old defaults into a generated file for users who never asked for persisted runtime overrides.
- If run_config.yaml exists but is invalid, the error should be surfaced clearly rather than silently ignored, since silent fallback to RunConfig() would hide misconfiguration.
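To make the partial-persistence and fail-loudly points concrete, here is a minimal sketch that works on an already-parsed document (for example, the dict a YAML parser would return). All names in it (RunConfigSketch, ThrottleSketch, RunConfigFileError, apply_overrides, load_persisted) are hypothetical, the field defaults are copied from the suggested file shape, and value type validation is left out on the assumption that the real model would handle it.

```python
from dataclasses import dataclass, field, fields, is_dataclass, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class ThrottleSketch:
    """Hypothetical stand-in for ThrottleConfig (two fields only)."""

    reduce_factor: float = 0.75
    success_window: int = 25


@dataclass(frozen=True)
class RunConfigSketch:
    """Hypothetical stand-in for RunConfig (three fields only)."""

    buffer_size: int = 1000
    progress_bar: bool = False
    throttle: ThrottleSketch = field(default_factory=ThrottleSketch)


class RunConfigFileError(ValueError):
    """Raised when the persisted file is present but invalid."""


def apply_overrides(base: Any, overrides: Mapping[str, Any]) -> Any:
    """Return a copy of `base` with `overrides` applied, recursing into nested dataclasses."""
    unknown = set(overrides) - {f.name for f in fields(base)}
    if unknown:
        # Surface typos instead of silently ignoring them.
        raise RunConfigFileError(f"unknown field(s): {sorted(unknown)}")
    changes = {}
    for name, value in overrides.items():
        current = getattr(base, name)
        if is_dataclass(current):
            if not isinstance(value, Mapping):
                raise RunConfigFileError(f"'{name}' must be a mapping")
            # Keep the nested object and override only the listed keys.
            changes[name] = apply_overrides(current, value)
        else:
            changes[name] = value
    return replace(base, **changes)


def load_persisted(parsed: Any) -> RunConfigSketch:
    """Build a config from parsed file content; None stands for an absent file here."""
    if parsed is None:
        return RunConfigSketch()
    if not isinstance(parsed, Mapping) or not isinstance(parsed.get("run_config"), Mapping):
        raise RunConfigFileError("expected a top-level 'run_config' mapping")
    return apply_overrides(RunConfigSketch(), parsed["run_config"])
```

Because overrides are layered onto a freshly constructed default object, any field the file omits picks up whatever the built-in default is at load time, which is the behavior the partial-persistence bullet asks for.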
Implementation direction:
- Add RUN_CONFIG_FILE_NAME / RUN_CONFIG_FILE_PATH alongside the existing config-directory constants.
- Add a small wrapper model + repository for the new file, matching the current CLI repository pattern instead of special-casing ad hoc YAML reads.
- Initialize DataDesigner._run_config from that persisted file when present.
- Extend config list / config reset to include the runtime defaults file.
- Optionally add an interactive CLI workflow for editing runtime defaults.
- Add tests for absent file, partial file, malformed file, and precedence between persisted defaults and explicit overrides.
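As a rough illustration of the constant and repository pieces, the sketch below shows a file-backed repository that never seeds the file on read. It is not modeled on the actual ConfigRepository interface: the class name RunConfigRepositorySketch and its methods are assumptions, and the config directory is passed in explicitly rather than derived from DATA_DESIGNER_HOME.

```python
from pathlib import Path
from typing import Optional

# Proposed constant name from this issue; the value is the proposed file name.
RUN_CONFIG_FILE_NAME = "run_config.yaml"


class RunConfigRepositorySketch:
    """Hypothetical repository for the persisted runtime defaults file."""

    def __init__(self, config_dir: Path) -> None:
        self.config_file = config_dir / RUN_CONFIG_FILE_NAME

    def exists(self) -> bool:
        """Report whether the file is present (useful for a config list command)."""
        return self.config_file.is_file()

    def read_text(self) -> Optional[str]:
        """Return the raw file content, or None when absent. Never creates the file."""
        if not self.exists():
            return None
        return self.config_file.read_text(encoding="utf-8")

    def delete(self) -> bool:
        """Remove the file (useful for a config reset command); True if it existed."""
        if not self.exists():
            return False
        self.config_file.unlink()
        return True
```

Returning None for an absent file, rather than writing defaults to disk, keeps the absent-file test case trivial: the directory must still be empty after a read.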
Describe alternatives you've considered
- Keep requiring explicit set_run_config(...) everywhere
  - This works today, but it forces repeated boilerplate across scripts, wrappers, notebooks, and CLI entry points.
- Add environment variables for each RunConfig field
  - This scales poorly, is hard to discover, and gets especially awkward for nested throttle settings.
- Store runtime defaults inside model_configs.yaml or model_providers.yaml
  - This mixes separate concerns and would make those files harder to reason about.
- Add only CLI flags without persisted defaults
  - Helpful for one-off invocations, but it does not solve the repeated per-user baseline configuration problem.
Agent Investigation
- I searched the open issues for RunConfig, ~/.data-designer, global/system defaults, run_config.yaml, and related CLI config terms. I did not find an existing open issue asking for persisted system-level RunConfig defaults.
- The config directory is already centralized around DATA_DESIGNER_HOME in packages/data-designer-config/src/data_designer/config/utils/constants.py and currently includes flat files such as model_configs.yaml, model_providers.yaml, mcp_providers.yaml, and tool_configs.yaml.
- RunConfig and nested ThrottleConfig live in packages/data-designer-config/src/data_designer/config/run_config.py.
- DataDesigner.__init__ currently sets self._run_config = RunConfig() directly, so there is no persisted runtime-default lookup today.
- The CLI already has a reusable repository pattern (ConfigRepository, ModelRepository, ProviderRepository) that looks like the right place for a RunConfigRepository.
- CLI preview/create/validate instantiate DataDesigner() directly, so loading persisted runtime defaults in the interface would automatically cover those commands without changing the engine call sites.
- The engine already consumes resource_provider.run_config; this looks like a persistence/bootstrap gap in the config/interface/CLI layers rather than an engine design issue.
Additional context
This would preserve the existing separation of concerns:
- dataset config remains per-workflow and declarative
- RunConfig remains operational/runtime tuning
- ~/.data-designer/run_config.yaml becomes the user-level baseline for runtime behavior, analogous to the existing default model/provider files
Non-goal:
This proposal is not asking to embed RunConfig into dataset configs or to replace explicit per-run overrides.
Checklist