When I try to run a training workload with `--open`, mlpstorage still says: "Running the benchmark without verification for open or closed configurations. These results are not valid for submission. Use --open or --closed to specify a configuration."
I also get a crash later on:
dslik@frosta:~/scratch/mlp3/storage$ ./mlpstorage training run --open --hosts 127.0.0.1 --num-client-hosts 1 --client-host-memory-in-gb 128 --num-accelerators 2 --accelerator-type b200 --model retinanet --param dataset.num_files_train=425000 --file --data-dir /home/dslik/scratch --results-dir /home/dslik/scratch --allow-run-as-root
Setting attr from num_accelerators to 2
Hosts is: ['127.0.0.1']
Hosts is: ['127.0.0.1']
⠋ Validating environment... 0:00:00
2026-04-22 23:24:12|INFO: Environment validation passed
2026-04-22 23:24:12|STATUS: Benchmark results directory: /home/dslik/scratch/training/retinanet/run/20260422_232412
2026-04-22 23:24:12|INFO: Created benchmark run: training_run_retinanet_20260422_232412
2026-04-22 23:24:12|STATUS: Verifying benchmark run for training_run_retinanet_20260422_232412
2026-04-22 23:24:12|RESULT: Minimum file count dictated by dataset size to memory size ratio.
2026-04-22 23:24:12|STATUS: Closed: [CLOSED] Closed parameter override allowed: dataset.num_files_train = 425000 (Parameter: Overrode Parameters)
2026-04-22 23:24:12|ERROR: INVALID: [INVALID] Insufficient number of training files (Parameter: dataset.num_files_train, Expected: >= 16550423, Actual: 425000)
2026-04-22 23:24:12|STATUS: Benchmark run is INVALID due to 1 issues ([RunID(program='training', command='run', model='retinanet', run_datetime='20260422_232412')])
2026-04-22 23:24:12|WARNING: Running the benchmark without verification for open or closed configurations. These results are not valid for submission. Use --open or --closed to specify a configuration.
snip
--------------------------------------------------------------------------
[OUTPUT] 2026-04-22T23:24:15.550253 Running DLIO [Training] with 2 process(es)
[WARNING] The amount of dataset is smaller than the host memory; data might be cached after the first epoch. Increase the size of
dataset to eliminate the caching effect!!!
Error executing job with overrides: ['workload=retinanet_b200', '++workload.dataset.num_files_train=425000',
'++workload.dataset.data_folder=/home/dslik/scratch/retinanet']
Error executing job with overrides: ['workload=retinanet_b200', '++workload.dataset.num_files_train=425000',
'++workload.dataset.data_folder=/home/dslik/scratch/retinanet']
Traceback (most recent call last):
File "/home/dslik/scratch/mlp3/storage/.venv/lib/python3.12/site-packages/dlio_benchmark/main.py", line 482, in run_benchmark
benchmark.initialize()
File "/home/dslik/scratch/mlp3/storage/.venv/lib/python3.12/site-packages/dftracer/python/common.py", line 504, in wrapper
x = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/dslik/scratch/mlp3/storage/.venv/lib/python3.12/site-packages/dlio_benchmark/main.py", line 206, in initialize
assert (num_subfolders == len(filenames))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError