Summary
benchmarks/benchmark_lib.sh (function build_replay_cmd, around line 1000 of v0.2 tip) passes --cache-bust system_prefix to aiperf profile, but the current utils/aiperf submodule (70fecb2e) hardcodes first_turn_prefix as the only valid value for the inferencex-agentx-mvp scenario. The validation is fatal:
Value error, Scenario invariants violated (1 conflict):
- --cache-bust: got 'system_prefix', required 'first_turn_prefix'
(scenario 'inferencex-agentx-mvp' requires
cache_bust.target=first_turn_prefix; got system_prefix)
Pass --unsafe-override to convert to warnings (run will be marked
submission_valid=false).
This is masked for short smoke runs because build_replay_cmd auto-adds --unsafe-override when DURATION < 900. Production sweep configs use DURATION=1800 and hit the error directly — the run flips to "invalid" or fails at validation, depending on flags.
Repro
# Any cpu/noffload sweep with DURATION=1800 (the production default)
podman run ... \
-e MODEL=MiniMaxAI/MiniMax-M2.5 -e TP=4 -e CONC=16 \
-e OFFLOADING=cpu -e TOTAL_CPU_DRAM_GB=1200 \
-e DURATION=1800 \
... /workspace/benchmarks/single_node/agentic/minimaxm2.5_fp8_mi355x.sh
# vllm starts cleanly (Application startup complete, /health 200), then
# aiperf profile bails immediately with the validator error above.
Suggested fix
- REPLAY_CMD+=" --cache-bust system_prefix"
+ REPLAY_CMD+=" --cache-bust first_turn_prefix"
…or, if backwards compat with older aiperf submodules matters, source the value from the scenario plugin via aiperf scenario describe inferencex-agentx-mvp and inject it dynamically.
After this patch, the full 30-min sweep runs cleanly (verified with both OFFLOADING=none and OFFLOADING=cpu, 11 + 8 conc points respectively, 0 launcher errors).
The launcher comment block at line 950 should also be updated — it currently says:
# The scenario plugin locks: --cache-bust system_prefix,
which became stale when the scenario validator was tightened.
Environment
- Branch:
chore/agentx-v0.2-aiperf-testing (tip c8dfb585)
- aiperf submodule:
70fecb2e refactor(input): move --max-context-length filter into WekaTraceLoader
Summary
benchmarks/benchmark_lib.sh(functionbuild_replay_cmd, around line 1000 of v0.2 tip) passes--cache-bust system_prefixtoaiperf profile, but the currentutils/aiperfsubmodule (70fecb2e) hardcodesfirst_turn_prefixas the only valid value for theinferencex-agentx-mvpscenario. The validation is fatal:This is masked for short smoke runs because
build_replay_cmdauto-adds--unsafe-overridewhenDURATION < 900. Production sweep configs useDURATION=1800and hit the error directly — the run flips to "invalid" or fails at validation, depending on flags.Repro
Suggested fix
…or, if backwards compat with older aiperf submodules matters, source the value from the scenario plugin via
aiperf scenario describe inferencex-agentx-mvpand inject it dynamically.After this patch, the full 30-min sweep runs cleanly (verified with both
OFFLOADING=noneandOFFLOADING=cpu, 11 + 8 conc points respectively, 0 launcher errors).The launcher comment block at line 950 should also be updated — it currently says:
which became stale when the scenario validator was tightened.
Environment
chore/agentx-v0.2-aiperf-testing(tipc8dfb585)70fecb2e refactor(input): move --max-context-length filter into WekaTraceLoader