fix(ci): export MALLOC_ARENA_MAX=2 before pytest for component_integration #950
ajcasagrande wants to merge 1 commit
…integration

glibc reads MALLOC_ARENA_MAX at process startup, so setting it from inside tests/component_integration/conftest.py via os.environ.setdefault was a no-op for the running pytest worker: by then glibc had already initialized its arenas. Component_integration runs aiperf in-process (no subprocesses to inherit the var), so the only effective place to set it is the shell env before pytest starts.

Prepend MALLOC_ARENA_MAX=2 to the four pytest invocations in the Makefile that target tests/component_integration/ (test-ci, test-component-integration, test-component-integration-ci, test-component-integration-verbose), and rewrite the conftest comment to reflect reality.

This unbroke Ubuntu CI (3.11/3.12/3.13). macOS was unaffected because it uses a different allocator. The latent issue surfaced when #912 pulled in BoTorch/Optuna/scipy/torch and pushed the working set past what glibc's default 8×NCPU arenas could fit on 2-CPU runners, crashing xdist workers (visible as 66/83 systematic FAILED + final xdist INTERNALERROR "list.remove(x): x not in list").

Note: tests/integration/conftest.py keeps its setdefault; that suite spawns aiperf as subprocesses, so the var correctly propagates to children.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
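One way to check whether the variable was really present when the process started is to read /proc/self/environ, which on Linux reflects the environment passed at exec time and does not pick up later os.environ changes. The following is a minimal sketch, not part of this PR; the helper names parse_environ_block and arena_cap_at_startup are hypothetical.

```python
from typing import Dict, Optional


def parse_environ_block(raw: bytes) -> Dict[str, str]:
    """Parse a NUL-separated KEY=VALUE block, the layout of /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        # Skip the empty trailing chunk and anything that is not KEY=VALUE.
        if not entry or b"=" not in entry:
            continue
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def arena_cap_at_startup(raw: bytes) -> Optional[str]:
    """Return the MALLOC_ARENA_MAX value found in the startup environment, if any."""
    return parse_environ_block(raw).get("MALLOC_ARENA_MAX")


# Hypothetical usage on Linux (assumes /proc is mounted and readable):
#   from pathlib import Path
#   print(arena_cap_at_startup(Path("/proc/self/environ").read_bytes()))
```

If this returns None inside a pytest worker even though the conftest called os.environ.setdefault, the variable was set too late for glibc to see it at startup.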
Try out this PR
Quick install:
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@2cf5b4ccc14fa941d4bcc67b4d32708a7e0d1be1
Recommended with virtual environment (using uv):
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@2cf5b4ccc14fa941d4bcc67b4d32708a7e0d1be1
Last updated for commit:
Walkthrough
This PR configures component integration test targets to run pytest with the MALLOC_ARENA_MAX=2 environment variable set.
Changes
Memory Arena Configuration for Component Integration Tests
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/component_integration/conftest.py`:
- Line 31: Replace the ambiguous multiplication character in the comment that
reads "8×NCPU" with a plain ASCII 'x' so it reads "8xNCPU" to avoid Ruff RUF003;
update the comment text in the same location (the comment containing "default
8×NCPU arenas blows out RAM in 2-CPU CI runners. glibc reads this") accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e1ca39c3-7961-4879-9f90-32b364cabb94
📒 Files selected for processing (2)
Makefile
tests/component_integration/conftest.py
# integration conftest carries the same setting (gotcha 2026-04-21).
# xdist workers under heavy `-n auto` load. Component_integration runs aiperf
# in-process with full Pydantic / msgspec / tokenizer / torch imports, so the
# default 8×NCPU arenas blows out RAM in 2-CPU CI runners. glibc reads this
Replace ambiguous × character to avoid Ruff RUF003 warning.
Use plain x (8xNCPU) in this comment to avoid ambiguous Unicode lint warnings.
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 31-31: Comment contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF003)
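For illustration, the kind of check behind this warning can be sketched in a few lines of Python. This is not Ruff's implementation; the function names are hypothetical, and it only looks for U+00D7 (MULTIPLICATION SIGN), whereas RUF003 covers a wider set of confusable characters.

```python
from typing import List, Tuple

MULTIPLICATION_SIGN = "\u00d7"  # the character flagged in the conftest comment


def find_multiplication_signs(source: str) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) positions of every U+00D7 in source."""
    hits: List[Tuple[int, int]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        col = line.find(MULTIPLICATION_SIGN)
        while col != -1:
            hits.append((lineno, col + 1))
            col = line.find(MULTIPLICATION_SIGN, col + 1)
    return hits


def replace_multiplication_signs(source: str) -> str:
    """Swap every U+00D7 for a plain ASCII 'x', as the review suggests."""
    return source.replace(MULTIPLICATION_SIGN, "x")
```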
Duplicate approval (review lock race condition — now fixed)
Summary
- app(...) in-process, ending in an xdist INTERNALERROR: list.remove(x): x not in list (the signature of a worker dying mid-run).
- os.environ.setdefault("MALLOC_ARENA_MAX", "2") at tests/component_integration/conftest.py:33 is a no-op for the running pytest worker: glibc reads MALLOC_ARENA_MAX once at process startup, before Python imports run. Component_integration runs aiperf in-process (no subprocesses to inherit the var), so setting it from inside the conftest never actually capped arenas.

Fix
Prepend MALLOC_ARENA_MAX=2 to the four pytest invocations in Makefile that target tests/component_integration/ (test-ci, test-component-integration, test-component-integration-ci, test-component-integration-verbose), so the var is in the shell env before pytest forks workers. Rewrite the misleading conftest comment to point at the Makefile.

tests/integration/conftest.py is left alone: that suite spawns aiperf as a subprocess, and the conftest setdefault correctly propagates the var to children.

Test plan
- MALLOC_ARENA_MAX=2 uv run pytest tests/component_integration/cli/ -m component_integration -n auto → 15/15 passing locally
- make -n test-ci and make -n test-component-integration-ci show MALLOC_ARENA_MAX=2 in the expanded pytest command
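The subprocess case, where setdefault does work, can be demonstrated with a small sketch. It is an illustration only and not code from this PR; the helper name child_sees is hypothetical. A value written to os.environ in the parent is inherited by any child started afterwards, so the child's glibc sees it at its own startup.

```python
import os
import subprocess
import sys


def child_sees(var: str) -> str:
    """Spawn a fresh interpreter and return its view of var ('' if unset)."""
    code = f"import os; print(os.environ.get({var!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Same call the integration conftest uses; it keeps any value already set.
os.environ.setdefault("MALLOC_ARENA_MAX", "2")

# The child inherits the parent's current environment, including the new var.
inherited = child_sees("MALLOC_ARENA_MAX")
```

The same call made inside the in-process component_integration suite changes only the Python-level environment of a process whose allocator has already been configured, which is why the Makefile prefix is needed there.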