fix: register setUp only once, eliminating O(N²) accumulation. ~2x faster tests that can scale in 1 file by jpohhhh · Pull Request #175 · Betterment/alchemist

jpohhhh · 2026-03-01T22:57:28Z

Story

I was working on porting Mermaid chart rendering from JS to Dart.
I noticed that when 100 tests were in one file, my Mac with 64 GB of RAM ran out of memory.
I remembered a pattern I had regularly seen: splitting tests into individual files, then having a single test file run them, lowered RAM usage (~14 GB in this case, instead of exhausting all 64 GB).
I had an LLM look into the code because nothing obvious stood out to me. It identified this setUp registration as producing 2N² setUp executions and provided a simple patch.
It didn't sound plausible to me.
However, it definitely worked: tests are much faster and consume much less RAM, confirmed repeatedly on different suites of golden tests I have.
The description below was produced by the LLM; the numbers are real.
Before numbers: unpatched, 10 files with 10 tests each (a single file with 100 tests could not run).
After numbers: patched, 1 file with 100 tests.

Description

Each call to goldenTest() called goldenTestAdapter.setUp(_setUpGoldenTests), appending a new setUp callback to the current test group every time. With N golden tests, all N callbacks ran before each of the 2N variant runs (each goldenTest() produces two variants), producing N × 2N = 2N² total setUp executions (e.g. 20,000 for 100 tests).
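
To sanity-check the arithmetic, here is a toy Python model, not alchemist's or package:test's actual code. It assumes a group runs every registered setUp callback before every test in that group. The names `ToyGroup` and `count_set_up_runs` are hypothetical. Each simulated goldenTest() call registers one setUp callback plus two variant tests, mirroring the description above.

```python
class ToyGroup:
    """Hypothetical stand-in for a test group: every registered setUp
    callback runs before every test in the group."""

    def __init__(self):
        self.set_ups = []
        self.tests = []

    def set_up(self, fn):
        self.set_ups.append(fn)

    def add_test(self, fn):
        self.tests.append(fn)

    def run(self):
        for test in self.tests:
            # every registered callback, even ones added after this test was declared
            for fn in self.set_ups:
                fn()
            test()


def count_set_up_runs(n):
    """Declare n golden tests the unguarded way and count setUp executions."""
    group = ToyGroup()
    calls = 0

    def set_up_golden_tests():
        nonlocal calls
        calls += 1

    for _ in range(n):
        group.set_up(set_up_golden_tests)  # registered again on every goldenTest() call
        group.add_test(lambda: None)       # variant 1
        group.add_test(lambda: None)       # variant 2
    group.run()
    return calls


for n in (10, 100):
    print(n, count_set_up_runs(n), 2 * n * n)  # measured count vs. 2N²
```

For N = 100 the model reports 20,000 setUp executions, matching 2N².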

The fix guards the registration with a boolean so it happens exactly once.
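
A minimal sketch of the guard, again in a toy Python model rather than the real Dart code. A module-level flag (the hypothetical `_set_up_registered`) makes registration idempotent, so the shared callback runs once per test: 2N times in total instead of 2N². `ToyGroup` and `golden_test` are illustrative names, not alchemist APIs, and the model has a single group.

```python
_set_up_registered = False  # hypothetical one-shot flag, analogous to the boolean in the fix


class ToyGroup:
    """Hypothetical test group: runs every registered setUp before each test."""

    def __init__(self):
        self.set_ups = []
        self.tests = []

    def run(self):
        for test in self.tests:
            for fn in self.set_ups:
                fn()
            test()


calls = 0


def set_up_golden_tests():
    global calls
    calls += 1


def golden_test(group):
    """Hypothetical goldenTest(): registers setUp at most once, then two variants."""
    global _set_up_registered
    if not _set_up_registered:
        group.set_ups.append(set_up_golden_tests)
        _set_up_registered = True
    group.tests.append(lambda: None)  # variant 1
    group.tests.append(lambda: None)  # variant 2


group = ToyGroup()
for _ in range(100):
    golden_test(group)
group.run()
print(calls)  # 200: one callback run before each of the 2N = 200 tests
```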

Benchmarked against a real project (442 golden tests, 63 files, --concurrency=1):

| Metric | Before (0.13.0) | After (fix) | Improvement |
|---|---|---|---|
| Peak RAM | 18.1 GB | 3.2 GB | 5.6× |
| Wall time | 212 s (3:33) | 127 s (2:07) | 1.7× |
| Test execution (m:ss) | 3:08 | 2:02 | 1.5× |
| User CPU | 180.8 s | 85.8 s | 2.1× |
| System CPU | 46.1 s | 17.7 s | 2.6× |
| Page reclaims | 12.7M | 4.9M | 2.6× |

The system CPU drop (46 s → 18 s) reflects kernel time spent on memory management (page faults, VM pressure) servicing the pathological allocation pattern. The O(N²) rapid-fire async setUp calls were starving the GC of idle time, preventing it from finalizing native rendering resources (Pictures, Images, surfaces) between test runs.

A regression test is included that intercepts setUpFn/testWidgetsFn to prove the accumulation and its quadratic effect.

Type of Change

✨ New feature (non-breaking change which adds functionality)
🛠️ Bug fix (non-breaking change which fixes an issue)
❌ Breaking change (fix or feature that would cause existing functionality to change)
🧹 Code refactor
✅ Build configuration change
📝 Documentation
🗑️ Chore

jpohhhh added 2 commits March 1, 2026 09:55

test: update regression test to assert fix behavior (O(N) not O(N²))

37f4866

jpohhhh requested review from a team, Kirpal, btrautmann, jeroen-meijer, jolexxa and marcossevilla as code owners March 1, 2026 22:57

jpohhhh wants to merge 2 commits into Betterment:main from Telosnex:main
