
fix: register setUp only once, eliminating O(N²) accumulation. ~2x faster tests that can scale in 1 file #175

Open
jpohhhh wants to merge 2 commits into Betterment:main from Telosnex:main

jpohhhh commented Mar 1, 2026

Story

  • Working on porting Mermaid chart rendering from JS to Dart.
  • Noticed that with 100 tests in one file, my Mac with 64 GB of RAM ran out of memory.
  • Remembered a pattern I had regularly seen: splitting tests into individual files, then having a test file run them, lowered RAM usage (~14 GB in this case instead of 64 GB).
  • Had an LLM look into the code because nothing obvious stood out to me. It identified this setUp registration as producing 2N² executions and provided a simple patch.
  • It didn't sound plausible to me.
  • However, it definitely worked: tests are much faster and consume much less RAM, confirmed repeatedly on different suites of golden tests I have.
  • The description below was produced by the LLM; the numbers are real.
  • Before numbers: unpatched, 10 files with 10 tests each (1 file with 100 tests could not run).
  • After numbers: patched, 1 file with 100 tests.

Description

Each call to goldenTest() was calling goldenTestAdapter.setUp(_setUpGoldenTests), appending a new setUp callback to the current test group every time. With N golden tests, all N callbacks ran before each of the 2N variant runs, producing 2N² total setUp executions (e.g. 20,000 for 100 tests).

The fix guards the registration with a boolean so it happens exactly once.
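To make the counting concrete, here is a minimal, dependency-free Dart model of both behaviors. It assumes, as the description states, that each goldenTest() call declares two variant tests and that every setUp callback registered on a group runs before each test in that group. `FakeGroup` and `countSetUpExecutions` are hypothetical stand-ins, not alchemist or package:test APIs.

```dart
/// Hypothetical stand-in for a test group. It mirrors the behavior the
/// description relies on: every registered setUp callback runs before
/// every test declared in the group.
class FakeGroup {
  final List<void Function()> _setUps = [];
  final List<void Function()> _tests = [];

  void setUp(void Function() callback) => _setUps.add(callback);
  void test(void Function() body) => _tests.add(body);

  void runAll() {
    for (final body in _tests) {
      for (final callback in _setUps) {
        callback();
      }
      body();
    }
  }
}

/// Declares [goldenTests] golden tests, each producing two variant tests,
/// and returns how many times the shared setUp body ran.
/// With [guarded] false, every declaration appends another setUp callback
/// (the unpatched behavior); with [guarded] true, a boolean ensures the
/// callback is registered exactly once (the fix).
int countSetUpExecutions(int goldenTests, {required bool guarded}) {
  final group = FakeGroup();
  var executions = 0;
  var registered = false;
  for (var i = 0; i < goldenTests; i++) {
    if (!guarded || !registered) {
      registered = true;
      group.setUp(() => executions++);
    }
    group.test(() {}); // first variant
    group.test(() {}); // second variant
  }
  group.runAll();
  return executions;
}

void main() {
  for (final n in [10, 100]) {
    final before = countSetUpExecutions(n, guarded: false);
    final after = countSetUpExecutions(n, guarded: true);
    print('$n golden tests: $before setUp runs unguarded, $after guarded');
  }
}
```

For 100 golden tests the model gives 20,000 setUp runs without the guard and 200 with it (2N² versus 2N). Under the same assumption it also accounts for the file-splitting workaround from the story: setUp callbacks do not cross test files, so ten files of ten golden tests run 10 × 200 = 2,000 setUp executions in total rather than 20,000.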

Benchmarked against a real project (442 golden tests, 63 files, --concurrency=1):

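| Metric         | Before (0.13.0) | After (fix)  | Improvement |
|----------------|-----------------|--------------|-------------|
| Peak RAM       | 18.1 GB         | 3.2 GB       | 5.6×        |
| Wall time      | 212 s (3:33)    | 127 s (2:07) | 1.7×        |
| Test execution | 3:08            | 2:02         | 1.5×        |
| User CPU       | 180.8 s         | 85.8 s       | 2.1×        |
| System CPU     | 46.1 s          | 17.7 s       | 2.6×        |
| Page reclaims  | 12.7M           | 4.9M         | 2.6×        |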

The system CPU drop (46s → 18s) reflects kernel time spent on memory management (page faults, VM pressure) servicing the pathological allocation pattern.

A regression test is included that intercepts setUpFn/testWidgetsFn to prove the accumulation and its quadratic effect.
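The interception technique can be sketched without Flutter by injecting the registration functions and swapping in recorders. `GoldenDeclarer`, its `setUpFn`/`testFn` fields, and the variant names are hypothetical illustrations of the idea, not alchemist's adapter or the PR's actual test, which intercepts setUpFn/testWidgetsFn directly.

```dart
/// Signature of a setUp-style registration function.
typedef SetUpFn = void Function(void Function() body);

/// Signature of a test-style declaration function.
typedef TestFn = void Function(String description, void Function() body);

/// Hypothetical declarer whose framework hooks are injected, so a test can
/// replace them with recorders instead of registering real tests.
class GoldenDeclarer {
  GoldenDeclarer({required this.setUpFn, required this.testFn});

  final SetUpFn setUpFn;
  final TestFn testFn;
  bool _setUpRegistered = false;

  /// Registers the shared setUp at most once, then declares two variants.
  void goldenTest(String description) {
    if (!_setUpRegistered) {
      _setUpRegistered = true;
      setUpFn(() {});
    }
    testFn('$description (variant 1)', () {});
    testFn('$description (variant 2)', () {});
  }
}

void main() {
  var setUpRegistrations = 0;
  final declaredTests = <String>[];

  // Recorders stand in for the real framework functions.
  final declarer = GoldenDeclarer(
    setUpFn: (_) => setUpRegistrations++,
    testFn: (description, _) => declaredTests.add(description),
  );

  for (var i = 0; i < 100; i++) {
    declarer.goldenTest('golden $i');
  }

  // A regression test would assert these counts; without the guard,
  // setUpRegistrations would equal the number of goldenTest calls (100).
  print('setUp registrations: $setUpRegistrations');
  print('tests declared: ${declaredTests.length}');
}
```

Asserting on the setUp registration count is what catches the regression: it stays at 1 with the guard and grows with every goldenTest() call without it.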

Type of Change

  • ✨ New feature (non-breaking change which adds functionality)
  • 🛠️ Bug fix (non-breaking change which fixes an issue)
  • ❌ Breaking change (fix or feature that would cause existing functionality to change)
  • 🧹 Code refactor
  • ✅ Build configuration change
  • 📝 Documentation
  • 🗑️ Chore

jpohhhh added 2 commits March 1, 2026 09:55
The likely cause of the memory growth: the O(N²) rapid-fire async setUp calls starve the GC of idle time, preventing it from finalizing native rendering resources (Pictures, Images, surfaces) between test runs.