Source loop: flaky-test-triage (loops.elorm.xyz)
What the loop does
Re-runs a failing suite N times, classifies each failure as flaky vs real, and fixes only confirmed regressions. Exit condition is non-binary: "every failure classified and real regressions fixed or explicitly deferred."
Why valuable for MAP
MAP currently treats a test/check as a single-pass deterministic verdict — one red run either hard-stops the workflow (Monitor valid=false) or, worse, pressures the Actor to "fix" a nondeterministic failure that isn't real. Neither path handles flakiness. Two additive ideas:
- Repetition-as-signal: re-run a failing test K times to separate non-deterministic noise from a deterministic regression before acting.
- A deferred (nondeterministic) disposition: a third Monitor outcome besides pass / hard-stop, with a recorded reason, so a confirmed flake doesn't force a false-green and doesn't block the run.
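The repetition-as-signal idea can be sketched roughly as follows. This is only an illustration, not MAP's actual gate code: the names classify_failure, Classification, and RerunEvidence are hypothetical, and the sketch assumes the initial red run counts as the first of the K runs, so K=1 means no re-run and the current single-pass behaviour is preserved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Classification(Enum):
    REGRESSION = "regression"  # failed on every one of the K runs
    FLAKY = "flaky"            # mixed results across the K runs


@dataclass(frozen=True)
class RerunEvidence:
    runs: int    # K, including the initial failing run
    passed: int  # P
    failed: int  # Q


def classify_failure(
    rerun: Callable[[], bool], k: int
) -> Tuple[Classification, RerunEvidence]:
    """Classify a test that has already failed once.

    `rerun` is a hypothetical callable that executes the test again and
    returns True on pass. It is called k - 1 times.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    # The initial red run is recorded as the first outcome.
    outcomes = [False] + [bool(rerun()) for _ in range(k - 1)]
    passed = sum(outcomes)
    failed = len(outcomes) - passed
    evidence = RerunEvidence(runs=k, passed=passed, failed=failed)
    # Any pass among the K runs means the failure is not deterministic.
    kind = Classification.FLAKY if passed > 0 else Classification.REGRESSION
    return kind, evidence
```

Note that K consecutive failures only suggest a deterministic regression; a low-probability flake can still fail K times in a row, so the choice of K trades run time against confidence.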
What MAP has today
- Deterministic gates + Monitor binary valid verdict.
- No flakiness classifier, no repeat-N-times, no defer-with-reason disposition.
Proposed scope
- Add an opt-in "re-run failing test K times" step in the test-gate path (config knob, default off or K=1).
- Extend the Monitor/gate verdict schema with a deferred disposition carrying reason + evidence (K runs, P pass / Q fail).
- Document: a flake must be classified and recorded, never silently skipped or test-weakened (reuse existing anti-gaming guardrails).
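One possible shape for the extended verdict is sketched below. Every name here (GateVerdict, Disposition, Evidence, blocks_run) is hypothetical and not taken from MAP's existing schema; how the new disposition maps onto the current valid field is left open. The sketch only shows the constraint from the scope list: a deferred verdict is rejected unless it carries a reason and evidence of mixed results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    PASS = "pass"
    HARD_STOP = "hard_stop"
    DEFERRED = "deferred"  # confirmed nondeterministic failure


@dataclass(frozen=True)
class Evidence:
    runs: int    # K
    passed: int  # P
    failed: int  # Q


@dataclass(frozen=True)
class GateVerdict:
    disposition: Disposition
    reason: Optional[str] = None
    evidence: Optional[Evidence] = None

    def __post_init__(self) -> None:
        if self.disposition is not Disposition.DEFERRED:
            return
        # A defer must be recorded, never a silent skip.
        if not self.reason or not self.reason.strip():
            raise ValueError("deferred verdict requires a reason")
        if self.evidence is None:
            raise ValueError("deferred verdict requires evidence")
        ev = self.evidence
        if ev.passed + ev.failed != ev.runs:
            raise ValueError("evidence counts must add up to runs")
        # Only mixed results qualify as a flake in this sketch.
        if ev.passed == 0 or ev.failed == 0:
            raise ValueError("deferred verdict requires mixed pass/fail runs")

    @property
    def blocks_run(self) -> bool:
        # Only a hard stop halts the workflow; a defer is recorded and
        # the run continues without reporting the check as green.
        return self.disposition is Disposition.HARD_STOP
```

Enforcing the reason and evidence at construction time means a deferred outcome cannot be created without the record that the anti-gaming guardrails rely on.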
Notes
Single-source render invariant applies (edit .jinja, make render-templates). Honors "fix every surfaced error" — defer is recorded, not a silent skip.
Part of #251