azalio · azalio · Jun 27, 2026 · Jun 27, 2026
diff --git a/.agents/skills/map-efficient/SKILL.md b/.agents/skills/map-efficient/SKILL.md
@@ -246,9 +246,14 @@ run evidence with `run_flaky_test_triage` (or `record_flaky_test_triage` if the
 repeated runs were already collected) and validate
 `flaky_test_triage.json` before reporting `deferred_nondeterministic`. This is
 not a passing gate: do not weaken, skip, or delete the check, and do not return
-a silent green. After validation, close with
-`python3 .map/scripts/map_orchestrator.py defer_flaky_subtask "$SUBTASK_ID" --check-id "<check-id>"`,
-not `validate_step 2.4 --recommendation proceed`.
+a silent green. Monitor signals the deferral as the third verdict outcome:
+`valid:false` plus `disposition {kind:deferred_nondeterministic, check_id}`
+(`recommendation` omitted or `needs_investigation`). Close via the verdict path:
+`validate_step 2.4 --disposition deferred_nondeterministic --check-id "<check-id>" --monitor-envelope -`
+(honored only when both the sidecar and the envelope back it; the result is
+`valid:false` + `deferred:true`, non-green, exit 0). `defer_flaky_subtask "$SUBTASK_ID" --check-id "<check-id>"`
+remains the lower-level direct close. Do not close a deferral with
+`validate_step 2.4 --recommendation proceed`.
 
 On a clean pass, run the regression gate and record the subtask:
 

diff --git a/.agents/skills/map-efficient/efficient-reference.md b/.agents/skills/map-efficient/efficient-reference.md
@@ -44,8 +44,12 @@ python3 .map/scripts/map_step_runner.py run_flaky_test_triage \
   --timeout 120 \
   -- python -m pytest tests/test_file.py::test_name
 python3 .map/scripts/map_step_runner.py validate_flaky_test_triage
-python3 .map/scripts/map_orchestrator.py defer_flaky_subtask "$SUBTASK_ID" \
-  --check-id "pytest::test_name"
+# Preferred close — the verdict path. Monitor emits valid:false plus
+# disposition {kind: deferred_nondeterministic, check_id: ...}; close 2.4 with
+# the same disposition piped through (see "Verdict-path route" below).
+echo "$MONITOR_JSON" | python3 .map/scripts/map_orchestrator.py \
+  validate_step 2.4 --disposition deferred_nondeterministic \
+  --check-id "pytest::test_name" --monitor-envelope -
 ```
 
 The runner executes argv with `shell=False`; shell syntax is not interpreted. If
@@ -61,8 +65,6 @@ python3 .map/scripts/map_step_runner.py record_flaky_test_triage \
   --command "pytest tests/test_file.py::test_name" \
   --reason "Mixed pass/fail outcomes across repeated runs."
 python3 .map/scripts/map_step_runner.py validate_flaky_test_triage
-python3 .map/scripts/map_orchestrator.py defer_flaky_subtask "$SUBTASK_ID" \
-  --check-id "pytest::test_name"
 ```
 
 Mixed pass/fail evidence writes `.map/<branch>/flaky_test_triage.json`, updates
@@ -71,10 +73,23 @@ the `flaky_test_triage` manifest stage, and returns
 gate: do not weaken, skip, or delete the check, and do not return a silent
 green. Monitor must include the recorded defer evidence and
 `monitor_verdict_policy=not_valid_without_explicit_triage` in its finding.
-After validation, close the subtask via `defer_flaky_subtask`, not the clean-pass
-close command; the orchestrator records
-`status=deferred_nondeterministic` with evidence metadata and advances without
-requeueing Actor.
+
+**Verdict-path route (preferred).** The third Monitor outcome is wired into the
+2.4 close: `validate_step 2.4 --disposition deferred_nondeterministic --check-id
+<id> --monitor-envelope -`. The deferral is honored ONLY when (a) the Monitor
+envelope is `valid:false` with a non-empty `failed_checks` and a structured
+`disposition` matching the flags, and (b) the sidecar holds mixed pass/fail
+evidence for that `check_id` — so a deterministic failure or a green check can
+never be deferred. A deferred run returns `valid:false` + `deferred:true`
+(non-green, exit 0, not a hard-stop), records `status=deferred_nondeterministic`
+with evidence metadata, and advances without requeueing Actor. `recommendation`
+may be omitted or `needs_investigation`; `revise`/`block` are rejected as
+contradictory.
+
+**Lower-level command.** `defer_flaky_subtask "$SUBTASK_ID" --check-id <id>`
+performs the same close+advance directly (e.g. an operator deferral with no
+Monitor envelope); the verdict-path route calls it internally after the
+envelope/anti-gaming checks.
 
 ## Wave Execution
 

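The two gating conditions of the verdict-path route described in the hunk above can be sketched in Python. This is a minimal illustration under stated assumptions, not the orchestrator's code: the function name `deferral_is_honored` and the sidecar layout (a `check_id` plus an `outcomes` list of `"pass"`/`"fail"` strings) are hypothetical, since the diff does not show the sidecar format. The envelope fields `valid`, `failed_checks`, `disposition`, and `recommendation` are taken from the text.

```python
from typing import Any

# Recommendations the text allows alongside a deferral (None means omitted).
ALLOWED_RECOMMENDATIONS = (None, "needs_investigation")


def deferral_is_honored(
    envelope: dict[str, Any],
    sidecar: dict[str, Any],
    kind: str,
    check_id: str,
) -> bool:
    """Return True only when both the envelope and the sidecar back the deferral."""
    disposition = envelope.get("disposition") or {}

    # (a) Envelope: valid:false, non-empty failed_checks, disposition matching
    # the CLI flags, and no contradictory recommendation (revise/block).
    envelope_ok = (
        envelope.get("valid") is False
        and bool(envelope.get("failed_checks"))
        and disposition.get("kind") == kind
        and disposition.get("check_id") == check_id
        and envelope.get("recommendation") in ALLOWED_RECOMMENDATIONS
    )

    # (b) Sidecar: mixed pass/fail evidence for the same check_id.
    # ASSUMPTION: the real flaky_test_triage.json layout may differ.
    outcomes = set(sidecar.get("outcomes", []))
    sidecar_ok = (
        sidecar.get("check_id") == check_id
        and {"pass", "fail"} <= outcomes
    )

    return envelope_ok and sidecar_ok
```

Under this sketch an all-fail sidecar (a deterministic failure) or an all-pass sidecar (a green check) never satisfies condition (b), which is the anti-gaming property the text describes.
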
diff --git a/.claude/agents/monitor.md b/.claude/agents/monitor.md
@@ -1607,6 +1607,21 @@ Do NOT invent issues to justify review effort. Empty `issues` array is valid.
   `valid: false`. Do not emit `valid: true` + `recommendation: "revise"`
   — it is a contradiction that downstream workflows treat as a clean
   pass and silently skip the recommended revision.
+- **Flaky / nondeterministic check → `disposition` (the third outcome).**
+  When a check fails but repeated runs of the EXACT command show mixed
+  pass/fail (real nondeterminism, NOT a deterministic regression you can
+  reproduce on demand), do NOT demand an Actor "fix" and do NOT return a
+  silent green. Emit `valid: false` PLUS
+  `"disposition": {"kind": "deferred_nondeterministic", "check_id": "<id>"}`,
+  list the failing dimension in `failed_checks`, and set `recommendation` to
+  `needs_investigation` or omit it (NEVER `revise`/`block` — that contradicts
+  the deferral). The `check_id` MUST match the id in the
+  `.map/<branch>/flaky_test_triage.json` sidecar. The skill closes the
+  subtask via `validate_step 2.4 --disposition deferred_nondeterministic
+  --check-id <id> --monitor-envelope -`, which honors the deferral ONLY when
+  the sidecar holds mixed pass/fail evidence — so you cannot defer a
+  deterministic failure or a green check. A deferral is a recorded non-green
+  outcome, never a pass.
 
 ### JSON Schema Definition (Complete)
 
@@ -1812,6 +1827,23 @@ Do NOT invent issues to justify review effort. Empty `issues` array is valid.
           "description": "ID of next subtask to mark as in_progress (optional)"
         }
       }
+    },
+    "disposition": {
+      "type": "object",
+      "description": "OPTIONAL non-binary verdict outcome. Include ONLY when valid:false AND the failure is a CONFIRMED flaky/nondeterministic check backed by repeated-run mixed pass/fail evidence (a flaky_test_triage sidecar) — never for a deterministic regression. Omit entirely for normal verdicts. Routes the subtask to a recorded deferral (non-green, not a hard-stop retry) instead of demanding an Actor fix.",
+      "required": ["kind", "check_id"],
+      "additionalProperties": false,
+      "properties": {
+        "kind": {
+          "type": "string",
+          "enum": ["deferred_nondeterministic"],
+          "description": "The deferral kind. deferred_nondeterministic = confirmed flaky, evidence recorded, advance without retry."
+        },
+        "check_id": {
+          "type": "string",
+          "description": "The flaky check id; MUST match the check_id recorded in .map/<branch>/flaky_test_triage.json."
+        }
+      }
     }
   }
 }

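For reviewers who want to sanity-check an envelope by hand, the `disposition` schema fragment above can be mirrored in a few lines of Python. This is a hedged sketch: the helper name `disposition_errors` is hypothetical, and the `failed_checks` value in the sample envelope is a placeholder. The rules themselves (required `kind` and `check_id`, no additional properties, a single-value enum, string `check_id`) are read directly from the schema in the diff.

```python
from typing import Any

ALLOWED_KINDS = {"deferred_nondeterministic"}  # the schema's enum
REQUIRED_KEYS = {"kind", "check_id"}           # the schema's "required" list


def disposition_errors(disposition: Any) -> list[str]:
    """Check a disposition value against the schema fragment; return problems found."""
    if not isinstance(disposition, dict):
        return ["disposition must be an object"]

    errors: list[str] = []
    keys = set(disposition)

    for key in sorted(REQUIRED_KEYS - keys):
        errors.append(f"missing required property: {key}")
    # additionalProperties is false, so any other key is rejected.
    for key in sorted(keys - REQUIRED_KEYS):
        errors.append(f"unexpected property: {key}")

    if "kind" in disposition:
        kind = disposition["kind"]
        if not isinstance(kind, str) or kind not in ALLOWED_KINDS:
            errors.append(f"kind not in enum: {kind!r}")
    if "check_id" in disposition and not isinstance(disposition["check_id"], str):
        errors.append("check_id must be a string")
    return errors


# Sample envelope for the third outcome; "failed_checks" content is a placeholder.
envelope = {
    "valid": False,
    "failed_checks": ["<failing-dimension>"],
    "recommendation": "needs_investigation",
    "disposition": {
        "kind": "deferred_nondeterministic",
        "check_id": "pytest::test_name",
    },
}
print(disposition_errors(envelope["disposition"]))  # prints []
```

The sketch checks only the shape of `disposition`. Whether the `check_id` matches the sidecar is a separate check that the schema cannot express.
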
diff --git a/.claude/skills/map-efficient/SKILL.md b/.claude/skills/map-efficient/SKILL.md
@@ -398,7 +398,7 @@ Return JSON with valid, summary, issues, files_changed, tests_run, and escalatio
   intentional contract rewrite; see [efficient-reference.md](efficient-reference.md).
 - If `valid=false`, write `code-review-N.md`, run `python3 .map/scripts/map_orchestrator.py monitor_failed --feedback "<feedback>"`, inspect `retry_isolation`, and invoke Predictor only when stuck/high-risk escalation rules apply. **Worktree isolation:** if enabled, run `discard_subtask_worktree "$SUBTASK_ID"` BEFORE retrying (atomic reject — a failed attempt is never merged; retry starts from a clean worktree). Recipe: [efficient-reference.md](efficient-reference.md#worktree-isolation). **If `monitor_failed` returns `status:"max_retries"` (budget exhausted), do NOT retry — run `python3 .map/scripts/map_step_runner.py build_escalation_outcome "$SUBTASK_ID" max_retries --retry-count <retry_count> --max-retries <max_retries>` and STOP with its `outcome` (surface the blocker to the user).**
 - **Intra-run failure memory + bounded-effort escalation (MANDATORY on every `valid=false`):** record the rejection with `python3 .map/scripts/map_step_runner.py record_failure_signature "<monitor feedback>" "$SUBTASK_ID"`. If `armed:true`, prepend the block from `build_anti_repeat_constraint "$SUBTASK_ID"` (add `--quarantine-active` when CLEAN_RETRY is set) to the TOP of the next Actor prompt. If `escalation_recommended:true` (#255), the 3rd identical failure means the bounded recovery act did not work — do NOT retry and do NOT run the legacy retry-3 Stuck-Recovery for this identical loop; run `python3 .map/scripts/map_step_runner.py build_escalation_outcome "$SUBTASK_ID" repeated_failure` (add `--quarantine-active` on a CLEAN_RETRY iteration) and STOP with its `outcome:"BLOCKED"`. A `status:"not_escalated"` means the latest failure was a NEW signature (the Actor moved off the dead end) — resume normal retries. Full recipe: [efficient-reference.md](efficient-reference.md).
-- If `retry_isolation=clean_retry_required`, validate `.map/<branch>/retry_quarantine.json` before CLEAN_RETRY. If a test/check fails inconsistently, collect repeated evidence with `run_flaky_test_triage ...` (or manually with `record_flaky_test_triage ...` if already collected), validate `.map/<branch>/flaky_test_triage.json`, then close via `python3 .map/scripts/map_orchestrator.py defer_flaky_subtask "$SUBTASK_ID" --check-id "<check-id>"`; this is not a passing gate and must not weaken/skip/delete the check. Full recipe: [efficient-reference.md](efficient-reference.md).
+- If `retry_isolation=clean_retry_required`, validate `.map/<branch>/retry_quarantine.json` before CLEAN_RETRY. If a test/check fails inconsistently, collect repeated evidence with `run_flaky_test_triage ...` (or manually with `record_flaky_test_triage ...` if already collected), then validate `.map/<branch>/flaky_test_triage.json`. Monitor must then emit `valid:false` + `disposition {kind:deferred_nondeterministic, check_id}`; close via the verdict-path route `validate_step 2.4 --disposition deferred_nondeterministic --check-id "<check-id>" --monitor-envelope -` (honored only when both the sidecar and the envelope back it; the result is `valid:false` + `deferred:true`, non-green, exit 0). `defer_flaky_subtask` remains the lower-level direct close. This is not a passing gate and must not weaken/skip/delete the check. Full recipe: [efficient-reference.md](efficient-reference.md).
 - Treat test failures after Monitor approval as Monitor failure. **Cross-subtask regression gate (MANDATORY):** before the test gate, run `python3 .map/scripts/map_step_runner.py detect_cross_subtask_regression_risk "$BRANCH" "$SUBTASK_ID"`; if `recommended_gate == "full_suite"` you MUST run the FULL suite (never a `-k` subset) before commit / `record_subtask_result` — per-subtask Monitor is blind to regressions on prior subtasks' code. Recipe: [efficient-reference.md](efficient-reference.md).
 
 ### Phase: ADVANCE_SUBTASK (synthetic boundary)

diff --git a/.claude/skills/map-efficient/efficient-reference.md b/.claude/skills/map-efficient/efficient-reference.md
@@ -35,8 +35,13 @@ python3 .map/scripts/map_step_runner.py run_flaky_test_triage \
   --timeout 120 \
   -- python -m pytest tests/test_file.py::test_name
 python3 .map/scripts/map_step_runner.py validate_flaky_test_triage
-python3 .map/scripts/map_orchestrator.py defer_flaky_subtask "$SUBTASK_ID" \
-  --check-id "pytest::test_name"
+# Preferred close — the verdict-path route. Monitor emits valid:false plus
+# disposition {kind: deferred_nondeterministic, check_id: ...}; close 2.4 with
+# the same disposition piped through. The orchestrator routes to deferral ONLY
+# when the sidecar + envelope back it (see "Verdict-path route" below).
+echo "$MONITOR_JSON" | python3 .map/scripts/map_orchestrator.py \
+  validate_step 2.4 --disposition deferred_nondeterministic \
+  --check-id "pytest::test_name" --monitor-envelope -
 ```
 
 `run_flaky_test_triage` executes argv with `shell=False`; shell syntax is not
@@ -52,19 +57,30 @@ python3 .map/scripts/map_step_runner.py record_flaky_test_triage \
   --command "pytest tests/test_file.py::test_name" \
   --reason "Mixed pass/fail outcomes across repeated runs."
 python3 .map/scripts/map_step_runner.py validate_flaky_test_triage
-python3 .map/scripts/map_orchestrator.py defer_flaky_subtask "$SUBTASK_ID" \
-  --check-id "pytest::test_name"
 ```
 
 Mixed pass/fail evidence is classified as `deferred_nondeterministic` and
 stored in `.map/<branch>/flaky_test_triage.json` plus the `flaky_test_triage`
 manifest stage. This is an explicit recorded defer, not a pass: the artifact
 sets `monitor_verdict_policy=not_valid_without_explicit_triage`, and Monitor
 must still report the deferred evidence rather than returning a silent green.
-After validation, close the subtask via `defer_flaky_subtask`, not the
-clean-pass close command; it records
+
+**Verdict-path route (preferred).** The third Monitor outcome is wired into the
+2.4 close itself: `validate_step 2.4 --disposition deferred_nondeterministic
+--check-id <id> --monitor-envelope -`. The deferral is honored ONLY when (a)
+the Monitor envelope is `valid:false` with a non-empty `failed_checks` and a
+structured `disposition` matching the flags, and (b) the sidecar holds mixed
+pass/fail evidence for that `check_id` — so a deterministic failure or a green
+check can never be deferred. A deferred run returns `valid:false` +
+`deferred:true` (non-green, exit 0, not a hard-stop); it records
 `status=deferred_nondeterministic` plus evidence metadata in `step_state.json`
-and advances without requeueing Actor.
+and advances without requeueing Actor. `recommendation` may be omitted or
+`needs_investigation`; `revise`/`block` are rejected as contradictory.
+
+**Lower-level command.** `defer_flaky_subtask "$SUBTASK_ID" --check-id <id>`
+performs the same close+advance directly (used when there is no Monitor
+envelope to verify, e.g. an operator deferral); the verdict-path route above
+calls it internally after the envelope/anti-gaming checks.
 
 If a command above ever returns `Unknown function`, grep `map_step_runner.py` for `func_name ==` to confirm the dispatch branch still exists; this list is the source of truth as of the PR that added it but the underlying dispatcher is the ground truth.
 

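The shell recipe above pipes the Monitor envelope into `validate_step` on stdin. The same call can be sketched from Python with `subprocess`. The helper names `build_close_argv` and `close_deferred` are hypothetical; the script path, subcommand, and flags are copied from the diff. Because the text says a deferral exits 0 while remaining non-green, the sketch deliberately does not treat the exit code as a pass signal.

```python
import json
import subprocess


def build_close_argv(check_id: str) -> list[str]:
    """Assemble the verdict-path close command exactly as the recipe spells it."""
    return [
        "python3", ".map/scripts/map_orchestrator.py",
        "validate_step", "2.4",
        "--disposition", "deferred_nondeterministic",
        "--check-id", check_id,
        "--monitor-envelope", "-",  # "-" means: read the envelope from stdin
    ]


def close_deferred(envelope: dict, check_id: str) -> subprocess.CompletedProcess:
    """Pipe the Monitor envelope to validate_step and return the raw result.

    check=False on purpose: per the text a deferral exits 0 but is still
    valid:false + deferred:true, so callers must inspect the output rather
    than rely on the return code. The output format is not shown in the diff,
    so this sketch does not parse it.
    """
    return subprocess.run(
        build_close_argv(check_id),
        input=json.dumps(envelope),
        text=True,
        capture_output=True,
        check=False,
    )
```

Passing argv as a list avoids shell quoting issues with ids such as `pytest::test_name`.
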
diff --git a/.codex/agents/monitor.toml b/.codex/agents/monitor.toml
@@ -112,6 +112,19 @@ Quality Gate Enforcement:
 - If Actor trusts external input -> REJECT with security vulnerability details
 - If tests missing critical scenarios -> WARN with test case suggestions
 
+Flaky / nondeterministic check -> the third verdict outcome:
+- When a check fails but repeated runs of the EXACT command show mixed pass/fail
+  (real nondeterminism, NOT a deterministic regression), do NOT demand an Actor
+  fix and do NOT return a silent green. Emit valid: false PLUS
+  disposition: {"kind": "deferred_nondeterministic", "check_id": "<id>"}, list the
+  failing dimension in failed_checks, and set recommendation to needs_investigation
+  or omit it (never revise/block -- that contradicts the deferral). The check_id
+  MUST match the id in .map/<branch>/flaky_test_triage.json. The skill closes via
+  validate_step 2.4 --disposition deferred_nondeterministic --check-id <id>
+  --monitor-envelope -, which honors the deferral ONLY when the sidecar holds mixed
+  pass/fail evidence -- a deterministic failure or a green check can never be
+  deferred. A deferral is a recorded non-green outcome, never a pass.
+
 ---
 
 # Review Process -- FOLLOW THIS ORDER