feat(map-review): cross-AI peer review (--cross-ai <runtime>) — slice 1 of #288 by azalio · Pull Request #295 · azalio/map-framework

azalio · 2026-06-26T22:40:51Z

What & why

/map-review --cross-ai <runtime> dispatches the review to an independent external AI CLI (codex, gemini, claude, or opencode) for a true second opinion: a different model or vendor, with fresh context and no shared session. Same-model review is "inbred"; an independent reviewer can catch model-specific blind spots that a self-review misses. This is slice 1 of #288 (single-runtime dispatch).

Design (llm-council-reviewed — conv `92a7f159`)

Producer-owns-parse: all subprocess interaction, envelope parsing, finding normalization, and the untrusted boundary live in the Python step runner (run_cross_ai_review / dispatch_cross_ai_review). The skill only handles consent and presentation. This mirrors the skills_eval/dispatcher.py and --adversarial precedents.
Runtime adapter registry (hardcoded allowlist, not a config-driven plugin gateway): {binary, argv, envelope, independent_vendor} per runtime.
Double-consent egress (off by default): the per-run --cross-ai flag AND review.cross_ai.enabled: true are both required, because the diff/code leaves the machine (mirrors the SOFA opt-in posture).
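A minimal sketch of the adapter registry and the double-consent gate described above, assuming a frozen dataclass per runtime. The class and function names (RuntimeAdapter, RUNTIMES, cross_ai_allowed, build_argv) are hypothetical, and the argv templates are illustrative assumptions rather than verified flags for each CLI; the point is the shape: a hardcoded allowlist, and a `{prompt}` element that is swapped out whole rather than interpolated into a string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeAdapter:
    """One entry in a hardcoded allowlist of external review CLIs."""
    binary: str
    argv: tuple[str, ...]      # literal argv template; "{prompt}" is a whole-element placeholder
    envelope: str              # how stdout is wrapped, e.g. "raw" or "claude-json" (hypothetical labels)
    independent_vendor: bool   # False when the reviewer shares a vendor with the host session


# Illustrative argv templates only (assumptions); consult each CLI's own docs.
# claude is marked False on the assumption that the host session runs on Claude.
RUNTIMES: dict[str, RuntimeAdapter] = {
    "codex": RuntimeAdapter("codex", ("codex", "exec", "{prompt}"), "raw", True),
    "gemini": RuntimeAdapter("gemini", ("gemini", "-p", "{prompt}"), "raw", True),
    "opencode": RuntimeAdapter("opencode", ("opencode", "run", "{prompt}"), "raw", True),
    "claude": RuntimeAdapter(
        "claude", ("claude", "-p", "{prompt}", "--output-format", "json"), "claude-json", False
    ),
}


def cross_ai_allowed(flag_runtime: str | None, config_enabled: bool) -> bool:
    """Double consent: the per-run flag AND the config kill-switch must both opt in."""
    return flag_runtime is not None and config_enabled


def build_argv(runtime: str, prompt: str) -> list[str]:
    """Resolve a runtime from the allowlist and substitute the prompt wholesale."""
    adapter = RUNTIMES.get(runtime)
    if adapter is None:
        raise ValueError(f"unknown runtime: {runtime!r}")
    # Only an element exactly equal to "{prompt}" is replaced; the prompt is never
    # spliced into a larger string and never passes through a shell.
    return [prompt if arg == "{prompt}" else arg for arg in adapter.argv]
```

Because the registry is code rather than config, adding a runtime is a reviewed change, not something a repository's settings file can do on its own.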

Security guardrails (all enforced in Python, not prompt text)

Outbound secret scan: high-confidence secrets (private keys; AWS, GitHub, Google, and Slack credentials) block dispatch before the subprocess is spawned. Only the pattern name is surfaced, never the value, and the run reports status: secret_blocked.
shell=False, literal-argv invocation with a configurable timeout: the {prompt} token is replaced wholesale as a single argv element, so the prompt cannot be interpreted as shell syntax.
Inbound untrusted boundary: external output is parsed for findings but always re-emitted behind an EXTERNAL UNTRUSTED REFERENCE fence (link allowlist + injection scan, SOFA semantics) — applied deterministically in Python so the model cannot "forget" to fence. Findings are advisory-only (source: cross_ai).
The outbound prompt instructs the external reviewer that the diff is untrusted data, not instructions (guards against injection in the reviewed code).
Honest independence: same-vendor runtimes (claude reviewing a Claude session) are labeled independent_vendor: false.
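The outbound scan and the inbound fence above can be sketched as two small pure functions. This is a hedged illustration, not the runner's code: the regexes are common public shapes for these credential types (not the PR's actual pattern set), and the fence marker format, ALLOWED_HOSTS, and INJECTION_HINTS are hypothetical stand-ins for the SOFA-style link allowlist and injection scan.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Common public shapes for high-confidence secrets (assumed, not the PR's exact list).
SECRET_PATTERNS: dict[str, re.Pattern[str]] = {
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "slack_token": re.compile(r"\bxox[abposr]-[0-9A-Za-z-]{10,}"),
}


def scan_outbound(text: str) -> list[str]:
    """Return the names of matched patterns only; matched values are never returned."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


MARKER = "EXTERNAL UNTRUSTED REFERENCE"
ALLOWED_HOSTS = {"github.com", "docs.python.org"}  # hypothetical link allowlist
URL_RE = re.compile(r"https?://[^\s)>\]]+")
INJECTION_HINTS = re.compile(
    r"ignore (?:all )?(?:previous|prior) instructions|disregard (?:the )?system prompt", re.I
)


def _filter_link(match: re.Match[str]) -> str:
    host = urlparse(match.group(0)).hostname or ""
    return match.group(0) if host in ALLOWED_HOSTS else "[link removed]"


def wrap_untrusted(text: str) -> str:
    """Always fence external output; this runs in code, so it cannot be skipped."""
    body = URL_RE.sub(_filter_link, text)
    # Neutralize fence markers the external output might use to "close" the fence early.
    body = body.replace(MARKER, "[marker removed]")
    scan = "injection-suspect" if INJECTION_HINTS.search(body) else "clean"
    return f"<<<BEGIN {MARKER} source=cross_ai scan={scan}>>>\n{body}\n<<<END {MARKER}>>>"
```

Scanning before spawning means a blocked run never starts a process, and wrapping in Python keeps the fence deterministic regardless of what the session model decides to do with the text.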

Failure degradation (non-blocking)

disabled / unavailable / timeout / error / unparsed / secret_blocked all degrade gracefully and fall back to the in-session review — cross-AI is a supplement, never a hard gate.
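A sketch of how each outcome above might map to a status, assuming the runner calls subprocess.run with shell=False and a timeout. CrossAiResult, dispatch, and _unwrap are hypothetical names; so is the expected payload shape ({"findings": [...]}), and treating the "claude-json" envelope as an outer JSON object with the model text under "result" is an assumption for illustration.

```python
from __future__ import annotations

import json
import subprocess
from dataclasses import dataclass, field


@dataclass
class CrossAiResult:
    # ok | disabled | unavailable | timeout | error | unparsed | secret_blocked
    status: str
    findings: list[dict[str, object]] = field(default_factory=list)
    detail: str = ""


def _unwrap(stdout: str, envelope: str) -> object:
    data = json.loads(stdout)
    if envelope == "claude-json":
        # Assumption: the outer JSON carries the model's text under "result".
        data = json.loads(data["result"])
    return data


def dispatch(
    argv: list[str],
    *,
    enabled: bool,
    secret_hits: list[str],
    timeout_seconds: int = 180,
    envelope: str = "raw",
) -> CrossAiResult:
    if not enabled:
        return CrossAiResult("disabled")
    if secret_hits:
        # Pattern names only; the caller never passes secret values in here.
        return CrossAiResult("secret_blocked", detail=", ".join(secret_hits))
    try:
        proc = subprocess.run(
            argv, shell=False, capture_output=True, text=True,
            timeout=timeout_seconds, check=False,
        )
    except FileNotFoundError:
        return CrossAiResult("unavailable", detail=argv[0])
    except subprocess.TimeoutExpired:
        return CrossAiResult("timeout", detail=f"{timeout_seconds}s")
    if proc.returncode != 0:
        return CrossAiResult("error", detail=f"exit {proc.returncode}")
    try:
        raw = _unwrap(proc.stdout, envelope)["findings"]  # type: ignore[index]
        # Every finding is tagged advisory-only via its source.
        findings = [{**f, "source": "cross_ai"} for f in raw if isinstance(f, dict)]
    except (json.JSONDecodeError, KeyError, TypeError):
        return CrossAiResult("unparsed")
    return CrossAiResult("ok", findings)
```

Every path returns a value instead of raising, so a caller can simply fall back to the in-session review whenever status != "ok".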

Config

review.cross_ai.enabled: false        # org kill-switch (default off)
review.cross_ai.runtime: codex        # claude|codex|gemini|opencode
review.cross_ai.timeout_seconds: 180
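A minimal sketch of loading these keys into a frozen dataclass with safe fallbacks. The names (CrossAiConfig, load_cross_ai_config) are hypothetical, and "dotted-key aliasing" is interpreted here, as an assumption, as accepting either a flat "review.cross_ai.runtime" key or the equivalent nested mapping. Invalid values fall back to the defaults, and only a literal boolean true turns egress on.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping

VALID_RUNTIMES = frozenset({"claude", "codex", "gemini", "opencode"})


@dataclass(frozen=True)
class CrossAiConfig:
    enabled: bool = False        # org kill-switch: off unless explicitly enabled
    runtime: str = "codex"
    timeout_seconds: int = 180


def _lookup(raw: Mapping[str, Any], dotted: str) -> Any:
    """Accept either a flat dotted key or the equivalent nested mapping."""
    if dotted in raw:
        return raw[dotted]
    node: Any = raw
    for part in dotted.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def load_cross_ai_config(raw: Mapping[str, Any]) -> CrossAiConfig:
    default = CrossAiConfig()
    enabled = _lookup(raw, "review.cross_ai.enabled")
    runtime = _lookup(raw, "review.cross_ai.runtime")
    timeout = _lookup(raw, "review.cross_ai.timeout_seconds")
    valid_timeout = isinstance(timeout, int) and not isinstance(timeout, bool) and timeout > 0
    return CrossAiConfig(
        # Only a literal True enables egress; strings such as "yes" do not.
        enabled=enabled is True,
        runtime=runtime if isinstance(runtime, str) and runtime in VALID_RUNTIMES else default.runtime,
        timeout_seconds=timeout if valid_timeout else default.timeout_seconds,
    )
```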

Testing

make check green locally: ruff ✅, mypy ✅, pyright 0 errors/0 warnings/0 informations ✅, 2880 passed, check-render ✅ (generated trees == templates_src).
New tests: TestCrossAi* in tests/test_map_step_runner.py (wrap/detect/dispatch across success / claude-json envelope / timeout / non-zero / unparsed / secret-blocked / shell=False+literal-argv) + tests/test_cross_ai_config.py (dataclass defaults, dotted-key aliasing, validation fallbacks, default-config doc). The secret-blocked test asserts subprocess.run is not called.
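A pytest-style sketch of the "subprocess.run is not called" assertion using unittest.mock.patch, plus the shell=False/literal-argv check. The dispatch function here is a hypothetical, heavily simplified stand-in (one secret pattern, fixed 180s timeout), not the actual run_cross_ai_review / dispatch_cross_ai_review code.

```python
import re
import subprocess
from unittest import mock

AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def dispatch(diff: str, argv: list) -> dict:
    """Hypothetical stand-in for the runner: scan first, then maybe spawn."""
    if AWS_KEY.search(diff):
        return {"status": "secret_blocked", "pattern": "aws_access_key_id"}
    proc = subprocess.run(argv, shell=False, capture_output=True, text=True, timeout=180)
    return {"status": "ok" if proc.returncode == 0 else "error"}


def test_secret_blocked_never_spawns() -> None:
    diff = "+aws_key = 'AKIA" + "A" * 16 + "'"
    with mock.patch("subprocess.run") as run:
        result = dispatch(diff, ["codex", "exec", "review this diff"])
    run.assert_not_called()
    assert result == {"status": "secret_blocked", "pattern": "aws_access_key_id"}
    # The secret value must not leak into the reported result.
    assert "AKIA" not in str(result)


def test_clean_diff_uses_literal_argv() -> None:
    argv = ["codex", "exec", "review this diff"]
    fake = subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as run:
        result = dispatch("+x = 1", argv)
    run.assert_called_once_with(argv, shell=False, capture_output=True, text=True, timeout=180)
    assert result == {"status": "ok"}
```

Patching subprocess.run at module level works because the stand-in looks it up as subprocess.run at call time.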
Step runner detail moved to review-reference.md; map-review's per-skill SKILL.md line budget raised deliberately (it now hosts three review modes: normal + adversarial + cross-ai).

Note: GitHub Actions CI is currently unavailable (billing); validated via the full local gate above.

Scope

Part of #288: single-runtime dispatch only. Multi-runtime consensus/disagreement aggregation (--cross-ai all) is deferred to a follow-up slice; the Finding/source shape is designed to support it without a schema migration. The issue stays open.

Dispatch /map-review to an INDEPENDENT external AI CLI (codex/gemini/claude/opencode) for a true second opinion: a different model/vendor with fresh context. All subprocess interaction, envelope parsing, finding normalization, and the untrusted boundary live in the Python step runner (run_cross_ai_review / dispatch_cross_ai_review, producer-owns-parse); the skill only handles consent and presentation. Egress is double-consent because the diff/code leaves the machine: it requires both the per-run --cross-ai flag AND review.cross_ai.enabled: true (off by default). Guardrails: a high-confidence outbound secret scan blocks dispatch before the subprocess (pattern name only, never the value); shell=False literal-argv with a configurable timeout; returned findings always enter context behind an EXTERNAL UNTRUSTED REFERENCE fence (link/injection scan, applied in Python so the model cannot skip it) and are advisory-only (source: cross_ai); same-vendor runtimes are honestly labeled independent_vendor: false. Any failure (disabled / CLI missing / unauthenticated / timeout / non-JSON / secret-blocked) degrades non-blockingly and falls back to the in-session review. Config keys: review.cross_ai.{enabled,runtime,timeout_seconds}. map-review's per-skill SKILL.md line budget is raised (three review modes: normal + adversarial + cross-ai); detail lives in review-reference.md. Single-runtime slice; --cross-ai all consensus is deferred to a follow-up. Design was llm-council-reviewed. Part of #288

azalio merged commit b93989d into main Jun 26, 2026
6 checks passed

azalio deleted the feat/288-cross-ai-review branch June 26, 2026 22:43

azalio mentioned this pull request on Jun 26, 2026 in #288, GSD-style cross-AI peer review: dispatch review to independent AI CLI (Open, 7 tasks)

azalio merged 1 commit into main from feat/288-cross-ai-review
