feat(map-review): cross-AI peer review (--cross-ai <runtime>) — slice 1 of #288 #295

Merged: azalio merged 1 commit into main from feat/288-cross-ai-review on Jun 26, 2026

@azalio commented Jun 26, 2026

Owner

What & why

/map-review --cross-ai <runtime> dispatches the review to an independent external AI CLI (codex / gemini / claude / opencode) for a true second opinion — a different model/vendor with fresh context and no shared session. Same-model review is "inbred"; an independent reviewer catches model-specific blind spots. Slice 1 of #288 (single-runtime dispatch).

Design (llm-council-reviewed — conv 92a7f159)

  • Producer-owns-parse: all subprocess interaction, envelope parsing, finding normalization, and the untrusted boundary live in the Python step runner (run_cross_ai_review / dispatch_cross_ai_review). The skill only handles consent + presentation. Mirrors the skills_eval/dispatcher.py and --adversarial precedents.
  • Runtime adapter registry (hardcoded allowlist, not a config-driven plugin gateway): {binary, argv, envelope, independent_vendor} per runtime.
  • Double-consent egress (off by default): the per-run --cross-ai flag AND review.cross_ai.enabled: true are both required, because the diff/code leaves the machine (mirrors the SOFA opt-in posture).
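
The registry and the double-consent gate described above might look roughly like the sketch below. It is not the PR's code: the class name, the helper functions, the argv templates, the envelope names, and the independent_vendor value for every runtime other than claude are assumptions made for illustration. In particular, the argv templates are placeholders and do not reflect the real command lines of these CLIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeAdapter:
    binary: str               # executable expected on PATH
    argv: tuple[str, ...]     # literal argv template; "{prompt}" is a whole-token placeholder
    envelope: str             # how stdout is wrapped (assumed values: "text" or "json")
    independent_vendor: bool  # False when the runtime shares a vendor with the session


# Hardcoded allowlist. All field values are hypothetical placeholders,
# NOT the actual invocation of any of these tools.
ADAPTERS: dict[str, RuntimeAdapter] = {
    "codex": RuntimeAdapter("codex", ("codex", "{prompt}"), "text", True),
    "gemini": RuntimeAdapter("gemini", ("gemini", "{prompt}"), "text", True),
    "opencode": RuntimeAdapter("opencode", ("opencode", "{prompt}"), "text", True),
    # Same vendor as a Claude session, so it is labeled honestly.
    "claude": RuntimeAdapter("claude", ("claude", "{prompt}"), "json", False),
}


def resolve_adapter(runtime: str) -> Optional[RuntimeAdapter]:
    """Return the adapter for an allowlisted runtime, or None for anything else."""
    return ADAPTERS.get(runtime)


def egress_allowed(cross_ai_flag: bool, config_enabled: bool) -> bool:
    """Double consent: the per-run flag AND the config switch must both be on."""
    return cross_ai_flag and config_enabled
```

Because the allowlist is a plain dict, an unknown runtime name simply resolves to None rather than being looked up through any configurable path.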

Security guardrails (all enforced in Python, not prompt text)

  • Outbound secret scan blocks dispatch before the subprocess on high-confidence secrets (private keys, AWS/GitHub/Google/Slack creds); surfaces the pattern name only, never the value (status: secret_blocked).
  • shell=False literal-argv invocation with a configurable timeout: the {prompt} token is replaced wholesale as a single argv element, so no shell ever interprets the prompt text.
  • Inbound untrusted boundary: external output is parsed for findings but always re-emitted behind an EXTERNAL UNTRUSTED REFERENCE fence (link allowlist + injection scan, SOFA semantics) — applied deterministically in Python so the model cannot "forget" to fence. Findings are advisory-only (source: cross_ai).
  • The outbound prompt instructs the external reviewer that the diff is untrusted data, not instructions (guards against injection in the reviewed code).
  • Honest independence: same-vendor runtimes (claude reviewing a Claude session) are labeled independent_vendor: false.
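
As a rough illustration of three of these guardrails (outbound scan, literal argv, inbound fence), a minimal sketch follows. The function names, the fence marker strings, and the three regexes are assumptions; the PR's real pattern set is not shown in this description, and the link allowlist and injection scan are omitted here.

```python
import re
from typing import Sequence

# Illustrative high-confidence patterns only (hypothetical subset).
SECRET_PATTERNS = {
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
}

# Hypothetical fence markers; only the phrase itself comes from the PR text.
FENCE_OPEN = "<<< EXTERNAL UNTRUSTED REFERENCE >>>"
FENCE_CLOSE = "<<< END EXTERNAL UNTRUSTED REFERENCE >>>"


def scan_for_secrets(payload: str) -> list[str]:
    """Return the NAMES of matching patterns, never the matched values."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(payload)]


def build_argv(template: Sequence[str], prompt: str) -> list[str]:
    """Swap the whole "{prompt}" token for the prompt as ONE argv element.

    No string formatting or shell quoting is involved, so metacharacters in
    the prompt stay inert when the list is run with shell=False.
    """
    return [prompt if token == "{prompt}" else token for token in template]


def fence_external(text: str) -> str:
    """Wrap external output deterministically so it is never emitted unfenced."""
    return f"{FENCE_OPEN}\n{text}\n{FENCE_CLOSE}"
```

A caller would run scan_for_secrets on the outbound payload first and return a secret_blocked status without spawning anything if the list is non-empty.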

Failure degradation (non-blocking)

disabled / unavailable / timeout / error / unparsed / secret_blocked all degrade gracefully and fall back to the in-session review — cross-AI is a supplement, never a hard gate.
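
One way to picture this degradation is a dispatcher that never raises and always returns a status. The sketch below is an assumption-laden illustration: the function names, the result shape, the "ok" status, and the expectation that stdout is a JSON list are all hypothetical, while the failure status names are taken from the list above.

```python
import json
import shutil
import subprocess
from typing import Any, Sequence


def _result(status: str, findings: Any = None) -> dict[str, Any]:
    return {"status": status, "findings": findings if findings is not None else []}


def dispatch(argv: Sequence[str], timeout_seconds: float) -> dict[str, Any]:
    """Run the external reviewer and map every failure to a status, never an exception."""
    if shutil.which(argv[0]) is None:
        return _result("unavailable")  # CLI not installed / not on PATH
    try:
        proc = subprocess.run(
            list(argv),
            shell=False,           # literal argv, no shell interpretation
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return _result("timeout")
    except OSError:
        return _result("error")
    if proc.returncode != 0:
        return _result("error")
    try:
        findings = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return _result("unparsed")
    if not isinstance(findings, list):
        return _result("unparsed")  # assumed envelope: a JSON list of findings
    return _result("ok", findings)


def needs_fallback(result: dict[str, Any]) -> bool:
    """Anything other than "ok" means: continue with the in-session review."""
    return result["status"] != "ok"
```

The caller treats needs_fallback as a hint, not an error: the in-session review proceeds either way, and cross-AI findings are only added when the status is "ok".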

Config

review.cross_ai.enabled: false        # org kill-switch (default off)
review.cross_ai.runtime: codex        # claude|codex|gemini|opencode
review.cross_ai.timeout_seconds: 180
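
A minimal sketch of how these keys could map onto a dataclass with validation fallbacks is shown below. The class and loader names are hypothetical, and the rule that an invalid value silently falls back to its default is an assumption based on the "validation fallbacks" mentioned under Testing.

```python
from dataclasses import dataclass
from typing import Any, Mapping

ALLOWED_RUNTIMES = ("claude", "codex", "gemini", "opencode")


@dataclass(frozen=True)
class CrossAiConfig:
    enabled: bool = False       # org kill-switch, off by default
    runtime: str = "codex"
    timeout_seconds: int = 180


def load_cross_ai_config(raw: Mapping[str, Any]) -> CrossAiConfig:
    """Read dotted keys from a flat mapping; invalid values fall back to defaults."""
    defaults = CrossAiConfig()
    enabled = raw.get("review.cross_ai.enabled", defaults.enabled)
    runtime = raw.get("review.cross_ai.runtime", defaults.runtime)
    timeout = raw.get("review.cross_ai.timeout_seconds", defaults.timeout_seconds)

    if not isinstance(enabled, bool):
        enabled = defaults.enabled
    if runtime not in ALLOWED_RUNTIMES:
        runtime = defaults.runtime
    # bool is a subclass of int, so reject it explicitly before the int check.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        timeout = defaults.timeout_seconds
    return CrossAiConfig(enabled=enabled, runtime=runtime, timeout_seconds=timeout)
```

Falling back to the default for enabled means a malformed value can never switch egress on by accident.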

Testing

  • make check green locally: ruff ✅, mypy ✅, pyright 0 errors/0 warnings/0 informations ✅, 2880 passed, check-render ✅ (generated trees == templates_src).
  • New tests: TestCrossAi* in tests/test_map_step_runner.py (wrap/detect/dispatch across success / claude-json envelope / timeout / non-zero / unparsed / secret-blocked / shell=False+literal-argv) + tests/test_cross_ai_config.py (dataclass defaults, dotted-key aliasing, validation fallbacks, default-config doc). The secret-blocked test asserts subprocess.run is not called.
  • Step runner detail moved to review-reference.md; map-review's per-skill SKILL.md line budget raised deliberately (it now hosts three review modes: normal + adversarial + cross-ai).

Note: GitHub Actions CI is currently unavailable (billing); validated via the full local gate above.

Scope

Part of #288: single-runtime dispatch only. Multi-runtime consensus/disagreement aggregation (--cross-ai all) is deferred to a follow-up slice; the Finding/source shape is designed to support it without a schema migration. The issue stays open.

Dispatch /map-review to an INDEPENDENT external AI CLI (codex/gemini/claude/
opencode) for a true second opinion — a different model/vendor with fresh
context. All subprocess interaction, envelope parsing, finding normalization,
and the untrusted boundary live in the Python step runner (run_cross_ai_review /
dispatch_cross_ai_review, producer-owns-parse); the skill only handles consent
and presentation.

Egress is double-consent: the per-run --cross-ai flag AND
review.cross_ai.enabled: true (off by default) — the diff/code leaves the
machine. Guardrails: a high-confidence outbound secret scan blocks dispatch
before the subprocess (pattern name only, never value); shell=False literal-argv
with a configurable timeout; returned findings always enter context behind an
EXTERNAL UNTRUSTED REFERENCE fence (link/injection scan, applied in Python so the
model cannot skip it) and are advisory-only (source: cross_ai); same-vendor
runtimes are honestly labeled independent_vendor: false. Any failure (disabled /
CLI missing / unauthenticated / timeout / non-JSON / secret-blocked) degrades
without blocking and falls back to the in-session review.

Config keys review.cross_ai.{enabled,runtime,timeout_seconds}. map-review's
per-skill SKILL.md line budget raised (three review modes: normal + adversarial
+ cross-ai); detail lives in review-reference.md. Single-runtime slice;
--cross-ai all consensus deferred to a follow-up. Design was llm-council-reviewed.

Part of #288
@azalio merged commit b93989d into main on Jun 26, 2026
6 checks passed
@azalio deleted the feat/288-cross-ai-review branch on June 26, 2026 22:43