Claude Code skill for adversarial AI code and plan review.
One AI writes the code. Another tears it apart. Iterate until approved.
Most AI code review tools validate your changes — "looks good, maybe add tests." Adversarial review does the opposite: the reviewer defaults to skepticism and tries to break confidence in the change. It looks for what will fail in production, not what might be nice to improve.
This is a Claude Code skill: a SKILL.md file plus a small references/runner.md that together teach Claude how to run adversarial reviews through an external AI model (currently OpenAI Codex).
- Plan review — review the plan BEFORE writing code. Catch architecture mistakes, missing steps, and risks early
- Code review — review the implementation. Bugs, security, data loss
- Code-vs-plan — verify the implementation matches the plan
- Iterative — Claude fixes issues based on reviewer feedback and resubmits for re-review. Up to 5 rounds until approved
- Lightweight — SKILL.md + one references/runner.md (thin runner subagent spec), no server, no broker, no external runtime deps. Compare with codex-plugin-cc: ~15 JS modules, App Server, JSON-RPC broker, lifecycle hooks
    ┌─────────┐     ┌──────────┐     ┌─────────┐
    │ Claude  │────>│ Reviewer │────>│ Claude  │
    │ (code)  │     │ (Codex)  │     │ (fix)   │
    └─────────┘     └──────────┘     └─────────┘
         ^                                │
         │         ┌───────────┐          │
         └─────────│ Reviewer  │<─────────┘
                   │(re-review)│
                   └───────────┘
                         │
                 VERDICT: APPROVED
| Mode | What it reviews | When to use |
|---|---|---|
| plan | Implementation plan | Before writing code |
| code | Git diff (unstaged, staged, or branch) | After writing code |
| code-vs-plan | Code changes against the plan | Verify implementation matches plan |
Mode is auto-detected from context, or you can force it with an argument.
Claude Code and OpenAI Codex CLI must be installed.
Verify both are available:
claude --version # Claude Code CLI
codex --version # OpenAI Codex CLI (>= 0.132.0)
If Codex is missing: npm install -g @openai/codex
Authentication. Codex needs an OpenAI account. Either:
- Sign in interactively: codex (opens browser)
- Or set the CODEX_API_KEY env var for non-interactive use
Codex CLI version. The skill targets Codex CLI ≥ 0.132.0. From 0.132 onward the top-level -a / --ask-for-approval flag is gone from codex exec; approval policy is expressed only via -c approval_policy=.... The skill emits the -c form unconditionally. If you are stuck on an older Codex CLI, upgrade before using the refreshed defaults.
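The version requirement above can be checked before the first review. The following is a minimal sketch, not part of the skill: `version_ge` and `codex_version_ok` are hypothetical helper names, and the parsing assumes that `codex --version` prints a dotted version number somewhere in its output.

```shell
# version_ge A B: succeed when dotted numeric version A >= B.
version_ge() {
  a_rest=$1
  b_rest=$2
  while [ -n "$a_rest" ] || [ -n "$b_rest" ]; do
    a_field=${a_rest%%.*}
    b_field=${b_rest%%.*}
    # Drop the consumed field; an exhausted version string counts as 0.
    case $a_rest in *.*) a_rest=${a_rest#*.} ;; *) a_rest= ;; esac
    case $b_rest in *.*) b_rest=${b_rest#*.} ;; *) b_rest= ;; esac
    if [ "${a_field:-0}" -gt "${b_field:-0}" ]; then return 0; fi
    if [ "${a_field:-0}" -lt "${b_field:-0}" ]; then return 1; fi
  done
  return 0
}

# codex_version_ok [MIN]: compare the installed Codex CLI against MIN.
# Assumption: the first dotted number in `codex --version` is the version.
codex_version_ok() {
  found=$(codex --version 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
  if [ -z "$found" ]; then return 1; fi
  version_ge "$found" "${1:-0.132.0}"
}
```

Running `codex_version_ok || echo "upgrade Codex CLI"` would then flag both a missing and an outdated install.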
Linux sandbox prerequisites. The default sandbox (workspace-write) relies on bubblewrap (bwrap) and unprivileged user namespaces. See the Linux sandbox prerequisites section below — Ubuntu 24.04+ requires installing an AppArmor profile.
git clone https://github.com/dementev-dev/adversarial-review.git
mkdir -p ~/.claude/skills
ln -sfn "$(pwd)/adversarial-review" ~/.claude/skills/adversarial-review
Verify both the skill entry-point AND the runner subagent spec are in place:
ls -la ~/.claude/skills/adversarial-review/SKILL.md
ls -la ~/.claude/skills/adversarial-review/references/runner.md
Migrating from a previous install at ~/.agents/skills/: delete the old symlink (rm ~/.agents/skills/adversarial-review) and install at the new path above. Claude Code ≥ 2.x uses ~/.claude/skills/.
The skill runs git, codex exec, and writes temp files to /tmp.
Without pre-approved permissions, Claude Code will prompt for each action.
Where to add. Since the skill is installed globally
(~/.claude/skills/), permissions should go into the global config
so they work in any project:
| Install scope | Config file |
|---|---|
| Global (recommended) | ~/.claude/settings.json |
| Single project | <project>/.claude/settings.local.json |
Merge the following rules into the permissions.allow array of the
chosen config file:
Full example (if the config file is empty or does not exist)
{
"permissions": {
"allow": [
"Bash(git diff*)",
"Bash(git status*)",
"Bash(git -C * status --porcelain*)",
"Bash(git symbolic-ref*)",
"Bash(git rev-parse*)",
"Bash(cat /tmp/codex-prompt-* | timeout 600 codex exec *)",
"Bash(cd * && cat /tmp/codex-resume-prompt-* | timeout 600 codex exec resume *)",
"Bash(find ~/.codex/sessions*)",
"Bash(ls -t ~/.codex/sessions*)",
"Bash(bwrap --dev-bind / / --unshare-net /bin/echo ok*)",
"Bash(sha256sum /tmp/codex-*)",
"Bash(diff -q /tmp/codex-*)",
"Write(/tmp/codex-git-pre-*)",
"Write(/tmp/codex-git-post-*)",
"Write(/tmp/codex-inputs-pre-*)",
"Write(/tmp/codex-inputs-post-*)",
"Read(/tmp/codex-git-pre-*)",
"Read(/tmp/codex-git-post-*)",
"Read(/tmp/codex-inputs-pre-*)",
"Read(/tmp/codex-inputs-post-*)",
"Write(/tmp/codex-plan-*)",
"Write(/tmp/codex-prompt-*)",
"Write(/tmp/codex-resume-prompt-*)",
"Read(/tmp/codex-review-*)",
"Read(/tmp/codex-stdout-*)",
"Read(/tmp/codex-stderr-*)",
"Bash(mv /tmp/codex-stdout-* /tmp/codex-stdout-*-failed-resume.jsonl)",
"Bash(mv /tmp/codex-stderr-* /tmp/codex-stderr-*-failed-resume.txt)",
"Bash(rm -f /tmp/codex-*)",
"Write(/tmp/codex-body-*)",
"Write(/tmp/codex-resume-body-*)",
"Read(/tmp/codex-runner-result-*)",
"Read(/tmp/codex-body-*)",
"Read(/tmp/codex-resume-body-*)",
"Write(/tmp/codex-runner-result-*)",
"Bash(ls ~/.claude/skills/adversarial-review/references/runner.md*)",
"Bash(ls ~/.claude/plugins/cache/*/*/*/skills/adversarial-review/references/runner.md*)"
]
}
}
Security note: The codex exec rule allows any codex exec invocation
wrapped in timeout 600. The default sandbox is workspace-write (the reviewer
needs to run tests, builds, and CLI introspection to verify findings — read-only
blocks all of that). Reviewer-side mutation is governed by the prompt-level
"auditor, not contributor" contract plus pre/post git status --porcelain and
sha256sum snapshots on every dispatch. If you prefer tighter control, see the
Safety considerations section below for the
sandbox:read-only opt-out and the worktree-isolation pattern.
/adversarial-review # auto-detect mode
/adversarial-review plan # force plan review
/adversarial-review code # force code review
/adversarial-review path/to/f # review a specific file
/adversarial-review xhigh # higher reasoning effort
/adversarial-review model:gpt-5.4 # use a different model
/adversarial-review sandbox:read-only # block reviewer writes (loses empirical verification)
/adversarial-review approvals:never # boundary crossings fail instead of asking
Overrides can be combined: /adversarial-review plan xhigh sandbox:read-only.
| Setting | Default | Override |
|---|---|---|
| Reviewer model | gpt-5.5 | model:<name> |
| Reasoning effort | high | low / medium / high / xhigh |
| Codex sandbox | workspace-write | sandbox:read-only / sandbox:workspace-write / sandbox:danger-full-access / sandbox:inherit |
| Approval policy | on-request with auto_review reviewer | approvals:user / approvals:auto_review / approvals:never |
| Max rounds | 5 | not configurable |
| Operator language | auto-detected from recent messages, fallback English | not configurable |
The skill uses XML-structured prompts with adversarial stance:
- <role> — adversarial reviewer, defaults to skepticism
- <operating_stance> — break confidence, not validate
- <attack_surface> — concrete checklist: auth, data integrity, race conditions, rollback safety, schema drift, error handling, observability
- <finding_bar> — every finding must answer 4 questions: what can go wrong, why vulnerable, impact, recommendation
- <scope_exclusions> — no style, naming, or speculative comments
- <calibration> — one strong finding > five weak ones
See examples/review-output.md for a sample review.
The default sandbox is workspace-write, not read-only. This is a deliberate
trade-off: the reviewer needs to run tests, builds, project CLIs, MCP doc
lookups, and web searches to produce findings worth more than a same-model
self-check. Read-only blocks all of that.
Reviewer-side mutation is governed by three layers:
- Prompt-level contract — every review prompt includes a <reviewer_permissions> block that constrains the reviewer to read + verify, not edit. The reviewer must report mutation as a limitation instead of performing it.
- Tracked-tree status detection — main captures git status --porcelain before and after every Codex dispatch. New modifications, new untracked files, or status transitions (e.g. clean → modified) on tracked files trigger a hard stop before fixes are applied. This catches the common case where the reviewer touches a previously-clean tracked file.
- Review-input detection — main captures sha256sum of /tmp/codex-{body,plan,resume-body}-* before and after every dispatch. Any change is treated identically to tracked-file mutation.
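The two detection layers can be pictured as a pair of snapshots compared after each dispatch. This is a minimal sketch of the idea, not the skill's actual implementation: the function names are hypothetical, and the snapshot file paths are whatever the caller passes in (the permission rules suggest names such as /tmp/codex-git-pre-* and /tmp/codex-inputs-post-*).

```shell
# snapshot_tree REPO_ROOT OUT: record tracked-tree status codes.
snapshot_tree() {
  git -C "$1" status --porcelain > "$2"
}

# snapshot_inputs OUT FILE...: record a sha256 line per existing review input.
snapshot_inputs() {
  out=$1
  shift
  : > "$out"
  for f in "$@"; do
    if [ -f "$f" ]; then
      sha256sum "$f" >> "$out"
    fi
  done
}

# snapshots_match PRE POST: succeed only when both files are identical.
snapshots_match() {
  diff -q "$1" "$2" > /dev/null
}

# dispatch_is_clean PRE_GIT POST_GIT PRE_INPUTS POST_INPUTS:
# fail (hard stop) when either the tree status or the review inputs changed.
dispatch_is_clean() {
  snapshots_match "$1" "$2" && snapshots_match "$3" "$4"
}
```

A caller would take both snapshots before the Codex dispatch, take them again afterwards, and refuse to apply fixes when `dispatch_is_clean` fails.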
git status --porcelain records status codes, not content hashes. If a
tracked file was already modified (M) pre-dispatch and the reviewer
mutates the same file again, the porcelain line stays M file —
the status snapshot diff is empty, and main does not hard-stop.
This is an accepted residual safety gap. Workspace writes are
permitted by the workspace-write sandbox and do NOT generate an
approval request; the approval_policy=on-request + auto_review
machinery only reviews approval requests, so it never sees these
writes. The single protection layer at the skill level is the
<reviewer_permissions> prompt contract telling the reviewer not to
modify project files. The realistic vector is a reviewer-invoked
verification command with an unintended side effect — a formatter,
snapshot updater (pytest --update-snapshots, jest --ci=false),
lockfile regeneration (npm install, cargo update), or code
generator — that legitimately runs but writes to a file the operator
was already editing. Damage is bounded (the mutation remains visible
in git diff after the review concludes), but the skill does not
hard-stop at "apply fixes" time.
Mitigations available to the operator:
- Runtime warning — at the start of every review, if git status --porcelain shows any already-modified tracked files, the skill emits a warning naming the count and pointing operators to the options below.
- Commit (or stash) before review — the simplest defense. A review of branch commits against master with a clean working tree is fully covered by the porcelain snapshot; any reviewer-side mutation produces a fresh M line and trips the hard-stop.
- sandbox:read-only opt-out — when the working file matters more than empirical verification by the reviewer, switch to read-only. Same trade-off as for gitignored state below.
If this risk class hits in practice, the design notes a clear path
to a content-aware snapshot — see docs/DESIGN.md §4.20.
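For readers wondering what a content-aware snapshot might look like, here is one possible sketch. It is an illustration only and makes no claim to match docs/DESIGN.md §4.20: the helper names are hypothetical, and it assumes file names without embedded newlines. The idea is to hash every tracked file that is already modified, so that a second mutation of the same file changes the snapshot even though its porcelain line stays the same.

```shell
# hash_paths ROOT: read ROOT-relative paths on stdin and print one
# "<sha256>  <path>" line per file; deleted paths are recorded as "missing".
hash_paths() {
  root=$1
  while IFS= read -r rel; do
    if [ -f "$root/$rel" ]; then
      (cd "$root" && sha256sum "$rel")
    else
      printf 'missing  %s\n' "$rel"
    fi
  done
}

# content_snapshot REPO_ROOT OUT: hash the tracked files whose working-tree
# content already differs from the index (the blind spot described above).
content_snapshot() {
  git -C "$1" ls-files -m | hash_paths "$1" > "$2"
}
```

Comparing a pre-dispatch and a post-dispatch `content_snapshot` with `diff -q` would then catch a rewrite of a file that was already dirty before the review started.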
There is one mutation vector that the snapshots do NOT detect:
.gitignored files already present inside REPO_ROOT. Examples include
local SQLite databases (dev.sqlite), .env.local, service-state
directories, and build caches. git status ignores them by definition,
and full-tree snapshotting would be too expensive to run every round.
The realistic exposure is the reviewer running a project test or build
command that side-effects an ignored file — for example pytest
triggering an unintended migration on dev.sqlite because the test
settings point at it. The damage is bounded (the state is re-seedable;
tests generally read .env.local, they do not write to it), but it is
real on default workspace-write reviews.
Three operator-facing mitigations:
- sandbox:read-only opt-out — operators who know they have sensitive ignored state can opt out per-run:

      /adversarial-review sandbox:read-only

  Trade-off: the reviewer loses empirical verification — it cannot run tests, linters, builds, project CLIs, or most MCP-backed verification. You trade reviewer capability for write-protection. Use when the sensitivity of the local state outweighs the value of empirical findings.

- Worktree isolation — for repositories where the trade-off above is unacceptable, run the review in an isolated git worktree:

      git worktree add /tmp/review-worktree HEAD
      cd /tmp/review-worktree
      /adversarial-review
      # ... when done ...
      cd -
      git worktree remove /tmp/review-worktree

  The worktree shares git history but has its own working tree — the reviewer can run tests freely, and any gitignored state lives in /tmp/review-worktree, not in your primary checkout. This is the recommended pattern for sensitive repos.

  Important — committed work only. git worktree add … HEAD creates a clean worktree at the HEAD commit and does NOT transfer unstaged or staged changes from your primary checkout. If your WIP is uncommitted, the review will only see branch commits (master...HEAD) and silently skip your local edits. Commit (or git stash) before running the recipe, and double-check with git status --short inside the new worktree to confirm what the review will cover.

- Runtime hint — at the start of every review (unless an explicit sandbox:* override was passed), the skill emits a one-line reminder:

      ℹ workspace-write in effect; pass sandbox:read-only if sensitive ignored state lives under REPO_ROOT.
This is a deliberate trade-off. Defaulting reviews to read-only or to
an isolated worktree would gut the reviewer's empirical verification
capability — which is precisely what makes adversarial cross-model
review more valuable than a same-model self-check.
With the default workspace-write sandbox, Codex relies on bubblewrap
(bwrap) and unprivileged user namespaces. On Ubuntu 24.04+ this is
restricted by AppArmor and needs explicit setup.
Probe whether bwrap works on your host:
bwrap --dev-bind / / --unshare-net /bin/echo ok
If this prints ok, you're done. If it fails (typical errors:
bwrap: setting up uid map: Permission denied, or
bwrap: clone: Operation not permitted), apply the official bwrap
AppArmor profile:
sudo apt install -y apparmor-profiles apparmor-utils
sudo install -m 0644 /usr/share/apparmor/extra-profiles/bwrap-userns-restrict \
/etc/apparmor.d/bwrap-userns-restrict
sudo apparmor_parser -r /etc/apparmor.d/bwrap-userns-restrict
Then re-run the probe:
bwrap --dev-bind / / --unshare-net /bin/echo ok
If the probe still fails, see the official Codex sandboxing documentation: https://developers.openai.com/codex/concepts/sandboxing.
For installer agents: do NOT change AppArmor policy silently. Probe
first, show the exact apt install / install / apparmor_parser
commands, request explicit operator permission, apply the profile only
after approval, and re-run the probe.
Per-dispatch preflight. The runner subagent runs the same bwrap
probe before every initial / fresh-exec dispatch that selects a
bwrap-backed sandbox mode (read-only or workspace-write). If the
probe fails, the runner returns a success + degraded_environmental
result with a one-line user_warning pointing back to this section,
which the lead surfaces and treats as terminal. The preflight is
skipped under sandbox:inherit (effective sandbox is unknown until
Codex launches) and under sandbox:danger-full-access (no bwrap).
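The preflight decision described above can be sketched as a small gate around the probe. This is an illustration under assumptions, not the runner's code: the function names and the printed words `ok` and `skipped` are hypothetical, while `degraded_environmental` and the probe command come from this section.

```shell
# needs_bwrap_preflight MODE: succeed when MODE is a bwrap-backed sandbox.
needs_bwrap_preflight() {
  case $1 in
    read-only|workspace-write) return 0 ;;
    *) return 1 ;;  # inherit and danger-full-access skip the probe
  esac
}

# sandbox_preflight MODE: print "skipped", "ok", or "degraded_environmental".
sandbox_preflight() {
  if ! needs_bwrap_preflight "$1"; then
    echo skipped
  elif bwrap --dev-bind / / --unshare-net /bin/echo ok > /dev/null 2>&1; then
    echo ok
  else
    echo degraded_environmental
  fi
}
```

On a host where bwrap is missing or blocked by AppArmor, `sandbox_preflight workspace-write` prints `degraded_environmental`, which mirrors the terminal result the runner reports.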
The skill detects the operator's language from recent conversation messages and translates runtime prose accordingly: the runtime hint, the structural-gate prompt, the workspace-mutation diagnostics, the final operator summary, and ad-hoc warnings.
Machine-readable literals stay in English regardless:
- Severity tags: [severity: critical|high|medium]
- Verdict line: VERDICT: APPROVED / VERDICT: REVISE
- Review section headers: Summary, Findings, Verdict
Repository documentation (this README, SKILL.md, references/runner.md,
docs/DESIGN.md, specs under docs/superpowers/specs/) is always in
English. If you contribute changes, keep documentation paragraphs in a
single language.
Every terminal state — approved, max rounds reached, not verified, or aborted — produces a final operator-facing summary in the operator's language, emitted AFTER the canonical per-state header block and AFTER the final verbatim reviewer response (where one exists).
The summary covers:
- Final status and what it means for what the operator should do next.
- What changed across all rounds (compact, no full diffs).
- Findings applied / re-scoped / rejected, with focus on rejections and structural changes.
- Whether structural fixes received operator sign-off (or were applied in autonomous / headless mode).
- Verification performed by the reviewer vs. by the lead vs. still unverified.
- Remaining findings or risks (for non-approved terminal states).
The summary is built from per-round decision accounts already shown earlier in the conversation. Main never reads Codex stdout, stderr, or rollout files to assemble it. If context compaction has obscured part of the history, the summary states that explicitly rather than fabricating details.
codex exec exits with model error.
Some models are unavailable with ChatGPT accounts (e.g. o3-mini).
The default gpt-5.5 works with both ChatGPT and API key auth.
Override with /adversarial-review model:<name>.
codex exec rejects -c approval_policy=... or -c approvals_reviewer=....
You are on a Codex CLI older than 0.132. Upgrade
(npm install -g @openai/codex@latest). The skill emits the -c form
unconditionally because the top-level -a / --ask-for-approval flag was
removed in 0.132.
Review aborts with "Reviewer or runner mutated tracked files during dispatch".
The pre/post git status --porcelain snapshot detected changes to tracked
files between dispatch start and runner return. This is a HARD STOP — the
artifact under review may have been silently mutated. Inspect the diff
shown in the diagnostic, decide whether to keep or revert manually, and
re-invoke /adversarial-review when the working tree is in the state you
expect. If you suspect a specific verification command in the reviewer's
toolchain is responsible (test runner doing a migration, build script
regenerating a file), pass sandbox:read-only next time.
Review aborts with "ABORTED — environmental failure before any valid review".
The initial Codex dispatch returned a degraded_environmental review (the
reviewer self-reported it could not run because of sandbox or environment
failure). Common causes on Linux: bwrap not installed, AppArmor
restricting unprivileged user namespaces, or a rate-limit / trust-prompt
stub from Codex. Run the Linux sandbox prerequisites
probe and apply the AppArmor profile if needed.
If the failure is the bwrap preflight specifically, sandbox:read-only does
NOT bypass it — read-only is bwrap-backed and runs the same probe. The
actual bypass options are sandbox:danger-full-access (no bwrap, only use
in trusted local debugging), sandbox:inherit (trust your local Codex
config), or fixing bwrap/AppArmor per the prerequisites section.
degraded_environmental on resume.
The resume produced a non-actionable review (typically a sandbox or
rate-limit issue mid-loop). The skill does NOT count it as a round and
routes through the Step 7.4 fallback chain using the prior round's
severity. If the fallback's fresh-exec also returns
degraded_environmental, the review terminates as NOT VERIFIED.
Permission prompts on every action.
Add the permissions from the setup section. Check that
the file is valid JSON and in the right location (project .claude/settings.local.json
or global ~/.claude/settings.json).
Codex hangs / timeout (exit code 124).
All codex exec calls are wrapped in timeout 600 (10 minutes). If you see
exit code 124, the reviewer did not respond in time. Retry — this is usually
transient.
Resume fails with session error.
The skill uses codex exec resume <session-id> for rounds 2+. On failure
(non-zero exit, thread/resume failed in stderr, or a malformed review),
the skill does NOT silently fall back. In an interactive session it asks
whether to run a fresh codex exec (higher token cost) or conclude the
review as NOT VERIFIED. In headless runs it decides based on the maximum
severity of the last successful round's findings: critical/high → fresh
exec; medium-only → conclude as NOT VERIFIED.
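The headless rule above reduces to a lookup on the highest severity from the last successful round. A minimal sketch, assuming a hypothetical helper name and hypothetical output words (`fresh-exec`, `not-verified`); how the skill treats any other input is not stated here, so the sketch reports it as an error.

```shell
# headless_resume_fallback MAX_SEVERITY: print the action to take after a
# failed resume when no operator is available to ask.
headless_resume_fallback() {
  case $1 in
    critical|high) echo fresh-exec ;;
    medium)        echo not-verified ;;
    *)
      echo "unknown severity: $1" >&2
      return 2
      ;;
  esac
}
```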
--json stdout is empty in my Claude Code session (session ID capture noise).
In some Claude Code sandbox configurations codex's --json event stream is
suppressed when stdout is redirected to a file — the /tmp/codex-stdout-*.jsonl
ends up 0 bytes even though the review itself (-o /tmp/codex-review-*.md)
completes correctly. The skill handles this automatically via a filesystem
fallback: every prompt includes a per-launch session marker
(<!-- ADVERSARIAL-REVIEW-SESSION: <REVIEW_ID>-<ATTEMPT_ID> -->) where
ATTEMPT_ID is a fresh random integer regenerated for the initial exec,
every retry, every resume, and every fresh-exec fallback. The marker is
written to the rollout JSONL on disk. When the JSONL stream is empty, the
skill runs find ~/.codex/sessions -name 'rollout-*.jsonl' -newer <prompt-file> -exec grep -l <REVIEW_ID>-<ATTEMPT_ID> {} + to positively
identify this specific launch's rollout (not by newest-mtime, which would
be unsafe against parallel codex; not by review-stable ID alone, which
would match stale retry rollouts) and extracts the UUID from the filename.
Zero or multiple matches → the skill fails closed with a diagnostic rather
than silently picking. Resume continues to work normally. All commands are
POSIX (find -newer, -exec grep -l) and work identically on Linux and
macOS.
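The filesystem fallback can be sketched as a lookup that fails closed. This is an illustration rather than the skill's code: `find_rollout` is a hypothetical name, it uses `grep -F` so the marker is matched literally, and it assumes the rollout file name ends with the session UUID immediately before `.jsonl`.

```shell
# find_rollout SESSIONS_DIR PROMPT_FILE MARKER
# Print the session UUID when exactly one rollout newer than PROMPT_FILE
# contains MARKER; otherwise print a diagnostic to stderr and fail.
find_rollout() {
  matches=$(find "$1" -name 'rollout-*.jsonl' -newer "$2" \
    -exec grep -lF -- "$3" {} + || true)
  count=$(printf '%s' "$matches" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 rollout for marker $3, found $count" >&2
    return 1
  fi
  # Assumed layout: rollout-<timestamp>-<uuid>.jsonl
  basename "$matches" .jsonl |
    grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'
}
```

A call such as `find_rollout ~/.codex/sessions /tmp/codex-prompt-123 "$REVIEW_ID-$ATTEMPT_ID"` (with a hypothetical prompt file name) returns a UUID suitable for `codex exec resume`, or a non-zero status when the match is missing or ambiguous.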
"NOT VERIFIED" result. The skill applied fixes but the reviewer did not re-verify them (resume failed or the operator chose to conclude). This is not an approval — manually review the applied fixes before merging.
Running inside a git submodule.
git rev-parse --show-toplevel returns the submodule path, not the parent
repo. The skill warns you and scopes the review to the submodule. If you
meant to review the parent, invoke the skill from the parent working tree.
Bare repository or not inside a work tree. The skill aborts at Step 2 with a clear message. Run it from inside a git working tree.
Plan Mode exits when writing temp files.
In Claude Code Plan Mode, writing to /tmp may trigger a permission prompt
or exit Plan Mode. This is a known Claude Code limitation. It does not affect
review correctness.
- Plan Mode and /tmp writes. Writing review prompts to /tmp may trigger a permission prompt or cause Plan Mode to exit. Does not affect review correctness.
- resume inherits sandbox. codex exec resume does not accept -s, -m, or approval-related -c overrides — sandbox and approval mode are properties of the original session. Changing sandbox: or approvals: mid-review requires aborting and re-invoking /adversarial-review with the new override, which starts a fresh review.
- resume has no -C flag. The skill captures REPO_ROOT via git rev-parse --show-toplevel at Step 2 and prefixes every resume with cd '<REPO_ROOT>' && .... This requires paths without single quotes; pathological paths (containing ', ", $, backtick, newline) cause the skill to abort at Step 2.
- Submodule scoping. When invoked inside a submodule, the review is scoped to the submodule — git rev-parse --show-toplevel does not walk up to the parent. A warning is printed; invoke from the parent repo if you want parent scope.
- macOS end-to-end not tested. The secondary session-id capture uses only POSIX flags (find -newer FILE, -exec CMD {} +, grep -l), so it should work identically on macOS as on Linux, but the skill has not been end-to-end tested on macOS.
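The path restriction in the resume limitation above can be checked with a single pattern match. A minimal sketch with a hypothetical helper name; it rejects exactly the characters listed there (single quote, double quote, dollar sign, backtick, newline) and nothing else.

```shell
# repo_root_is_safe PATH: fail when PATH contains a character that would
# break the single-quoted cd '<REPO_ROOT>' prefix or invite expansion.
repo_root_is_safe() {
  nl='
'
  case $1 in
    *"'"*|*'"'*|*'$'*|*'`'*|*"$nl"*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Spaces and non-ASCII characters pass the check, since single quotes already protect them.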
- Gemini as alternative reviewer backend
- Local model support (Ollama, llama.cpp)
- CI integration (GitHub Actions)
- Multi-reviewer mode (parallel review by multiple models)
Adversarial prompt structure developed after studying openai/codex-plugin-cc (Apache-2.0).
Borrowed ideas: XML-structured prompts, adversarial stance, attack surface checklist, finding bar, calibration rules.
What we did differently:
- Iterative loop — Claude fixes issues and resubmits (not "stop and ask user")
- Plan review — reviews plans before code, not just code
- Minimal install — SKILL.md + one references/runner.md, no server, no broker, vs 15+ JS modules
- Verbatim output — reviewer findings shown as-is, not rephrased
Apache-2.0 — see LICENSE.