CodeCome uses a phased workflow.
The workflow is intentionally simple in the initial PoC:
- one phase at a time,
- one agent at a time,
- one validation worker at a time,
- Markdown-only findings,
- file-based evidence,
- Docker-based sandbox.
Phase 1: Target reconnaissance
Phase 2: Vulnerability hypothesis generation
Phase 3: Counter-analysis and deduplication
Phase 4: Finding validation
Phase 5: Exploit development
Phase 6: Reporting
Phase 1 has two sub-stages, run in the same invocation:
- Phase 1a — Source reconnaissance.
- Phase 1b — Sandbox bootstrap.
Goal:
Understand the target, then prepare a working validation
environment under `sandbox/`.
Run:
make phase-1
(Or, manually: opencode run --agent recon "$(cat prompts/phase-1-recon.md)".)
Phase 1a expected outputs under itemdb/notes/:
target-profile.md
attack-surface.md
build-model.md
execution-model.md
trust-boundaries.md
data-flow.md
validation-model.md
interesting-files.md
security-assumptions.md
Optional outputs:
auth-model.md
web-routes.md
cli-commands.md
public-api.md
cwe-map.md
benchmark-notes.md
crypto-usage.md
iac-resources.md
Phase 1a should not normally create findings.
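To check Phase 1a output for completeness, a small helper can list the required notes that are missing or empty. This is a hypothetical sketch, not part of CodeCome. The file names come from the list above. Treating a whitespace-only file as missing is an assumption.

```python
from pathlib import Path
from typing import List

# Required Phase 1a notes, copied from the list above (optional notes are ignored).
REQUIRED_NOTES = (
    "target-profile.md",
    "attack-surface.md",
    "build-model.md",
    "execution-model.md",
    "trust-boundaries.md",
    "data-flow.md",
    "validation-model.md",
    "interesting-files.md",
    "security-assumptions.md",
)


def missing_notes(notes_dir: Path) -> List[str]:
    """Return required note files that are absent or contain only whitespace."""
    missing = []
    for name in REQUIRED_NOTES:
        path = notes_dir / name
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(name)
    return missing
```

For example, `missing_notes(Path("itemdb/notes"))` returns an empty list once recon has written every required note.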
Phase 1b (sandbox bootstrap): curated sandbox templates live under
templates/sandboxes/<id>/. The recon agent picks one (or
multi-service-compose for multi-stack repos) and applies it via
tools/sandbox-bootstrap.py.
Required output:
itemdb/notes/sandbox-plan.md
Generated artifacts (git-ignored):
sandbox/Dockerfile
sandbox/docker-compose.yml
sandbox/scripts/*.sh
sandbox/CODECOME-GENERATED.md
sandbox/.backup-<UTC-timestamp>/ (when previous content was replaced)
sandbox/ itself is tracked only via sandbox/.gitkeep. Everything
else inside sandbox/ is regenerated by Phase 1b.
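The backup behaviour can be sketched as follows. Before regenerating, everything in sandbox/ except .gitkeep and earlier backups is moved into a fresh .backup-<UTC-timestamp>/ directory. This is a hypothetical illustration, and the timestamp format `%Y%m%dT%H%M%SZ` is an assumption. For the actual behaviour, see tools/sandbox-bootstrap.py and docs/sandbox.md.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

KEEP = {".gitkeep"}  # the only tracked file inside sandbox/


def backup_sandbox(sandbox: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Move generated content into sandbox/.backup-<UTC-timestamp>/.

    Returns the backup directory, or None when there was nothing to move.
    """
    movable = [
        p for p in sandbox.iterdir()
        if p.name not in KEEP and not p.name.startswith(".backup-")
    ]
    if not movable:
        return None
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed format; the real one may differ
    backup = sandbox / f".backup-{stamp}"
    backup.mkdir()
    for item in movable:
        shutil.move(str(item), str(backup / item.name))
    return backup
```

Skipping existing `.backup-*` directories prevents backups from nesting inside one another.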
Validation tiers are recorded in sandbox/CODECOME-GENERATED.md
and summarized in sandbox-plan.md:
- T1: sandbox setup,
- T2: sandbox start,
- T3: sandbox sanity,
- T4: target build,
- T5: target test,
- T6: sandbox stop.
Phase 2 enforces the following gate:
| Sandbox state | Phase 2 |
|---|---|
| missing | block |
| validation failed | block |
| validation passed | allow |
| user-managed | allow |
Override the gate (rare): CODECOME_ALLOW_NO_SANDBOX=1 make phase-2.
See docs/sandbox.md for the full bootstrap CLI reference.
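The gate table and its override reduce to a lookup plus an environment check. The sketch below is illustrative only. The state strings mirror the table. How CodeCome actually stores the sandbox state is not modelled here.

```python
import os
from typing import Mapping, Optional

# State names mirror the gate table above (True = allow, False = block).
GATE = {
    "missing": False,
    "validation failed": False,
    "validation passed": True,
    "user-managed": True,
}


def phase2_allowed(sandbox_state: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when Phase 2 may start for the given sandbox state."""
    env = os.environ if env is None else env
    if env.get("CODECOME_ALLOW_NO_SANDBOX") == "1":
        return True  # explicit, rarely needed override
    if sandbox_state not in GATE:
        raise ValueError(f"unknown sandbox state: {sandbox_state!r}")
    return GATE[sandbox_state]
```

Because `env` is a parameter, the gate can be tested without changing the real process environment.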
Phase 2: Vulnerability hypothesis generation
Goal:
Create precise candidate findings.
Run:
make phase-2
(Or, manually: opencode run --agent auditor "$(cat prompts/phase-2-audit.md)".)
Expected outputs:
itemdb/findings/PENDING/CC-XXXX-short-title.md
Each finding must include:
- affected code,
- source-to-sink or equivalent reasoning,
- attackability,
- impact,
- validation plan,
- counter-analysis placeholder,
- evidence placeholder.
All new findings must have:
status: "PENDING"
New findings must not have:
confidence: "CONFIRMED"
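The two frontmatter rules can be checked mechanically. This hypothetical sketch assumes a `---`-delimited frontmatter block with flat `key: "value"` lines and does not parse nested YAML. Apart from `status` and `confidence`, the field names and values in the usage example (`id`, `title`, `"MEDIUM"`) are illustrative assumptions. Use `make frontmatter` for the real validation.

```python
from typing import Dict, List


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` lines between leading '---' markers (no nested YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    raise ValueError("missing closing '---'")


def check_new_finding(text: str) -> List[str]:
    """Apply the two rules above for freshly created findings."""
    fm = parse_frontmatter(text)
    problems = []
    if fm.get("status") != "PENDING":
        problems.append('new findings must have status: "PENDING"')
    if fm.get("confidence") == "CONFIRMED":
        problems.append('new findings must not have confidence: "CONFIRMED"')
    return problems
```

A file beginning `---`, `id: "CC-0001"`, `status: "PENDING"`, `confidence: "MEDIUM"`, `---` yields an empty problem list.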
Phase 3: Counter-analysis and deduplication
Goal:
Reduce false positives before validation.
Run:
make phase-3
(Or, manually: opencode run --agent reviewer "$(cat prompts/phase-3-review.md)".)
Expected actions:
- update the `# Counter-analysis` section,
- improve validation plans,
- lower or raise confidence,
- move disproven findings to REJECTED,
- move duplicate findings to DUPLICATE,
- leave plausible findings in PENDING.
Phase 3 should not normally mark findings as CONFIRMED.
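Moving a finding to REJECTED or DUPLICATE can be sketched as two steps: rewrite the frontmatter `status` line, then move the file into a directory named after the new status. This is a hypothetical helper. It assumes that every status has a directory under itemdb/findings/ alongside PENDING/, which is the only one named in this document.

```python
import re
from pathlib import Path

STATUSES = ("PENDING", "CONFIRMED", "EXPLOITED", "REJECTED", "DUPLICATE")
STATUS_LINE = re.compile(r'^status:[ \t]*"?[A-Z]+"?[ \t]*$', re.MULTILINE)


def move_finding(findings_root: Path, finding_id: str, new_status: str) -> Path:
    """Rewrite the status field and move the file to findings_root/<new_status>/."""
    if new_status not in STATUSES:
        raise ValueError(f"unknown status: {new_status!r}")
    matches = list(findings_root.glob(f"*/{finding_id}-*.md"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one file for {finding_id}, found {len(matches)}")
    src = matches[0]
    text, count = STATUS_LINE.subn(
        f'status: "{new_status}"', src.read_text(encoding="utf-8"), count=1
    )
    if count != 1:
        raise ValueError(f"{src} has no status field")
    dest = findings_root / new_status / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(text, encoding="utf-8")
    if dest != src:
        src.unlink()  # remove the old copy only after the new one is written
    return dest
```

Writing the new file before deleting the old one means a crash mid-move leaves a duplicate rather than a lost finding.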
Phase 4: Finding validation
Goal:
Prove or disprove one finding at a time.
Run:
make phase-4 FINDING=CC-0001
Or manually:
opencode run "$(sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts/phase-4-validate.md)"
Alternative:
sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts/phase-4-validate.md | opencode run
Expected outputs:
itemdb/evidence/CC-0001/
itemdb/evidence/CC-0001/README.md
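The `sed` commands above replace the FINDING_PATH_OR_ID placeholder in the prompt template and pass the result to `opencode run`. The sketch below does the same thing without a shell. It is a hypothetical helper that only builds the argument list. The `--agent` flag is the one used elsewhere in this document.

```python
from pathlib import Path
from typing import List

PLACEHOLDER = "FINDING_PATH_OR_ID"


def render_prompt(template: str, finding: str) -> str:
    """Literal equivalent of: sed 's#FINDING_PATH_OR_ID#<finding>#g'."""
    return template.replace(PLACEHOLDER, finding)


def build_command(prompt_file: Path, finding: str, agent: str = "") -> List[str]:
    """Build an argv list for `opencode run`, passing the prompt as one argument."""
    prompt = render_prompt(prompt_file.read_text(encoding="utf-8"), finding)
    argv = ["opencode", "run"]
    if agent:
        argv += ["--agent", agent]
    argv.append(prompt)  # a single argv element, so no shell quoting is involved
    return argv
```

You can pass the result directly to `subprocess.run(..., check=True)`. Unlike `sed`, `str.replace` treats the finding literally, so characters such as `#` or `&` in a path need no escaping.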
Useful evidence files:
commands.txt
output.txt
logs.txt
sanitizer.log
crash.txt
request.http
response.txt
exploit.py
payload.bin
test-output.txt
debugger-notes.md
static-proof.md
limitations.md
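Two of the evidence files above, commands.txt and output.txt, are easy to keep consistent if every command passes through one helper. This hypothetical sketch creates the evidence directory with a stub README.md, then runs a command and appends both the exact command line and its output. The README layout and the `CC-` plus four-or-more-digits id pattern are assumptions.

```python
import re
import shlex
import subprocess
from pathlib import Path
from typing import List


def scaffold_evidence(workspace: Path, finding_id: str) -> Path:
    """Create itemdb/evidence/<finding-id>/ with a stub README.md."""
    if not re.fullmatch(r"CC-\d{4,}", finding_id):
        raise ValueError(f"unexpected finding id: {finding_id!r}")
    evidence = workspace / "itemdb" / "evidence" / finding_id
    evidence.mkdir(parents=True, exist_ok=True)
    readme = evidence / "README.md"
    if not readme.exists():  # never overwrite existing notes
        readme.write_text(
            f"# Evidence for {finding_id}\n\n"
            "Commands: see commands.txt. Output: see output.txt.\n\n"
            "## Result\n\nTODO\n\n## Limitations\n\nTODO\n",
            encoding="utf-8",
        )
    return evidence


def record_command(evidence: Path, argv: List[str]) -> int:
    """Run argv and append the command line and its output to the evidence files."""
    result = subprocess.run(argv, capture_output=True, text=True)
    cmdline = shlex.join(argv)
    with (evidence / "commands.txt").open("a", encoding="utf-8") as fh:
        fh.write(cmdline + "\n")
    with (evidence / "output.txt").open("a", encoding="utf-8") as fh:
        fh.write(f"$ {cmdline}\n{result.stdout}{result.stderr}[exit {result.returncode}]\n")
    return result.returncode
```

Recording the exit code next to the output makes a failed reproduction visible when someone reviews the evidence later.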
Possible outcomes:
- move finding to CONFIRMED,
- move finding to REJECTED,
- keep finding in PENDING with unresolved validation notes.
Phase 5: Exploit development
Goal:
Demonstrate real-world impact of confirmed vulnerabilities.
Run:
make phase-5 FINDING=CC-0001
Or manually:
opencode run --agent exploiter "$(sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts/phase-5-exploit.md)"
Expected outputs:
itemdb/evidence/CC-0001/exploits/
itemdb/evidence/CC-0001/exploits/README.md
itemdb/evidence/CC-0001/exploits/exploit.py
itemdb/evidence/CC-0001/exploits/recordings/ # demo recording
Useful exploitation artifacts:
exploit.py
exploit.sh
payload.bin
malicious-input.txt
captured-output.txt
impact-log.txt
When the PoC works, the exploiter also produces a reproducible
demonstration recording under exploits/recordings/ (a .cast file, a
GIF, an optional MP4, reproduce.sh, env.txt, and README.md). See
.opencode/skills/exploit-recording/SKILL.md. Attempting a recording
is mandatory, but a documented absence of one (for example, because
recording tools are unavailable) does not block EXPLOITED.
EXPLOITED findings must additionally carry:
- one or more CWE ids in frontmatter,
- populated `# Root cause analysis`, `# Data flow` (or `Not applicable.`), `# Inputs and preconditions`, and `# Recording` sections,
- a `# Remediation idea` section containing a corrected-code excerpt or unified diff.
Possible outcomes:
- move finding to EXPLOITED (with demonstrated impact),
- keep finding in CONFIRMED (exploitation not feasible).
The exploiter may adjust severity based on demonstrated impact.
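The extra EXPLOITED requirements can be approximated by a structural check. This hypothetical sketch splits the finding body on top-level `# ` headings, skipping `#` lines inside code fences. It then reports missing or empty sections, a CWE field without a `CWE-<n>` id, and a remediation section without a fenced excerpt. The CWE field's format and the "fenced block means code or diff" heuristic are assumptions.

```python
import re
from typing import Dict, List, Optional

FENCE = "`" * 3  # built this way so the sketch can sit inside a Markdown fence
REQUIRED_SECTIONS = (
    "Root cause analysis",
    "Data flow",
    "Inputs and preconditions",
    "Recording",
    "Remediation idea",
)
CWE_RE = re.compile(r"\bCWE-\d+\b")
HEADING_RE = re.compile(r"^# (.+?)\s*$")


def split_sections(body: str) -> Dict[str, str]:
    """Map each '# Heading' to its text, ignoring '#' lines inside code fences."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    in_fence = False
    for line in body.splitlines():
        is_fence = line.lstrip().startswith(FENCE)
        heading = None if (in_fence or is_fence) else HEADING_RE.match(line)
        if is_fence:
            in_fence = not in_fence
        if heading:
            current = heading.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def check_exploited(cwe_field: str, body: str) -> List[str]:
    """Return problems that would keep a finding from qualifying as EXPLOITED."""
    problems = []
    if not CWE_RE.search(cwe_field):
        problems.append("frontmatter needs at least one CWE id")
    sections = split_sections(body)
    for name in REQUIRED_SECTIONS:
        if not sections.get(name, "").strip():
            problems.append(f"missing or empty section: # {name}")
    remediation = sections.get("Remediation idea", "")
    if remediation.strip() and FENCE not in remediation:
        problems.append("# Remediation idea needs a code excerpt or unified diff")
    return problems
```

A `# Data flow` section that says only `Not applicable.` counts as populated, which matches the rule above.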
Phase 6: Reporting
Goal:
Produce Markdown reports.
Run with an agent:
make phase-6
Or manually:
opencode run --agent reporter "$(cat prompts/phase-6-report.md)"
Or generate a basic local report:
make report
Expected output:
itemdb/reports/report.md
Phase 6 reports include CWE and Recording columns in the summary
table, a short vulnerable-code excerpt and root-cause summary per
CONFIRMED/EXPLOITED finding, and recording references by relative path.
Binary blobs (.gif, .mp4) are never embedded inline.
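A summary table with CWE and Recording columns might be rendered as below. Recordings are referenced by a path relative to the report directory using a plain link, never `![...](...)` image syntax. This hypothetical sketch is not the Phase 6 reporter. The dict keys and the demo.gif file name in the usage example are assumptions.

```python
import posixpath
from typing import Dict, List


def _cell(text: str) -> str:
    return text.replace("|", "\\|")  # keep pipes from breaking the table


def summary_table(findings: List[Dict[str, str]], report_dir: str) -> str:
    rows = [
        "| ID | Title | Status | CWE | Recording |",
        "|---|---|---|---|---|",
    ]
    for f in findings:
        recording = f.get("recording", "")
        # Plain link by relative path: GIF/MP4 files are referenced, never embedded.
        rec_cell = (
            f"[recording]({posixpath.relpath(recording, report_dir)})" if recording else "n/a"
        )
        rows.append(
            f"| {f['id']} | {_cell(f['title'])} | {f['status']} | {f.get('cwe', '')} | {rec_cell} |"
        )
    return "\n".join(rows)
```

With `report_dir="itemdb/reports"`, a recording at itemdb/evidence/CC-0001/exploits/recordings/demo.gif is linked as `../evidence/CC-0001/exploits/recordings/demo.gif`.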
Utility commands:
Show available commands:
make help
Validate workspace:
make check
Show finding status:
make status
Get next finding id:
make next-id
Validate finding frontmatter:
make frontmatter
Regenerate index:
make index
Regenerate report:
make report
Check sandbox:
make sandbox-check
Open sandbox shell:
make sandbox-shell
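The id returned by `make next-id` can be computed by scanning every status directory, so ids of rejected or duplicate findings are never reused. This hypothetical sketch assumes findings live in itemdb/findings/<STATUS>/CC-<digits>-*.md and use zero-padded four-digit ids, as in CC-0001.

```python
import re
from pathlib import Path

ID_RE = re.compile(r"^CC-(\d{4,})")


def next_finding_id(findings_root: Path) -> str:
    """Return one past the highest CC-<n> id found in any status directory."""
    highest = 0
    for path in findings_root.glob("*/CC-*.md"):
        match = ID_RE.match(path.name)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"CC-{highest + 1:04d}"
```

For example, `next_finding_id(Path("itemdb/findings"))` returns CC-0001 for an empty workspace.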
Finding status lifecycle:
PENDING
├── CONFIRMED
│ └── EXPLOITED
├── REJECTED
└── DUPLICATE
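Read strictly, the tree defines a small transition table. The sketch below encodes that reading and allows a finding to stay in its current status. It is an illustration only. The real tooling may permit further moves, such as reopening a rejected finding.

```python
from typing import Dict, Set

# Transitions read off the tree above.
TRANSITIONS: Dict[str, Set[str]] = {
    "PENDING": {"CONFIRMED", "REJECTED", "DUPLICATE"},
    "CONFIRMED": {"EXPLOITED"},
    "EXPLOITED": set(),
    "REJECTED": set(),
    "DUPLICATE": set(),
}


def can_move(current: str, new: str) -> bool:
    """True if a finding may go from `current` to `new` (staying put is allowed)."""
    if current not in TRANSITIONS or new not in TRANSITIONS:
        raise ValueError(f"unknown status: {current!r} -> {new!r}")
    return new == current or new in TRANSITIONS[current]
```

Under this reading, `can_move("PENDING", "EXPLOITED")` is False: a finding must be CONFIRMED in Phase 4 before Phase 5 can mark it EXPLOITED.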
Findings are Markdown files so they can be reviewed like code.
Recommended human review points:
- After Phase 1, review itemdb/notes/.
- After Phase 2, review candidate findings.
- After Phase 3, review rejected and duplicate decisions.
- After Phase 4, review evidence before trusting confirmed findings.
- After Phase 5, review exploit PoCs and severity adjustments.
- Before sharing Phase 6 reports, review language and limitations.
The initial PoC uses one validation worker at a time.
Future versions may allow multiple validation workers, but each worker should have isolated runtime state.
Possible future isolation strategies:
- one Docker Compose project per finding,
- one container per finding,
- one disposable VM per finding,
- one remote sandbox per finding.
Each validation worker should write only to:
itemdb/evidence/<finding-id>/
runs/
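A future multi-worker runner could enforce this write scope with a path check. The check runs after `..` components and existing symlinks are resolved. This is a hypothetical sketch that assumes runs/ sits at the workspace root next to itemdb/.

```python
from pathlib import Path


def is_allowed_write(workspace: Path, finding_id: str, target: Path) -> bool:
    """True if `target` resolves inside itemdb/evidence/<finding-id>/ or runs/."""
    ws = workspace.resolve()
    resolved = (ws / target).resolve()  # collapses '..' and follows existing symlinks
    allowed_roots = (ws / "itemdb" / "evidence" / finding_id, ws / "runs")
    return any(resolved == root or root in resolved.parents for root in allowed_roots)
```

Resolving before comparing means a path like itemdb/evidence/CC-0001/../../findings/x.md is rejected, not trusted by its prefix.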
The core workflow is target-agnostic.
Target-specific behavior belongs in:
.opencode/skills/
itemdb/notes/
sandbox/scripts/
codecome.yml target overrides
Examples:
- C/C++ review should use .opencode/skills/c-cpp-security/.
- Juliet/SARD review should use .opencode/skills/juliet-benchmark/.
- Web apps may later use .opencode/skills/web-security/.
- .NET apps may use .opencode/skills/dotnet-security/.
- PHP apps may use .opencode/skills/php-security/.
- SQL-heavy code paths may use .opencode/skills/sql-injection/.
Before moving from Phase 1 to Phase 2:
make check
make check also warns when optional recording tools (asciinema,
agg, ffmpeg, Xvfb) are missing; warnings do not fail the gate.
Before reporting:
make frontmatter
make index
make report
Before validation:
make sandbox-check
Quick start:
- Place target source under `src/`.
- Run `make venv`, `make check`, and `make sandbox-check`.
- Run reconnaissance with `make phase-1`. By default this uses CodeCome's styled wrapper around `opencode run --format json`.
- Review notes under `itemdb/notes/`.
- Run audit: `make phase-2`.
- Run counter-analysis: `make phase-3`.
- Validate one finding: `make phase-4 FINDING=CC-0001`.
- Develop an exploit and demonstration recording for a confirmed finding: `make phase-5 FINDING=CC-0001`. `make check` warns ahead of time if recording tools are missing.
- Regenerate the index and report with `make index` and `make report`, or generate a full AI-driven report with `make phase-6`.
make report remains available as a lightweight local summary, but it is not a full substitute for Phase 6.
All make targets that depend on Python tooling expect a repo-local .venv/. If it is missing or stale, the command will stop and tell you to run make venv.
Wrapper controls:
CODECOME_USE_WRAPPER=0 # bypass wrapper and use raw opencode run
CODECOME_THINKING=1 # show model reasoning/thinking blocks in output
OPENCODE_ARGS='...' # extra flags for opencode run (forwarded directly when CODECOME_USE_WRAPPER=0; in wrapper mode only --model, --variant and --thinking are used)
CODECOME_MODEL=<id> # pin the model per phase
CODECOME_MODEL_VARIANT=<v> # pin the model variant
Model and variant resolution priority (highest wins):
1. `OPENCODE_ARGS='--model … --variant …'`
2. `CODECOME_MODEL` / `CODECOME_MODEL_VARIANT`
3. codecome.yml `agents.<name>.model` / `agents.<name>.variant`
4. The model from your most recent OpenCode session for this project (best-effort, read from OpenCode's local DB).
5. Unknown (the banner shows the gap; nothing is appended).
When source 2 or 3 wins, the wrapper appends --model / --variant
to the spawned opencode run, so the model shown in the banner is the
one actually used. Source 4 is display-only and is never enforced.
To see the full resolution table without launching a phase:
make show-model
make show-model AGENT=auditor
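The numbered priority list can be sketched as a first-match search over the five sources. The sketch also records whether the wrapper would append the flag. It is hypothetical and does not reproduce the wrapper's code. Resolving model and variant independently (one call each) is an assumption, and OPENCODE_ARGS is parsed only for the space-separated `--model X` form.

```python
import shlex
from typing import Mapping, NamedTuple, Optional


class Resolution(NamedTuple):
    value: Optional[str]
    source: int        # 1-5, matching the numbered priority list above
    append_flag: bool  # True when the wrapper would append --model/--variant


def _flag_value(opencode_args: str, flag: str) -> Optional[str]:
    """Return the token following `flag` in OPENCODE_ARGS, if present."""
    tokens = shlex.split(opencode_args)
    for i, token in enumerate(tokens[:-1]):
        if token == flag:
            return tokens[i + 1]
    return None


def resolve_setting(
    flag: str,
    env_var: str,
    env: Mapping[str, str],
    config_value: Optional[str],
    session_value: Optional[str],
) -> Resolution:
    candidates = [
        (1, _flag_value(env.get("OPENCODE_ARGS", ""), flag)),
        (2, env.get(env_var) or None),
        (3, config_value),   # codecome.yml agents.<name>.model / .variant
        (4, session_value),  # most recent OpenCode session (display-only)
    ]
    for source, value in candidates:
        if value:
            # Source 1 is already on the command line; source 4 is never enforced.
            return Resolution(value, source, source in (2, 3))
    return Resolution(None, 5, False)
```

Call it once with `("--model", "CODECOME_MODEL", ...)` and once with `("--variant", "CODECOME_MODEL_VARIANT", ...)` to get the two rows that `make show-model` reports.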
CodeCome is dual-licensed under your choice of:
- GNU General Public License version 3 or later (GPL-3.0-or-later), or
- GNU Affero General Public License version 3 or later (AGPL-3.0-or-later).
SPDX expression: GPL-3.0-or-later OR AGPL-3.0-or-later.
The files under templates/sandboxes/ are an exception: they are
licensed under the MIT License so they can be copied into user
workspaces without imposing copyleft obligations on those user
projects.
See LICENSE, AGPL-LICENSE, templates/sandboxes/LICENSE, and
NOTICE. Contributions are accepted under the terms described in
CONTRIBUTING.md.
Copyright (C) 2025-2026 Pablo Ruiz García <pablo.ruiz@gmail.com>.