CodeCome Workflow

CodeCome uses a phased workflow.

The workflow is intentionally simple in the initial PoC:

  • one phase at a time,
  • one agent at a time,
  • one validation worker at a time,
  • Markdown-only findings,
  • file-based evidence,
  • Docker-based sandbox.

Phase overview

Phase 1: Target reconnaissance and sandbox bootstrap
Phase 2: Vulnerability hypothesis generation
Phase 3: Counter-analysis and deduplication
Phase 4: Finding validation
Phase 5: Exploit development
Phase 6: Reporting

Phase 1: Target reconnaissance + sandbox bootstrap

Phase 1 has two sub-stages, run in the same invocation:

  • Phase 1a — Source reconnaissance.
  • Phase 1b — Sandbox bootstrap.

Goal:

Understand the target, then prepare a working validation
environment under `sandbox/`.

Run:

make phase-1

(Or, manually: opencode run --agent recon "$(cat prompts/phase-1-recon.md)".)

Phase 1a: source reconnaissance

Expected outputs under itemdb/notes/:

target-profile.md
attack-surface.md
build-model.md
execution-model.md
trust-boundaries.md
data-flow.md
validation-model.md
interesting-files.md
security-assumptions.md

Optional outputs:

auth-model.md
web-routes.md
cli-commands.md
public-api.md
cwe-map.md
benchmark-notes.md
crypto-usage.md
iac-resources.md

Phase 1a should not normally create findings.

Phase 1b: sandbox bootstrap

Curated examples live under templates/sandboxes/<id>/. The recon agent picks one (or multi-service-compose for multi-stack repos) and applies it via tools/sandbox-bootstrap.py.

Required output:

itemdb/notes/sandbox-plan.md

Generated artifacts (git-ignored):

sandbox/Dockerfile
sandbox/docker-compose.yml
sandbox/scripts/*.sh
sandbox/CODECOME-GENERATED.md
sandbox/.backup-<UTC-timestamp>/  (when previous content was replaced)

sandbox/ itself is tracked only via sandbox/.gitkeep. Everything else inside sandbox/ is regenerated by Phase 1b.

Validation tiers (T1 sandbox setup, T2 sandbox start, T3 sandbox sanity, T4 target build, T5 target test, T6 sandbox stop) are recorded in sandbox/CODECOME-GENERATED.md and summarized in sandbox-plan.md. Phase 2 enforces the gate:

| Sandbox state | Phase 2 |
| --- | --- |
| missing | block |
| validation failed | block |
| validation passed | allow |
| user-managed | allow |

Override the gate (rare): CODECOME_ALLOW_NO_SANDBOX=1 make phase-2.
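
The gate above can be read as a small decision function. The following is a minimal sketch, not CodeCome's actual implementation: the helper name `phase2_gate`, the state strings, and the assumption that the override turns every `block` into `allow` are all illustrative.

```python
import os

# Gate table from the section above: sandbox state -> Phase 2 decision.
# The state strings are illustrative labels, not CodeCome's internal values.
GATE = {
    "missing": "block",
    "validation failed": "block",
    "validation passed": "allow",
    "user-managed": "allow",
}


def phase2_gate(state, env=None):
    """Return "allow" or "block" for a sandbox state (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: CODECOME_ALLOW_NO_SANDBOX=1 bypasses the gate entirely.
    if env.get("CODECOME_ALLOW_NO_SANDBOX") == "1":
        return "allow"
    # Unknown states are treated as blocking, which is the conservative choice.
    return GATE.get(state, "block")
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.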

See docs/sandbox.md for the full bootstrap CLI reference.

Phase 2: Vulnerability hypothesis generation

Goal:

Create precise candidate findings.

Run:

opencode run --agent auditor "$(cat prompts/phase-2-audit.md)"

Expected outputs:

itemdb/findings/PENDING/CC-XXXX-short-title.md

Each finding must include:

  • affected code,
  • source-to-sink or equivalent reasoning,
  • attackability,
  • impact,
  • validation plan,
  • counter-analysis placeholder,
  • evidence placeholder.

All new findings must have:

status: "PENDING"

New findings must not have:

confidence: "CONFIRMED"
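
These two rules are simple enough to check mechanically. The sketch below assumes the frontmatter is a block delimited by `---` lines containing plain `key: "value"` pairs; the function names are hypothetical, and the real `make frontmatter` target may parse and validate findings differently.

```python
def read_frontmatter(text):
    """Parse simple `key: "value"` pairs between the first two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def check_new_finding(text):
    """Return a list of rule violations for a freshly created finding."""
    fields = read_frontmatter(text)
    problems = []
    if fields.get("status") != "PENDING":
        problems.append('status must be "PENDING"')
    if fields.get("confidence") == "CONFIRMED":
        problems.append('confidence must not be "CONFIRMED"')
    return problems
```

An empty list means the finding satisfies both Phase 2 rules; anything else names the rule that was broken.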

Phase 3: Counter-analysis and deduplication

Goal:

Reduce false positives before validation.

Run:

opencode run --agent reviewer "$(cat prompts/phase-3-review.md)"

Expected actions:

  • update # Counter-analysis,
  • improve validation plans,
  • lower or raise confidence,
  • move disproven findings to REJECTED,
  • move duplicate findings to DUPLICATE,
  • leave plausible findings in PENDING.

Phase 3 should not normally mark findings as CONFIRMED.

Phase 4: Finding validation

Goal:

Prove or disprove one finding at a time.

Run:

opencode run "$(sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts/phase-4-validate.md)"

Alternative:

sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts/phase-4-validate.md | opencode run
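
Both commands fill the `FINDING_PATH_OR_ID` placeholder before the prompt reaches opencode. For scripting, the same substitution can be sketched in Python; `render_prompt` and `render_prompt_file` are hypothetical helpers, not part of CodeCome.

```python
from pathlib import Path

PLACEHOLDER = "FINDING_PATH_OR_ID"


def render_prompt(template, finding):
    """Replace every placeholder occurrence, like sed 's#...#...#g'."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} placeholder")
    return template.replace(PLACEHOLDER, finding)


def render_prompt_file(path, finding):
    """Read a prompt template from disk and fill in the finding id or path."""
    return render_prompt(Path(path).read_text(encoding="utf-8"), finding)
```

Unlike sed, this sketch raises an error when the template contains no placeholder, which catches the case of pointing at the wrong prompt file.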

Expected outputs:

itemdb/evidence/CC-0001/
itemdb/evidence/CC-0001/README.md

Useful evidence files:

commands.txt
output.txt
logs.txt
sanitizer.log
crash.txt
request.http
response.txt
exploit.py
payload.bin
test-output.txt
debugger-notes.md
static-proof.md
limitations.md

Possible outcomes:

  • move finding to CONFIRMED,
  • move finding to REJECTED,
  • keep finding in PENDING with unresolved validation notes.

Phase 5: Exploit development

Goal:

Demonstrate real-world impact of confirmed vulnerabilities.

Run:

make phase-5 FINDING=CC-0001

Or manually:

opencode run --agent exploiter "$(sed 's#FINDING_PATH_OR_ID#CC-0001#g' prompts/phase-5-exploit.md)"

Expected outputs:

itemdb/evidence/CC-0001/exploits/
itemdb/evidence/CC-0001/exploits/README.md
itemdb/evidence/CC-0001/exploits/exploit.py
itemdb/evidence/CC-0001/exploits/recordings/        # demo recording

Useful exploitation artifacts:

exploit.py
exploit.sh
payload.bin
malicious-input.txt
captured-output.txt
impact-log.txt

When the PoC works, the exploiter also produces a reproducible demonstration recording under exploits/recordings/ (cast, gif, optional mp4, reproduce.sh, env.txt, README.md). See .opencode/skills/exploit-recording/SKILL.md. Attempting the recording is mandatory; if no recording can be produced, documenting why it is absent is enough, and the finding can still move to EXPLOITED.

EXPLOITED findings must additionally carry:

  • one or more CWE ids in frontmatter,
  • populated # Root cause analysis, # Data flow (or Not applicable.), # Inputs and preconditions, and # Recording sections,
  • a # Remediation idea containing a corrected-code excerpt or unified diff.
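
The checklist above can be approximated with a small section scanner. This is a sketch under assumptions: sections are level-one `# Heading` lines as written above, the CWE ids are passed in by the caller because the frontmatter key is not specified here, and the check does not verify that the remediation section actually contains a code excerpt or diff. The function names are hypothetical.

```python
REQUIRED_SECTIONS = [
    "Root cause analysis",
    "Data flow",
    "Inputs and preconditions",
    "Recording",
    "Remediation idea",
]


def split_sections(markdown):
    """Map each `# Heading` to the stripped text that follows it."""
    sections = {}
    current = None
    in_fence = False
    for line in markdown.splitlines():
        # Track fenced code so `# comment` lines inside code are not headings.
        if line.startswith("```"):
            in_fence = not in_fence
        if not in_fence and line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def missing_for_exploited(markdown, cwe_ids):
    """List what still blocks EXPLOITED; an empty list means nothing found."""
    sections = split_sections(markdown)
    missing = [] if cwe_ids else ["at least one CWE id"]
    for name in REQUIRED_SECTIONS:
        if not sections.get(name):
            missing.append("# " + name)
    return missing
```

A `# Data flow` section whose body is `Not applicable.` counts as populated, matching the rule above.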

Possible outcomes:

  • move finding to EXPLOITED (with demonstrated impact),
  • keep finding in CONFIRMED (exploitation not feasible).

The exploiter may adjust severity based on demonstrated impact.

Phase 6: Reporting

Goal:

Produce Markdown reports.

Run with an agent:

make phase-6

Or manually:

opencode run --agent reporter "$(cat prompts/phase-6-report.md)"

Or generate a basic local report:

make report

Expected output:

itemdb/reports/report.md

Phase 6 reports include CWE and Recording columns in the summary table, a short vulnerable-code excerpt and root-cause summary per CONFIRMED/EXPLOITED finding, and recording references by relative path. Binary blobs (.gif, .mp4) are never embedded inline.

Helper commands

Show available commands:

make help

Validate workspace:

make check

Show finding status:

make status

Get next finding id:

make next-id

Validate finding frontmatter:

make frontmatter

Regenerate index:

make index

Regenerate report:

make report

Check sandbox:

make sandbox-check

Open sandbox shell:

make sandbox-shell

Finding lifecycle

PENDING
    ├── CONFIRMED
    │       └── EXPLOITED
    ├── REJECTED
    └── DUPLICATE
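
The tree above can be encoded as a transition table. This is an illustrative sketch only: the name `can_move` is hypothetical, and treating EXPLOITED, REJECTED, and DUPLICATE as terminal is an assumption, since the tree does not say whether a finding can be reopened.

```python
# Allowed transitions, read directly off the lifecycle tree above.
TRANSITIONS = {
    "PENDING": {"CONFIRMED", "REJECTED", "DUPLICATE"},
    "CONFIRMED": {"EXPLOITED"},
    "EXPLOITED": set(),
    "REJECTED": set(),
    "DUPLICATE": set(),
}


def can_move(current, target):
    """Return True if the tree allows moving a finding from current to target."""
    return target in TRANSITIONS.get(current, set())
```

For example, `can_move("PENDING", "EXPLOITED")` is false because a finding has to pass through CONFIRMED first.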

Human review model

Findings are Markdown files so they can be reviewed like code.

Recommended human review points:

  1. After Phase 1, review itemdb/notes/.
  2. After Phase 2, review candidate findings.
  3. After Phase 3, review rejected and duplicate decisions.
  4. After Phase 4, review evidence before trusting confirmed findings.
  5. After Phase 5, review exploit PoCs and severity adjustments.
  6. Before sharing Phase 6 reports, review language and limitations.

Validation worker model

The initial PoC uses one validation worker at a time.

Future versions may allow multiple validation workers, but each worker should have isolated runtime state.

Possible future isolation strategies:

  • one Docker Compose project per finding,
  • one container per finding,
  • one disposable VM per finding,
  • one remote sandbox per finding.

Each validation worker should write only to:

itemdb/evidence/<finding-id>/
runs/
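
A write-path rule like this could be enforced with a small check before a worker writes a file. The sketch below is an assumption about how such a guard might look, not existing CodeCome code; `allowed_write` is a hypothetical name and paths are taken to be relative to the workspace root.

```python
from pathlib import PurePosixPath


def allowed_write(path, finding_id):
    """Check a workspace-relative path against the two writable roots."""
    parts = PurePosixPath(path).parts
    # Reject absolute paths and any ".." component instead of resolving them.
    if not parts or parts[0] == "/" or ".." in parts:
        return False
    evidence_root = ("itemdb", "evidence", finding_id)
    return parts[:3] == evidence_root or parts[0] == "runs"
```

Rejecting `..` outright is stricter than resolving it, but it avoids any dependence on the filesystem and keeps the rule easy to audit.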

Target-specific behavior

The core workflow is target-agnostic.

Target-specific behavior belongs in:

.opencode/skills/
itemdb/notes/
sandbox/scripts/
codecome.yml target overrides

Examples:

  • C/C++ review should use .opencode/skills/c-cpp-security/.
  • Juliet/SARD review should use .opencode/skills/juliet-benchmark/.
  • Web apps may later use .opencode/skills/web-security/.
  • .NET apps may use .opencode/skills/dotnet-security/.
  • PHP apps may use .opencode/skills/php-security/.
  • SQL-heavy code paths may use .opencode/skills/sql-injection/.

Quality gates

Before moving from Phase 1 to Phase 2:

make check

make check also warns when optional recording tools (asciinema, agg, ffmpeg, Xvfb) are missing; warnings do not fail the gate.

Before reporting:

make frontmatter
make index
make report

Before validation:

make sandbox-check

Recommended first PoC run

  1. Place target source under src/.

  2. Run:

    make venv
    make check
    make sandbox-check
    
  3. Run reconnaissance:

    make phase-1
    

    By default this uses CodeCome's styled wrapper around opencode run --format json.

  4. Review notes under:

    itemdb/notes/
    
  5. Run audit:

    make phase-2
    
  6. Run counter-analysis:

    make phase-3
    
  7. Validate one finding:

    make phase-4 FINDING=CC-0001
    
  8. Develop exploit and demonstration recording for a confirmed finding:

    make phase-5 FINDING=CC-0001
    

    make check warns ahead of time if recording tools are missing.

  9. Regenerate index and report:

    make index
    make report
    

    Or generate a full AI-driven report:

    make phase-6
    

make report remains available as a lightweight local summary, but it is not a full substitute for Phase 6.

All make targets that depend on Python tooling expect a repo-local .venv/. If it is missing or stale, the command will stop and tell you to run make venv.

Wrapper controls:

CODECOME_USE_WRAPPER=0   # bypass wrapper and use raw opencode run
CODECOME_THINKING=1      # show model reasoning/thinking blocks in output
OPENCODE_ARGS='...'      # extra flags for opencode run (forwarded directly when CODECOME_USE_WRAPPER=0; in wrapper mode only --model, --variant and --thinking are used)
CODECOME_MODEL=<id>          # pin the model per phase
CODECOME_MODEL_VARIANT=<v>   # pin the model variant

Model and variant resolution priority (highest wins):

  1. OPENCODE_ARGS='--model … --variant …'
  2. CODECOME_MODEL / CODECOME_MODEL_VARIANT
  3. codecome.yml agents.<name>.model / agents.<name>.variant
  4. The model from your most recent OpenCode session for this project (best-effort, read from OpenCode's local DB).
  5. unknown (banner shows the gap; nothing is appended)

When source 2 or 3 wins, the wrapper appends --model / --variant to the spawned opencode run, so the model shown in the banner is the one actually used. Source 4 is display-only and never enforced.
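
The priority list can be sketched as a chain of fallbacks. This is a simplified illustration for the model only (the variant would follow the same pattern); the function name, the source labels, and the idea of returning a `wrapper_appends` flag are assumptions made for this example, not the wrapper's real interface.

```python
def resolve_model(args_model=None, env_model=None, config_model=None,
                  session_model=None):
    """Return (model, source, wrapper_appends) following the priority list."""
    if args_model:
        # Source 1. Assumption: the flag already travels with OPENCODE_ARGS,
        # so the wrapper has nothing extra to append.
        return args_model, "OPENCODE_ARGS", False
    if env_model:
        # Source 2: the wrapper appends --model to enforce the choice.
        return env_model, "CODECOME_MODEL", True
    if config_model:
        # Source 3: same enforcement as source 2.
        return config_model, "codecome.yml", True
    if session_model:
        # Source 4 is display-only and never enforced.
        return session_model, "last OpenCode session", False
    # Source 5: nothing known, nothing appended.
    return "unknown", "none", False
```

The actual resolution for a given agent is what `make show-model` reports.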

To see the full resolution table without launching a phase:

make show-model
make show-model AGENT=auditor

License

CodeCome is dual-licensed under your choice of:

  • GNU General Public License version 3 or later (GPL-3.0-or-later), or
  • GNU Affero General Public License version 3 or later (AGPL-3.0-or-later).

SPDX expression: GPL-3.0-or-later OR AGPL-3.0-or-later.

The files under templates/sandboxes/ are an exception: they are licensed under the MIT License so they can be copied into user workspaces without imposing copyleft obligations on those user projects.

See LICENSE, AGPL-LICENSE, templates/sandboxes/LICENSE, and NOTICE. Contributions are accepted under the terms described in CONTRIBUTING.md.

Copyright (C) 2025-2026 Pablo Ruiz García <pablo.ruiz@gmail.com>.