chore: comprehensive repo cleanup + evidence assessor by Coldaine · Pull Request #384 · Coldaine/ColdVox

Coldaine · 2026-03-31T23:51:28Z

Summary

Two sets of changes on this branch:

1. Agentic Evidence Assessor (shadow mode PR reviewer)

`.github/workflows/agentic-evidence-preview.yml`: shadow reviewer workflow
`.github/prompts/evidence-assessor.md`: Chain-of-Thought prompt
`docs/plans/agentic-evidence-preview.md`: system spec

2. Comprehensive Repo Cleanup (70 files, -2,946 lines)

Systematic cleanup of dead code, dead documentation, dead references, and stale agent instructions. Three independent subagent audits confirmed every deletion.

Root Cleanup

Deleted `CLAUDE.md` and `GEMINI.md` (byte-identical copies of `AGENTS.md`)
Deleted root junk: `plugins.json`, `pr_365_details.json`, `test_enigo_live.rs`
Archived 6 root reports to `docs/archive/root/`

Dead Backend Code (Rust)

Replaced `WHISPER_MODEL_PATH` env var with `STT_MODEL_PATH` in types.rs and tests
Fixed integration tests: `whisper` -> `moonshine` as preferred plugin
Updated doc comments in plugin.rs and plugin_types.rs
Deleted `crates/app/plugins.json` (had `preferred_plugin: whisper`)

Dead Reference Fixes (20+ occurrences)

Replaced ALL `windows-multi-agent-recovery.md` references with `current-status.md`
Removed 'absolute truth' language from agent rules
Fixed `AGENTS.md` pointer to nonexistent `CI/policy.md` -> `CI/architecture.md`

Doc Pruning

Deleted 15 empty/expired docs (stubs, raw transcripts, past-retention)
Archived 8 stale docs (Linux-only PipeWire, org-wide observability, superseded)
Fixed contradictions: removed Whisper from stt-overview.md, corrected Moonshine description

Agent Instruction Restructure

`.github/copilot-instructions.md` is now the canonical source
`AGENTS.md` and `.kilocode/rules/agents.md` sync from it
Updated `ensure_agent_hardlinks.sh` and `check_markdown_placement.py`
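Because `AGENTS.md` and `.kilocode/rules/agents.md` now mirror `.github/copilot-instructions.md`, drift between them can be checked mechanically. A minimal sketch, assuming the mirrors are meant to be byte-identical to the canonical file (as the deleted `CLAUDE.md`/`GEMINI.md` copies were); `check_agent_sync` is a hypothetical helper, not part of `ensure_agent_hardlinks.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this PR): report mirrors that differ from the
# canonical agent-instructions file. Works for hardlinks and plain copies
# alike, because it compares bytes rather than inodes.
check_agent_sync() {
  local canonical=$1
  shift
  local mirror status=0
  for mirror in "$@"; do
    # cmp -s is silent and exits non-zero when files differ or are unreadable.
    if ! cmp -s -- "$canonical" "$mirror"; then
      echo "out of sync: $mirror (canonical: $canonical)" >&2
      status=1
    fi
  done
  return "$status"
}

# Example, from the repo root (paths from the PR description):
# check_agent_sync .github/copilot-instructions.md AGENTS.md .kilocode/rules/agents.md
```

Comparing bytes means the check still passes if a hardlink is replaced by an identical copy, which is the property the agents actually depend on.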

Dead Vendor/Scripts

Deleted `vendor/vosk/` stubs
Deleted dead scripts: setup-vosk-cache.sh, verify_vosk_model.sh, ensure_venv.sh, start-headless.sh

Evidence

`cargo check --workspace --all-targets` passes
Zero remaining references to `windows-multi-agent-recovery.md` in active files
Zero remaining 'absolute truth' language
All Whisper/dead backend references removed from active code and docs
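"Zero remaining references" claims like these are evidence that can be re-run rather than asserted. A sketch, assuming "active files" means tracked files outside `docs/archive/` (the PR does not define the term); `assert_no_refs` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if PATTERN (a fixed string) appears in no
# tracked file outside docs/archive/. Matching lines are printed for review.
assert_no_refs() {
  local pattern=$1 rc=0
  # git grep exits 0 when something matches, 1 when nothing does, >1 on error.
  git grep -n -F -e "$pattern" -- . ':(exclude)docs/archive' || rc=$?
  case $rc in
    0) return 1 ;;  # references remain, so the "zero references" claim fails
    1) return 0 ;;  # no references in active files
    *) return 2 ;;  # git error, e.g. not inside a repository
  esac
}

# Example, from the repo root:
# assert_no_refs windows-multi-agent-recovery.md
# assert_no_refs WHISPER_MODEL_PATH
```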

Implements the Portable Agentic Evidence Standard for ColdVox:
- docs/reviews/portable_standard_critique.md: Philosophy document explaining why tautological unit tests are insufficient and the case for empirical evidence-based PR review.
- docs/reviews/reviewer_driven_evidence.md: Workflow strategy describing how the reviewer-driven evidence process works, evidence tiers (1-5), and semantic drift detection patterns.
- docs/plans/agentic-evidence-preview.md: System architecture spec for the shadow mode assessor: permissions, git diff strategy, token budget, failure modes, and Phase 2 considerations.
- .github/prompts/evidence-assessor.md: The hardened CoT prompt that Gemini executes in CI. Includes explicit anti-hallucination constraints, structured output format, and ColdVox-specific ground truths (Moonshine fragile, Parakeet not ready, stubs dead).
- .github/workflows/agentic-evidence-preview.yml: GitHub Actions workflow triggering on PR events. Uses fetch-depth: 0 for correct git diff, truncates diffs at 2000 lines, composes the full prompt with pre-gathered context, runs gemini-cli in non-interactive mode, and pipes the report to GITHUB_STEP_SUMMARY. Shadow mode: never blocks merges. GEMINI_API_KEY secret must be configured in repo settings.
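The workflow described in this commit truncates the diff at 2000 lines. A minimal sketch of that step, assuming the cap applies to raw `git diff` output and that appending a marker line is acceptable, so the model can tell the diff was cut rather than silently short. `truncate_lines` and `BASE_REF` are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pass through at most MAX lines of stdin, then append a
# marker line stating how many lines were dropped.
truncate_lines() {
  local max=$1
  awk -v max="$max" '
    NR <= max { print }
    END {
      # NR still holds the total line count inside END.
      if (NR > max) printf "[... truncated %d of %d lines ...]\n", NR - max, NR
    }'
}

# Usage in the workflow, assuming origin/$BASE_REF exists locally (e.g. after
# actions/checkout with fetch-depth: 0). The three-dot form diffs against the
# merge base, so only the PR's own changes are included:
#   git diff "origin/$BASE_REF...HEAD" | truncate_lines 2000 > pr.diff
```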

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56750304e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-31T23:55:02Z

+          FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
+          FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
+          FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
+          FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
+          FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"


Escape replacements when interpolating PR fields

The prompt composition uses Bash pattern substitution (`${FULL_PROMPT/.../${PR_TITLE}}` etc.) at lines 80–84, but in Bash with `patsub_replacement` enabled (the default in Bash 5.2), `&` in the replacement text expands to the matched pattern. A common PR title or body such as `A & B` therefore corrupts the composed prompt by re-inserting placeholder text, which can break claim extraction and produce incorrect evidence reports.
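A minimal sketch of a safer substitution, assuming the runner's shell is Bash 4.3 or later; `fill_placeholder` is a hypothetical helper, not code from the workflow. Keeping the token in a variable and quoting it makes the pattern match the literal `{..._PLACEHOLDER}` text, braces included, and quoting the replacement keeps `&` literal even when `patsub_replacement` is on. The `shopt -u` line is a second safeguard on Bash 5.2+ and is silently ignored on older versions that lack the option:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Bash 5.2+: stop an unquoted '&' in replacements from expanding to the match.
# Older Bash has no such option, so the error is suppressed there.
shopt -u patsub_replacement 2>/dev/null || true

# Hypothetical helper: fill_placeholder TEMPLATE TOKEN VALUE
# Replaces the first occurrence of the literal TOKEN in TEMPLATE with VALUE.
fill_placeholder() {
  local template=$1 token=$2 value=$3
  # "$token" is quoted, so glob characters in it are matched literally;
  # "$value" is quoted, so '&' in PR text is inserted as-is.
  printf '%s' "${template/"$token"/"$value"}"
}

# Usage (hypothetical wiring):
# FULL_PROMPT=$(fill_placeholder "$FULL_PROMPT" '{PR_TITLE_PLACEHOLDER}' "$PR_TITLE")
```

Only the first occurrence is replaced, which matches the original `${var/pattern/string}` form; use `//` instead of `/` if a token can appear more than once in the template. Note that command substitution strips trailing newlines from the result.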


chatgpt-codex-connector · 2026-03-31T23:55:02Z

---
-doc_type: plan
-subsystem: general


Restore removed canonical plan or migrate existing references

This commit deletes docs/plans/windows-multi-agent-recovery.md, but many repository anchors still point to that exact path (for example in README.md, docs/architecture.md, and docs/standards/agent-rules.md). Removing the file without updating those references leaves broken canonical guidance links for contributors and agents, which is a maintainability regression introduced by this change.


Copilot

Pull request overview

Adds a “shadow mode” CI reviewer that uses Gemini to assess whether PR descriptions’ material claims are backed by evidence in the diff (and to flag potential semantic drift), outputting an advisory report to the GitHub Actions Step Summary.

Changes:

Adds a new GitHub Actions workflow to gather PR context, run @google/gemini-cli, and publish a Step Summary report (non-blocking).
Introduces a hardened prompt template for the evidence assessor and accompanying docs/reviews describing the evidence standard.
Updates planning/docs around the assessor architecture and reviewer-driven evidence workflow (and removes the existing Windows recovery plan file).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| `docs/reviews/reviewer_driven_evidence.md` | Defines the reviewer-driven evidence workflow and evidence tiers. |
| `docs/reviews/portable_standard_critique.md` | Rationale/critique motivating the evidence standard approach. |
| `docs/prompts/review-and-implement-evidence-assessor.md` | Meta-prompt documenting how the implementation was reviewed/derived. |
| `docs/plans/windows-multi-agent-recovery.md` | Removes the prior Windows recovery plan (repo execution anchor). |
| `docs/plans/agentic-evidence-preview.md` | Specifies the assessor system architecture and intended workflow behavior. |
| `.github/workflows/agentic-evidence-preview.yml` | Implements the shadow-mode assessor workflow that runs Gemini CLI and writes Step Summary output. |
| `.github/prompts/evidence-assessor.md` | Prompt template the workflow fills with PR/diff/docs context. |

Copilot · 2026-03-31T23:56:20Z

+          FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
+          FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
+          FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
+          FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
+          FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"


The placeholder substitutions won’t match the template tokens because the pattern includes literal backslashes (e.g., \{PR_TITLE_PLACEHOLDER\}) while the prompt file contains {PR_TITLE_PLACEHOLDER}. As a result, the composed prompt will still contain placeholders and Gemini won’t receive the PR title/body/diff/docs context. Fix by matching the exact token strings (remove the backslashes) or switch to a more robust templating approach (e.g., a small Python script that reads the template and replaces tokens).

Suggested change

-          FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
-          FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
-          FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
-          FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
-          FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"
+          FULL_PROMPT="${FULL_PROMPT/{PR_TITLE_PLACEHOLDER}/${PR_TITLE}}"
+          FULL_PROMPT="${FULL_PROMPT/{PR_BODY_PLACEHOLDER}/${PR_BODY}}"
+          FULL_PROMPT="${FULL_PROMPT/{GIT_DIFF_PLACEHOLDER}/${GIT_DIFF}}"
+          FULL_PROMPT="${FULL_PROMPT/{DOCS_INDEX_PLACEHOLDER}/${DOCS_INDEX}}"
+          FULL_PROMPT="${FULL_PROMPT/{NORTHSTAR_EXCERPT_PLACEHOLDER}/${NORTHSTAR}}"
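Whichever substitution approach is used, a cheap guard is to stop before calling Gemini if any template token survives composition, so a silent mismatch like the one described in this comment cannot reach the model. A sketch, assuming every token follows the `{NAME_PLACEHOLDER}` convention used in the template; `check_no_placeholders` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if the composed prompt still contains any
# {NAME_PLACEHOLDER} token, and list the tokens that were left unfilled.
check_no_placeholders() {
  local leftover
  # Bracket expressions [{] and [}] match literal braces portably in ERE.
  leftover=$(printf '%s\n' "$1" | grep -oE -e '[{][[:upper:]_]+_PLACEHOLDER[}]' || true)
  if [ -n "$leftover" ]; then
    printf 'Unfilled placeholders:\n%s\n' "$leftover" >&2
    return 1
  fi
  return 0
}

# Usage after composition, before invoking Gemini:
# check_no_placeholders "$FULL_PROMPT" || exit 1
```

One caveat: a PR body that itself contains such a token would also trip the check.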

Copilot · 2026-03-31T23:56:20Z

+on:
+  pull_request:
+    types: [opened, synchronize, ready_for_review]
+    branches:
+      - main
+


This workflow is restricted to PRs targeting main via on.pull_request.branches. The spec/docs in this PR describe running on “every PR” and also call out supporting PRs targeting release branches (by using github.event.pull_request.base.ref). If you want the assessor to run for non-main base branches, remove the branches: [main] filter (or update the spec/docs to match the intended scope).

Copilot · 2026-03-31T23:56:20Z

+          # --model: gemini-2.5-flash balances quality and cost for this use case.
+          # Input is piped from the composed prompt file; stdout is the report.
+          npx --yes @google/gemini-cli \
+            --model gemini-2.0-flash \


The inline comment and the plan/spec say the workflow uses gemini-2.5-flash, but the actual CLI invocation uses --model gemini-2.0-flash. Please align the implementation and documentation (either update the workflow to the intended model, or update the docs/comment so reviewers know what is actually running).

Suggested change

-            --model gemini-2.0-flash \
+            --model gemini-2.5-flash \
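A mismatch like this can be caught before review with a small consistency check. A sketch, assuming the convention visible in the hunk above: the intended model is named in a `# --model: <name>` comment and the real one is passed as `--model <name>` on its own line. `check_model_consistency` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check: compare the models named in "# --model: <name>" comments
# with the models actually passed via "--model <name>" lines in a workflow.
check_model_consistency() {
  local file=$1 documented actual
  documented=$(grep -oE -e '# --model: [A-Za-z0-9.-]+' "$file" | awk '{print $3}' | sort -u)
  actual=$(grep -oE -e '^[[:space:]]*--model [A-Za-z0-9.-]+' "$file" | awk '{print $2}' | sort -u)
  # No documenting comment means there is nothing to compare against.
  [ -z "$documented" ] && return 0
  if [ "$documented" != "$actual" ]; then
    printf 'documented model(s): %s\nactual model(s): %s\n' "$documented" "$actual" >&2
    return 1
  fi
  return 0
}

# Usage:
# check_model_consistency .github/workflows/agentic-evidence-preview.yml
```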

Copilot · 2026-03-31T23:56:21Z

+│  Step 1: actions/checkout (fetch-depth: 0)                      │
+│  Step 2: Fetch base branch ref explicitly                       │
+│  Step 3: Generate git diff (base...HEAD), truncate at 2000 ln  │
+│  Step 4: Collect docs index (ls docs/ recursive, head 100ln)   │
+│  Step 5: Extract anchor docs (northstar.md, AGENTS.md)          │
+│  Step 6: Compose full prompt (instructions + all context)       │
+│  Step 7: npx @google/gemini-cli --model gemini-2.5-flash       │
+│  Step 8: Write report → $GITHUB_STEP_SUMMARY                   │
+└─────────────────────────────────────────────────────────────────┘
+```


The spec claims the runner extracts anchor docs including AGENTS.md (and the token budget table includes an AGENTS excerpt), but the workflow currently only embeds docs/northstar.md plus a docs file list. This reduces the assessor’s ability to apply the repo’s stated “ground truth” rules. Consider embedding AGENTS.md (and, if intended, docs/plans/current-status.md) excerpts into the composed prompt and adding corresponding placeholders to the prompt template.

Copilot · 2026-03-31T23:56:21Z

+## Overview
+
+This document specifies the implementation of a "Shadow Mode Agentic Evidence Assessor" for ColdVox. A Gemini-powered CI agent runs on every PR to audit whether the PR provides empirical evidence for its material claims and detects semantic drift between code and documentation.
+
+**Shadow Mode**: The assessor is non-blocking in Phase 1. It writes a Markdown report to the GitHub Step Summary. It does not comment on the PR, does not set a check status, and cannot block merges. It is advisory only.
+


The PR description says this change “adds 6 new files” and implies no existing content changes, but the diff also deletes docs/plans/windows-multi-agent-recovery.md (the repo’s current execution plan referenced by README/docs/architecture/agent instructions). If this deletion is intentional, you’ll need to update the many in-repo references and regenerate/commit docs/index.md (docs-ci enforces scripts/build_docs_index.py). If it’s not intentional, restore the plan file.

…nces

Root cleanup:
- Delete CLAUDE.md, GEMINI.md (byte-identical copies of AGENTS.md)
- Delete root junk: plugins.json, pr_365_details.json, test_enigo_live.rs
- Archive 6 root reports to docs/archive/root/

Dead backend code:
- Replace WHISPER_MODEL_PATH with STT_MODEL_PATH in types.rs and tests
- Fix integration tests: whisper -> moonshine as preferred plugin
- Update doc comments in plugin.rs, plugin_types.rs
- Delete crates/app/plugins.json (had preferred_plugin: whisper)
- Remove stale faster-whisper comment from Cargo.toml

Dead reference fixes:
- Replace ALL 20+ windows-multi-agent-recovery.md refs with current-status.md
- Remove 'absolute truth' language from agent rules
- Fix AGENTS.md pointer to nonexistent CI/policy.md -> CI/architecture.md
- Fix README.md: remove CLAUDE.md reference
- Update drive-project.prompt.md, gui-design-overview, todo.md

Doc pruning:
- Delete 15 empty/expired docs (stubs, chat transcripts, past-retention)
- Archive 8 stale docs (Linux-only, org-wide, superseded)
- Fix stt-overview.md: remove Whisper from Supported Backends
- Fix aud-user-config-design.md: Moonshine is PyO3 not pure Rust
- Fix fdn-testing-guide.md: add Parakeet validation warning

Agent instruction restructure:
- Sync AGENTS.md from .github/copilot-instructions.md (full content)
- Update ensure_agent_hardlinks.sh: source is now copilot-instructions.md
- Update check_markdown_placement.py: CLAUDE.md -> AGENTS.md
- Update standards.md: remove CLAUDE.md/GEMINI.md references

Dead vendor/scripts:
- Delete vendor/vosk/ (stubs to dead Linux runner cache)
- Delete scripts: setup-vosk-cache.sh, verify_vosk_model.sh, ensure_venv.sh, start-headless.sh

- SttRemoteAuthSettings: use #[derive(Default)] instead of manual impl (clippy::derivable_impls error in CI)
- deny.toml: add RUSTSEC ignores for unmaintained transitive deps from Tauri (gtk3-rs, fxhash, unic-*, proc-macro-error - all from wry/tauri GUI layer, no safe upgrade available, no security impact)
- docs/index.md: regenerate after doc cleanup changed file count/structure

Previous implementation failed because:
1. The 'Gather PR context' step failed with bash string substitution bugs
2. \{PLACEHOLDER\} patterns in bash expansion don't match {PLACEHOLDER} tokens
3. Large PRs caused the diff to be unavailable or truncated incorrectly

New approach:
- Use gemini-cli --approval-mode=yolo to give the agent autonomous tools
- Agent reads its instructions from the prompt file directly
- Agent runs git diff, reads files, and explores the repo itself
- No more brittle bash string replacement for prompt composition
- Combines two steps into one to avoid compose/run split failures
- Still uses gemini-2.0-flash (fixes model name/docs mismatch)
- Agent writes report to /tmp/report.md which is always checked

Addresses Copilot reviewer comments on bash substitution bugs and model name mismatch between workflow comments and actual --model flag.
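The new approach depends on the report file always being checked. A minimal sketch of that final step, assuming the report path is passed in (`/tmp/report.md` in this commit, `_ci_evidence_report.md` later) and that shadow mode must never fail the job; `publish_report` is a hypothetical helper. `GITHUB_STEP_SUMMARY` is the file path GitHub Actions exposes for Step Summary Markdown:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the assessor's report to the Step Summary, or a
# notice when the agent produced nothing. Always succeeds (shadow mode).
publish_report() {
  local report=$1
  # Outside GitHub Actions, fall back to stdout so the function is testable.
  local summary=${GITHUB_STEP_SUMMARY:-/dev/stdout}
  {
    echo "## Evidence Assessor (shadow mode)"
    echo
    if [ -s "$report" ]; then
      cat -- "$report"
    else
      echo "_No report produced: \`$report\` is missing or empty. Advisory only._"
    fi
  } >> "$summary"
  return 0
}

# Usage as the last workflow step:
#   publish_report _ci_evidence_report.md
```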

Delete .github/agents/ (project-driver, researcher, implementer, tester) and .github/prompts/drive-project.prompt.md: these were prompt-only specs with no automation hooks. They added complexity without value.

Add docs/visuals/ with two interactive HTML dashboards:
- agentic-workflow-dashboard.html: provenance, wiring, prompt anatomy
- ci-reviewer-dashboard.html: activation cadence, prompt, implications

…rthstar (complex only)
- Add complexity scorer (pure bash, no API) that counts Rust file changes:
  - complex: >10 crates/ files → triggers Northstar Reviewer
  - moderate: 1-10 crates/ OR any workflow change
  - simple: docs/config only
- Evidence Assessor now always runs using gemini-2.5-flash (was gemini-2.0-flash which is invalid)
- Northstar Reviewer added, runs only on complex PRs, uses gemini-2.5-pro
- Fix file access: instructions written into workspace dir (not /tmp/) so Gemini CLI can read them
- Add northstar-alignment-reviewer.md prompt
- Add _ci_*.md and _tmp_*.md patterns to .gitignore (agent working files)
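The scorer's thresholds can be sketched in plain Bash. This is an illustration, not the committed script: it assumes every path under `crates/` counts (the commit says it "counts Rust file changes" but does not say whether non-`.rs` files are excluded), that `.github/workflows/` is what "workflow change" means, and that anything else falls through to `simple`. `classify_complexity` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical scorer: read changed paths (one per line) on stdin and print
# simple | moderate | complex using the thresholds from the commit message.
classify_complexity() {
  local crates=0 workflows=0 path
  # "|| [ -n "$path" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    case $path in
      crates/*) crates=$((crates + 1)) ;;
      .github/workflows/*) workflows=$((workflows + 1)) ;;
    esac
  done
  if [ "$crates" -gt 10 ]; then
    echo complex
  elif [ "$crates" -ge 1 ] || [ "$workflows" -ge 1 ]; then
    echo moderate
  else
    echo simple
  fi
}

# Usage (BASE_REF is a hypothetical variable holding the PR's base branch):
#   git diff --name-only "origin/$BASE_REF...HEAD" | classify_complexity
```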

…ort paths
- Remove _ci_evidence_*.md and _ci_northstar_*.md from .gitignore. Gemini CLI uses .gitignore as a security boundary and refuses to read or write any gitignored file, so CI temp files must be ungitignored.
- Fix evidence-assessor.md: write to _ci_evidence_report.md (not /tmp/report.md)
- Fix northstar-alignment-reviewer.md: write to _ci_northstar_report.md

The _tmp_*.md wildcard pattern is retained for other temp files.
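Since a gitignored report path makes Gemini CLI refuse the file, the workflow could assert that the report paths are not ignored before invoking the agents. A sketch using `git check-ignore -q`, which exits 0 when a path is ignored, 1 when it is not, and 128 on a fatal error; `assert_not_ignored` is hypothetical, and the report names are the ones from this commit:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail if git would ignore any report path,
# because Gemini CLI refuses to read or write gitignored files.
assert_not_ignored() {
  local p rc status=0
  for p in "$@"; do
    rc=0
    # -q requires exactly one pathname per call, hence the loop.
    git check-ignore -q -- "$p" || rc=$?
    case $rc in
      0) echo "gitignored: $p" >&2; status=1 ;;
      1) ;;  # not ignored: the CLI should be able to use it
      *) echo "git check-ignore failed for $p (exit $rc)" >&2; status=2 ;;
    esac
  done
  return "$status"
}

# Usage before invoking the agents:
# assert_not_ignored _ci_evidence_report.md _ci_northstar_report.md
```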

Switch models to validated stable versions:
- Evidence Assessor: gemini-3-flash-preview -> gemini-2.5-flash
- Northstar Reviewer: gemini-3.1-pro-preview-customtools -> gemini-2.5-pro

Add settings.json auth step before each gemini-cli invocation to prevent OAuth browser prompt in headless CI runners. GEMINI_API_KEY env var alone is not sufficient to skip interactive auth selection. Update comments and Step Summary labels to match actual models.

…i CLI" This reverts commit be490c4.

Coldaine · 2026-04-16T10:43:14Z

Closing: all changes from this PR are contained in #397, which will land the cleanup + evidence assessor together with the Always-On Push-to-Transcribe feature.

Closes #384 (contained) Closes #397

- Adds ActivationMode::AlwaysOnPushToTranscribe with ~2s rolling audio buffer
- Prevents clipping at the start of transcription caused by the hotkey press
- Includes agentic evidence assessor CI infrastructure
- Comprehensive repo cleanup: dead docs, dead code, dead references

Closes #384 (contained)

Coldaine added 2 commits March 31, 2026 14:49

docs: remove windows multi-agent recovery plan

94f94aa

Copilot AI review requested due to automatic review settings March 31, 2026 23:51

Copilot started reviewing on behalf of Coldaine March 31, 2026 23:52

chatgpt-codex-connector bot reviewed Mar 31, 2026


Copilot AI reviewed Mar 31, 2026


Coldaine changed the title ~~feat(ci): Agentic Evidence Assessor — shadow mode PR reviewer (Phase 1)~~ chore: comprehensive repo cleanup + evidence assessor Apr 1, 2026

qodo-free-for-open-source-projects bot deleted a comment from qodo-code-review bot Apr 1, 2026

Coldaine added 2 commits April 1, 2026 12:07

qodo-free-for-open-source-projects bot deleted a comment from qodo-code-review bot Apr 1, 2026

qodo-free-for-open-source-projects bot deleted a comment from qodo-code-review bot Apr 4, 2026

Coldaine added 5 commits April 4, 2026 05:23

Revert "fix(ci): use stable models, add headless auth setup for Gemin…

c16e3f1

…i CLI" This reverts commit be490c4.

fix(ci): restore validated Gemini preview models

8a7d4ad

Coldaine closed this Apr 16, 2026

Coldaine pushed a commit that referenced this pull request Apr 16, 2026

feat(stt): Always-On Push To Transcribe Mode

ae91fc8

Closes #384 (contained) Closes #397

Coldaine mentioned this pull request Apr 16, 2026

Merge tauri-base into main #401

Merged

Coldaine wants to merge 11 commits into main from chore/docs-and-code-cleanup
