
chore: comprehensive repo cleanup + evidence assessor #384

Closed
Coldaine wants to merge 11 commits into main from chore/docs-and-code-cleanup

@Coldaine (Owner) commented Mar 31, 2026

Summary

Two sets of changes on this branch:

1. Agentic Evidence Assessor (shadow mode PR reviewer)

  • `.github/workflows/agentic-evidence-preview.yml`: shadow reviewer workflow
  • `.github/prompts/evidence-assessor.md`: Chain-of-Thought prompt
  • `docs/plans/agentic-evidence-preview.md`: system spec

2. Comprehensive Repo Cleanup (70 files, -2,946 lines)

Systematic cleanup of dead code, dead documentation, dead references, and stale agent instructions. Three independent subagent audits confirmed every deletion.

Root Cleanup

  • Deleted `CLAUDE.md` and `GEMINI.md` (byte-identical copies of `AGENTS.md`)
  • Deleted root junk: `plugins.json`, `pr_365_details.json`, `test_enigo_live.rs`
  • Archived 6 root reports to `docs/archive/root/`

Dead Backend Code (Rust)

  • Replaced the `WHISPER_MODEL_PATH` env var with `STT_MODEL_PATH` in types.rs and tests
  • Fixed integration tests: `whisper` -> `moonshine` as preferred plugin
  • Updated doc comments in plugin.rs and plugin_types.rs
  • Deleted `crates/app/plugins.json` (had `preferred_plugin: whisper`)

Dead Reference Fixes (20+ occurrences)

  • Replaced ALL `windows-multi-agent-recovery.md` references with `current-status.md`
  • Removed 'absolute truth' language from agent rules
  • Fixed `AGENTS.md` pointer to nonexistent `CI/policy.md` -> `CI/architecture.md`

Doc Pruning

  • Deleted 15 empty/expired docs (stubs, raw transcripts, past-retention)
  • Archived 8 stale docs (Linux-only PipeWire, org-wide observability, superseded)
  • Fixed contradictions: removed Whisper from stt-overview.md, corrected Moonshine description

Agent Instruction Restructure

  • `.github/copilot-instructions.md` is now the canonical source
  • `AGENTS.md` and `.kilocode/rules/agents.md` sync from it
  • Updated `ensure_agent_hardlinks.sh` and `check_markdown_placement.py`
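Because the two copies are now derived from one canonical file, drift between them can be checked mechanically. A minimal Python sketch, assuming the copies are meant to be byte-identical to `.github/copilot-instructions.md` (the paths come from the bullets above; `find_out_of_sync` is a hypothetical helper, not the repo's `ensure_agent_hardlinks.sh`):

```python
import filecmp
from pathlib import Path

# Canonical source and its synced copies, as listed in the PR description.
CANONICAL = Path(".github/copilot-instructions.md")
COPIES = [Path("AGENTS.md"), Path(".kilocode/rules/agents.md")]


def find_out_of_sync(root: Path, canonical: Path = CANONICAL, copies: list = COPIES) -> list:
    """Return the copies (as POSIX-style relative paths) that are missing or differ."""
    source = root / canonical
    stale = []
    for copy in copies:
        target = root / copy
        # shallow=False compares file contents instead of trusting os.stat() metadata.
        if not target.exists() or not filecmp.cmp(source, target, shallow=False):
            stale.append(copy.as_posix())
    return stale
```

A CI step could fail when the returned list is non-empty. Comparing contents rather than checking for hardlinks keeps the check valid on filesystems or checkouts where the links are not preserved.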

Dead Vendor/Scripts

  • Deleted `vendor/vosk/` stubs
  • Deleted dead scripts: setup-vosk-cache.sh, verify_vosk_model.sh, ensure_venv.sh, start-headless.sh

Evidence

  • `cargo check --workspace --all-targets` passes
  • Zero remaining references to `windows-multi-agent-recovery.md` in active files
  • Zero remaining 'absolute truth' language
  • All Whisper/dead backend references removed from active code and docs
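A short scan can reproduce "zero remaining" claims like the ones above. A rough Python sketch, assuming "active files" means common source and doc types outside any `archive/` directory. The PR does not define the scope, so `ACTIVE_SUFFIXES`, `EXCLUDED_DIRS`, and `find_stale_references` are all hypothetical:

```python
from pathlib import Path

# Strings the PR claims no longer appear in active files.
STALE_PATTERNS = ["windows-multi-agent-recovery.md", "absolute truth"]
# Assumption: "active" = these file types, skipping archived docs, build output, and VCS data.
ACTIVE_SUFFIXES = {".md", ".rs", ".toml", ".sh", ".py", ".yml"}
EXCLUDED_DIRS = {".git", "target", "archive"}


def find_stale_references(root: Path, patterns: list = STALE_PATTERNS) -> list:
    """Return (relative path, 1-based line number, pattern) for every remaining hit."""
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or path.suffix not in ACTIVE_SUFFIXES:
            continue
        if EXCLUDED_DIRS.intersection(rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in patterns:
                if pattern in line:
                    hits.append((rel.as_posix(), lineno, pattern))
    return hits
```

An empty result is the evidence. Any hit gives a file and line a reviewer can open directly.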

Implements the Portable Agentic Evidence Standard for ColdVox:

- docs/reviews/portable_standard_critique.md: Philosophy document
  explaining why tautological unit tests are insufficient and the
  case for empirical evidence-based PR review.

- docs/reviews/reviewer_driven_evidence.md: Workflow strategy
  describing how the reviewer-driven evidence process works,
  evidence tiers (1-5), and semantic drift detection patterns.

- docs/plans/agentic-evidence-preview.md: System architecture spec
  for the shadow mode assessor: permissions, git diff strategy,
  token budget, failure modes, and Phase 2 considerations.

- .github/prompts/evidence-assessor.md: The hardened CoT prompt
  that Gemini executes in CI. Includes explicit anti-hallucination
  constraints, structured output format, and ColdVox-specific
  ground truths (Moonshine fragile, Parakeet not ready, stubs dead).

- .github/workflows/agentic-evidence-preview.yml: GitHub Actions
  workflow triggering on PR events. Uses fetch-depth: 0 for correct
  git diff, truncates diffs at 2000 lines, composes the full prompt
  with pre-gathered context, runs gemini-cli in non-interactive mode,
  and pipes the report to GITHUB_STEP_SUMMARY. Shadow mode: never
  blocks merges.
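The truncation step described above is simple to sketch. This version is in Python rather than the workflow's shell. It assumes the 2000-line cap applies to the text of `git diff base...HEAD` and that a trailing marker should tell the model the diff was cut. `truncate_diff` and the marker wording are hypothetical:

```python
MAX_DIFF_LINES = 2000  # cap stated in the PR description


def truncate_diff(diff: str, max_lines: int = MAX_DIFF_LINES) -> str:
    """Keep the first max_lines lines of a diff and flag any omission explicitly."""
    lines = diff.splitlines()
    if len(lines) <= max_lines:
        return diff
    omitted = len(lines) - max_lines
    kept = "\n".join(lines[:max_lines])
    # An explicit marker keeps the model from treating a partial diff as the whole change.
    return f"{kept}\n[diff truncated: {omitted} more lines not shown]\n"
```

The marker matters for an evidence assessor. Without it, a claim backed by code past the cut would look unsupported instead of unverifiable.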

GEMINI_API_KEY secret must be configured in repo settings.
Copilot AI review requested due to automatic review settings March 31, 2026 23:51

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56750304e4


Comment on lines +80 to +84
FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"
P1: Escape replacements when interpolating PR fields

The prompt composition uses Bash pattern substitution (${FULL_PROMPT/.../${PR_TITLE}} etc.) at lines 80–84, but in Bash with patsub_replacement enabled (default on Bash 5.2), & in replacement text expands to the matched pattern. A common PR title/body like A & B therefore corrupts the composed prompt by re-inserting placeholder text, which can break claim extraction and produce incorrect evidence reports.


Comment on lines -1 to -3
---
doc_type: plan
subsystem: general

P2: Restore removed canonical plan or migrate existing references

This commit deletes docs/plans/windows-multi-agent-recovery.md, but many repository anchors still point to that exact path (for example in README.md, docs/architecture.md, and docs/standards/agent-rules.md). Removing the file without updating those references leaves broken canonical guidance links for contributors and agents, which is a maintainability regression introduced by this change.



Copilot AI left a comment


Pull request overview

Adds a “shadow mode” CI reviewer that uses Gemini to assess whether PR descriptions’ material claims are backed by evidence in the diff (and to flag potential semantic drift), outputting an advisory report to the GitHub Actions Step Summary.

Changes:

  • Adds a new GitHub Actions workflow to gather PR context, run @google/gemini-cli, and publish a Step Summary report (non-blocking).
  • Introduces a hardened prompt template for the evidence assessor and accompanying docs/reviews describing the evidence standard.
  • Updates planning/docs around the assessor architecture and reviewer-driven evidence workflow (and removes the existing Windows recovery plan file).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

File | Description
--- | ---
`docs/reviews/reviewer_driven_evidence.md` | Defines the reviewer-driven evidence workflow and evidence tiers.
`docs/reviews/portable_standard_critique.md` | Rationale/critique motivating the evidence standard approach.
`docs/prompts/review-and-implement-evidence-assessor.md` | Meta-prompt documenting how the implementation was reviewed/derived.
`docs/plans/windows-multi-agent-recovery.md` | Removes the prior Windows recovery plan (repo execution anchor).
`docs/plans/agentic-evidence-preview.md` | Specifies the assessor system architecture and intended workflow behavior.
`.github/workflows/agentic-evidence-preview.yml` | Implements the shadow-mode assessor workflow that runs Gemini CLI and writes Step Summary output.
`.github/prompts/evidence-assessor.md` | Prompt template the workflow fills with PR/diff/docs context.

Comment on lines +80 to +84
FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"

Copilot AI Mar 31, 2026


The placeholder substitutions won’t match the template tokens because the pattern includes literal backslashes (e.g., \{PR_TITLE_PLACEHOLDER\}) while the prompt file contains {PR_TITLE_PLACEHOLDER}. As a result, the composed prompt will still contain placeholders and Gemini won’t receive the PR title/body/diff/docs context. Fix by matching the exact token strings (remove the backslashes) or switch to a more robust templating approach (e.g., a small Python script that reads the template and replaces tokens).

Suggested change
- FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
- FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
- FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
- FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
- FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"
+ FULL_PROMPT="${FULL_PROMPT/{PR_TITLE_PLACEHOLDER}/${PR_TITLE}}"
+ FULL_PROMPT="${FULL_PROMPT/{PR_BODY_PLACEHOLDER}/${PR_BODY}}"
+ FULL_PROMPT="${FULL_PROMPT/{GIT_DIFF_PLACEHOLDER}/${GIT_DIFF}}"
+ FULL_PROMPT="${FULL_PROMPT/{DOCS_INDEX_PLACEHOLDER}/${DOCS_INDEX}}"
+ FULL_PROMPT="${FULL_PROMPT/{NORTHSTAR_EXCERPT_PLACEHOLDER}/${NORTHSTAR}}"

Copilot uses AI. Check for mistakes.
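Copilot's alternative, a small Python script, avoids both substitution problems raised in review. Tokens such as `{PR_TITLE_PLACEHOLDER}` are matched literally, and `re.sub` with a function replacement inserts the returned text verbatim. As a result, `&` or backslashes in a title like `A & B` are never reinterpreted. The single pass also means PR text that happens to contain a placeholder is not substituted again. A minimal sketch: the token names come from the workflow, while `render_prompt` and passing values as a dict are assumptions:

```python
import re

# Placeholder tokens used by the evidence-assessor prompt template, per the workflow.
TOKENS = {
    "PR_TITLE": "{PR_TITLE_PLACEHOLDER}",
    "PR_BODY": "{PR_BODY_PLACEHOLDER}",
    "GIT_DIFF": "{GIT_DIFF_PLACEHOLDER}",
    "DOCS_INDEX": "{DOCS_INDEX_PLACEHOLDER}",
    "NORTHSTAR": "{NORTHSTAR_EXCERPT_PLACEHOLDER}",
}
_KEY_FOR_TOKEN = {token: key for key, token in TOKENS.items()}
_PATTERN = re.compile("|".join(re.escape(token) for token in TOKENS.values()))


def render_prompt(template: str, values: dict) -> str:
    """Fill every placeholder in one pass, inserting values verbatim."""
    missing = [token for token in TOKENS.values() if token not in template]
    if missing:
        # Fail loudly instead of silently sending an unfilled prompt to the model.
        raise ValueError(f"template is missing placeholders: {missing}")
    # A function replacement's return value is used as-is (no & or \1 handling).
    return _PATTERN.sub(lambda m: values.get(_KEY_FOR_TOKEN[m.group(0)], ""), template)
```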
Comment on lines +4 to +9
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
    branches:
      - main


Copilot AI Mar 31, 2026


This workflow is restricted to PRs targeting main via on.pull_request.branches. The spec/docs in this PR describe running on “every PR” and also call out supporting PRs targeting release branches (by using github.event.pull_request.base.ref). If you want the assessor to run for non-main base branches, remove the branches: [main] filter (or update the spec/docs to match the intended scope).

# --model: gemini-2.5-flash balances quality and cost for this use case.
# Input is piped from the composed prompt file; stdout is the report.
npx --yes @google/gemini-cli \
--model gemini-2.0-flash \

Copilot AI Mar 31, 2026


The inline comment and the plan/spec say the workflow uses gemini-2.5-flash, but the actual CLI invocation uses --model gemini-2.0-flash. Please align the implementation and documentation (either update the workflow to the intended model, or update the docs/comment so reviewers know what is actually running).

Suggested change
- --model gemini-2.0-flash \
+ --model gemini-2.5-flash \
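A mismatch like this can be caught before review. The following hedged Python sketch reports every `gemini-<version>` name mentioned in a workflow file, comments included, that is never actually passed to `--model`. The regexes assume the formatting shown in the hunk above, and `model_mismatches` is a hypothetical helper:

```python
import re

# "--model gemini-x" or "--model=gemini-x"; a trailing \w drops sentence punctuation.
MODEL_FLAG = re.compile(r"--model[ =]+(gemini-[\w.-]*\w)")
# Any versioned model name (requires a digit, so "gemini-cli" is ignored).
MODEL_MENTION = re.compile(r"gemini-\d[\w.-]*\w")


def model_mismatches(workflow_text: str) -> set:
    """Return model names mentioned in the file that are never passed via --model."""
    used = set(MODEL_FLAG.findall(workflow_text))
    mentioned = set(MODEL_MENTION.findall(workflow_text))
    return mentioned - used
```

On the hunk above, this would flag `gemini-2.5-flash`, because the comment's `--model:` (with a colon) is not counted as a flag.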

Comment on lines +24 to +33
│ Step 1: actions/checkout (fetch-depth: 0) │
│ Step 2: Fetch base branch ref explicitly │
│ Step 3: Generate git diff (base...HEAD), truncate at 2000 ln │
│ Step 4: Collect docs index (ls docs/ recursive, head 100ln) │
│ Step 5: Extract anchor docs (northstar.md, AGENTS.md) │
│ Step 6: Compose full prompt (instructions + all context) │
│ Step 7: npx @google/gemini-cli --model gemini-2.5-flash │
│ Step 8: Write report → $GITHUB_STEP_SUMMARY │
└─────────────────────────────────────────────────────────────────┘
```

Copilot AI Mar 31, 2026


The spec claims the runner extracts anchor docs including AGENTS.md (and the token budget table includes an AGENTS excerpt), but the workflow currently only embeds docs/northstar.md plus a docs file list. This reduces the assessor’s ability to apply the repo’s stated “ground truth” rules. Consider embedding AGENTS.md (and, if intended, docs/plans/current-status.md) excerpts into the composed prompt and adding corresponding placeholders to the prompt template.
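One way to act on this suggestion is to gather the docs index and the anchor-doc excerpts in one place, so that every document the spec names actually reaches the prompt. The Python sketch below does this. The 100-entry cap mirrors the spec's "head 100". The per-document line cap, the `AGENTS_EXCERPT` key (the current template has no such placeholder), and `gather_context` itself are assumptions:

```python
from pathlib import Path

ANCHOR_DOCS = {
    "NORTHSTAR": Path("docs/northstar.md"),
    "AGENTS_EXCERPT": Path("AGENTS.md"),  # hypothetical new placeholder key
}
MAX_INDEX_ENTRIES = 100  # spec: "ls docs/ recursive, head 100"
MAX_EXCERPT_LINES = 200  # assumption; the spec's token budget would set the real cap


def excerpt(path: Path, max_lines: int = MAX_EXCERPT_LINES) -> str:
    """First max_lines lines of a file, or a visible note if it does not exist."""
    if not path.is_file():
        return f"[{path.as_posix()} not found]"
    lines = path.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[:max_lines])


def gather_context(root: Path) -> dict:
    """Return the docs index plus one excerpt per anchor document."""
    docs = sorted(p.relative_to(root).as_posix() for p in (root / "docs").rglob("*") if p.is_file())
    context = {"DOCS_INDEX": "\n".join(docs[:MAX_INDEX_ENTRIES])}
    for key, rel in ANCHOR_DOCS.items():
        context[key] = excerpt(root / rel)
    return context
```

Emitting a visible "not found" note, instead of an empty string, lets the assessor tell a missing anchor apart from an anchor with no relevant rules.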

Comment on lines +10 to +15
## Overview

This document specifies the implementation of a "Shadow Mode Agentic Evidence Assessor" for ColdVox. A Gemini-powered CI agent runs on every PR to audit whether the PR provides empirical evidence for its material claims and detects semantic drift between code and documentation.

**Shadow Mode**: The assessor is non-blocking in Phase 1. It writes a Markdown report to the GitHub Step Summary. It does not comment on the PR, does not set a check status, and cannot block merges. It is advisory only.


Copilot AI Mar 31, 2026


The PR description says this change “adds 6 new files” and implies no existing content changes, but the diff also deletes docs/plans/windows-multi-agent-recovery.md (the repo’s current execution plan referenced by README/docs/architecture/agent instructions). If this deletion is intentional, you’ll need to update the many in-repo references and regenerate/commit docs/index.md (docs-ci enforces scripts/build_docs_index.py). If it’s not intentional, restore the plan file.

…nces

Root cleanup:
- Delete CLAUDE.md, GEMINI.md (byte-identical copies of AGENTS.md)
- Delete root junk: plugins.json, pr_365_details.json, test_enigo_live.rs
- Archive 6 root reports to docs/archive/root/

Dead backend code:
- Replace WHISPER_MODEL_PATH with STT_MODEL_PATH in types.rs and tests
- Fix integration tests: whisper -> moonshine as preferred plugin
- Update doc comments in plugin.rs, plugin_types.rs
- Delete crates/app/plugins.json (had preferred_plugin: whisper)
- Remove stale faster-whisper comment from Cargo.toml

Dead reference fixes:
- Replace ALL 20+ windows-multi-agent-recovery.md refs with current-status.md
- Remove 'absolute truth' language from agent rules
- Fix AGENTS.md pointer to nonexistent CI/policy.md -> CI/architecture.md
- Fix README.md: remove CLAUDE.md reference
- Update drive-project.prompt.md, gui-design-overview, todo.md

Doc pruning:
- Delete 15 empty/expired docs (stubs, chat transcripts, past-retention)
- Archive 8 stale docs (Linux-only, org-wide, superseded)
- Fix stt-overview.md: remove Whisper from Supported Backends
- Fix aud-user-config-design.md: Moonshine is PyO3 not pure Rust
- Fix fdn-testing-guide.md: add Parakeet validation warning

Agent instruction restructure:
- Sync AGENTS.md from .github/copilot-instructions.md (full content)
- Update ensure_agent_hardlinks.sh: source is now copilot-instructions.md
- Update check_markdown_placement.py: CLAUDE.md -> AGENTS.md
- Update standards.md: remove CLAUDE.md/GEMINI.md references

Dead vendor/scripts:
- Delete vendor/vosk/ (stubs to dead Linux runner cache)
- Delete scripts: setup-vosk-cache.sh, verify_vosk_model.sh, ensure_venv.sh, start-headless.sh
@Coldaine changed the title from "feat(ci): Agentic Evidence Assessor — shadow mode PR reviewer (Phase 1)" to "chore: comprehensive repo cleanup + evidence assessor" on Apr 1, 2026
Coldaine added 2 commits April 1, 2026 12:07
- SttRemoteAuthSettings: use #[derive(Default)] instead of manual impl
  (clippy::derivable_impls error in CI)
- deny.toml: add RUSTSEC ignores for unmaintained transitive deps from Tauri
  (gtk3-rs, fxhash, unic-*, proc-macro-error - all from wry/tauri GUI layer,
  no safe upgrade available, no security impact)
- docs/index.md: regenerate after doc cleanup changed file count/structure
Previous implementation failed because:
1. The 'Gather PR context' step failed with bash string substitution bugs
2. \{PLACEHOLDER\} patterns in bash expansion don't match {PLACEHOLDER} tokens
3. Large PRs caused the diff to be unavailable or truncated incorrectly

New approach:
- Use gemini-cli --approval-mode=yolo to give the agent autonomous tools
- Agent reads its instructions from the prompt file directly
- Agent runs git diff, reads files, and explores the repo itself
- No more brittle bash string replacement for prompt composition
- Combines two steps into one to avoid compose/run split failures
- Still uses gemini-2.0-flash (fixes model name/docs mismatch)
- Agent writes report to /tmp/report.md which is always checked

Addresses Copilot reviewer comments on bash substitution bugs and
model name mismatch between workflow comments and actual --model flag.
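The "always checked" report step could look like the following Python sketch. `GITHUB_STEP_SUMMARY` is the real GitHub Actions variable that names the job-summary file. The report path argument, the fallback wording, and `publish_report` are assumptions. The function never raises on a missing report, which keeps the shadow-mode promise of never blocking a merge:

```python
import os
from pathlib import Path
from typing import Optional


def publish_report(report_path: Path, summary_path: Optional[str] = None) -> str:
    """Append the agent's report (or a fallback note) to the Step Summary; return the text."""
    if report_path.is_file() and report_path.read_text(encoding="utf-8").strip():
        body = report_path.read_text(encoding="utf-8")
    else:
        # A missing or empty report is surfaced in the summary, not treated as a failure.
        body = f"Evidence Assessor produced no report at `{report_path.as_posix()}`.\n"
    target = summary_path or os.environ.get("GITHUB_STEP_SUMMARY")
    if target:
        # The summary file is appended to, so earlier steps' output is preserved.
        with open(target, "a", encoding="utf-8") as summary:
            summary.write(body)
    return body
```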
Delete .github/agents/ (project-driver, researcher, implementer, tester)
and .github/prompts/drive-project.prompt.md — these were prompt-only
specs with no automation hooks. They added complexity without value.

Add docs/visuals/ with two interactive HTML dashboards:
- agentic-workflow-dashboard.html: provenance, wiring, prompt anatomy
- ci-reviewer-dashboard.html: activation cadence, prompt, implications
Coldaine added 5 commits April 4, 2026 05:23
…rthstar (complex only)

- Add complexity scorer (pure bash, no API) that counts Rust file changes
  - complex: >10 crates/ files → triggers Northstar Reviewer
  - moderate: 1-10 crates/ OR any workflow change
  - simple: docs/config only
- Evidence Assessor now always runs using gemini-2.5-flash (was gemini-2.0-flash which is invalid)
- Northstar Reviewer added, runs only on complex PRs, uses gemini-2.5-pro
- Fix file access: instructions written into workspace dir (not /tmp/) so Gemini CLI can read them
- Add northstar-alignment-reviewer.md prompt
- Add _ci_*.md and _tmp_*.md patterns to .gitignore (agent working files)
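The tiering rules above translate directly into a small classifier. The PR's scorer is pure bash; this Python version only sketches the same thresholds. It assumes the input is the list of changed paths (for example, from `git diff --name-only`) and that any path outside `crates/` and `.github/workflows/` counts toward "simple". `classify_pr` is a hypothetical name:

```python
def classify_pr(changed_paths: list) -> str:
    """Return 'complex', 'moderate', or 'simple' using the thresholds in the commit message."""
    crate_files = sum(1 for p in changed_paths if p.startswith("crates/"))
    touches_workflows = any(p.startswith(".github/workflows/") for p in changed_paths)
    if crate_files > 10:
        return "complex"  # also triggers the Northstar Reviewer
    if crate_files >= 1 or touches_workflows:
        return "moderate"
    return "simple"  # docs/config only
```

Under this scheme the Evidence Assessor runs for every tier, and only a `"complex"` result adds the Northstar Reviewer.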
…ort paths

- Remove _ci_evidence_*.md and _ci_northstar_*.md from .gitignore
  Gemini CLI uses .gitignore as a security boundary and refuses to
  read or write any gitignored file. CI temp files must be ungitignored.
- Fix evidence-assessor.md: write to _ci_evidence_report.md (not /tmp/report.md)
- Fix northstar-alignment-reviewer.md: write to _ci_northstar_report.md

The _tmp_*.md wildcard pattern is retained for other temp files.
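Per the commit message above, Gemini CLI refuses to touch gitignored files, so the CI report names must never match a `.gitignore` pattern. `git check-ignore` is the authoritative test. The Python sketch below is only a quick approximation. It handles plain glob lines and deliberately ignores `!` negation, `/` anchoring, and `**`. `ignored_ci_files` is hypothetical:

```python
from fnmatch import fnmatchcase

# Report files written by the two reviewers, per the commit message.
CI_FILES = ["_ci_evidence_report.md", "_ci_northstar_report.md"]


def ignored_ci_files(gitignore_text: str, ci_files: list = CI_FILES) -> list:
    """Return CI working files that a plain glob match says .gitignore would hide."""
    patterns = [
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.strip().startswith(("#", "!"))
    ]
    # fnmatchcase is case-sensitive, matching git's default behavior.
    return [name for name in ci_files if any(fnmatchcase(name, pat) for pat in patterns)]
```

Run against the earlier `_ci_*.md` rule, this would have flagged both report files. The retained `_tmp_*.md` rule does not affect them.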
Switch models to validated stable versions:
- Evidence Assessor: gemini-3-flash-preview -> gemini-2.5-flash
- Northstar Reviewer: gemini-3.1-pro-preview-customtools -> gemini-2.5-pro

Add settings.json auth step before each gemini-cli invocation to
prevent OAuth browser prompt in headless CI runners. GEMINI_API_KEY
env var alone is not sufficient to skip interactive auth selection.

Update comments and Step Summary labels to match actual models.
@Coldaine (Owner, Author)

Closing: all changes from this PR are contained in #397, which will land the cleanup + evidence assessor together with the Always-On Push-to-Transcribe feature.

@Coldaine Coldaine closed this Apr 16, 2026
Coldaine pushed a commit that referenced this pull request Apr 16, 2026
Coldaine pushed a commit that referenced this pull request Apr 16, 2026
- Adds ActivationMode::AlwaysOnPushToTranscribe with ~2s rolling audio buffer
- Prevents hotkey start mechanical clipping of transcription
- Includes agentic evidence assessor CI infrastructure
- Comprehensive repo cleanup: dead docs, dead code, dead references
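The ~2 s rolling buffer is what allows audio spoken just before the hotkey registers to reach the transcriber. A minimal Python sketch of the idea, assuming 16 kHz mono integer samples. `PreRollBuffer` and its methods are hypothetical and are not ColdVox's Rust `ActivationMode::AlwaysOnPushToTranscribe` implementation:

```python
from collections import deque


class PreRollBuffer:
    """Keep the most recent `seconds` of audio while idle; hand it over on hotkey press."""

    def __init__(self, sample_rate: int = 16_000, seconds: float = 2.0) -> None:
        # deque(maxlen=...) silently drops the oldest samples once full.
        self._samples = deque(maxlen=int(sample_rate * seconds))

    def push(self, frame: list) -> None:
        """Called for every captured frame, whether or not the hotkey is held."""
        self._samples.extend(frame)

    def drain(self) -> list:
        """On hotkey press: return the buffered pre-roll (oldest first) and reset."""
        pre_roll = list(self._samples)
        self._samples.clear()
        return pre_roll
```

On a hotkey press, the drained samples would be prepended to the live capture, so the first syllable is not clipped by the key-press latency.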

Closes #384 (contained)

Labels

None yet

Projects

None yet


2 participants