Project Trueflow: Engineering Plan

1. Overview

**Trueflow** (historically called Anchor / Vibecheck) is a semantic code review system. It decouples review status from git history, relying instead on **Content-Addressed Identity**.

Core Philosophy:

  1. **Identity > History:** We review Content, not Commits.
  2. **Git-Independent:** The system works on any directory of files. Git is just a transport layer.
  3. **Merkle-Based:** Files are trees of “Blocks”. Reviews can attach to any node in the tree (Line, Block, File).

2. Architecture: The Trueflow Model

2.1 The Hierarchy

The system sees a repository as a forest of Merkle Trees.

  1. **File (Root):** Represents the state of a file on disk.
  2. **Blocks (Nodes):** Semantic sections. In Code, these might be Functions or Structs. In Text, Paragraphs.
    • Composition: A File is composed of Blocks. Order matters.
    • Review: If you reorder Blocks, the File hash changes, but each Block hash is unchanged, so the Blocks remain “Approved”.
  3. **Units (Leaves):** The atomic unit. Usually a **Line** of text.
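
The reorder property above can be illustrated with a minimal sketch. It assumes a File hash is derived from the ordered sequence of its Block hashes, and it uses the standard library's `DefaultHasher` as a dependency-free stand-in for SHA256. The function names `block_hash` and `file_hash` are hypothetical, not part of the existing codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for SHA256 (assumption) so the sketch needs no external crates.
fn block_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

// Hypothetical File hash: a hash over the ordered sequence of Block hashes.
fn file_hash(blocks: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for block in blocks {
        block_hash(block).hash(&mut hasher);
    }
    hasher.finish()
}
```

Swapping two Blocks produces a different `file_hash`, while each `block_hash` stays the same, so approvals recorded against those Block hashes still apply.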

2.2 Slicing & Hashing

How do we go from Bytes -> Tree?

  • **Strategy A (MVP - Line Merkle):**
    • Leaves = Normalized Lines (whitespace trimmed).
    • Nodes = Balanced Tree of hashes (SHA256).
  • **Strategy B (Smart - AST):**
    • Use Tree-sitter to identify “Block” boundaries (Start/End line).
    • These ranges become the Nodes.
    • Advantage: More semantic. Renaming a variable inside a function only invalidates that function’s Block.
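
As a rough illustration of Strategy A, the sketch below builds a Merkle root over trimmed lines by pairing adjacent hashes level by level. It is only a sketch under assumptions: `DefaultHasher` stands in for SHA256, an unpaired node is promoted unchanged (one of several possible conventions), and the names `h` and `merkle_root` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for SHA256 (assumption): keeps the sketch free of external crates.
fn h<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Leaves are trimmed lines; each level pairs adjacent hashes until one root remains.
// Returns None for empty input.
fn merkle_root(text: &str) -> Option<u64> {
    let mut level: Vec<u64> = text.lines().map(|line| h(&line.trim())).collect();
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => h(&(*left, *right)),
                [odd] => *odd, // an unpaired node is promoted unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    Some(level[0])
}
```

Because leaves are trimmed before hashing, re-indenting a file leaves the root unchanged, while editing or reordering lines changes it.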

2.3 The Review Graph (Data Store)

We store reviews as edges in a graph pointing to Content Hashes.

```rust
struct Review {
    // What is being reviewed?
    target_hash: String, // SHA256 of the Block/Unit/File

    // Who reviewed it?
    identity: Identity, // Email + Signature

    // What did they say?
    verdict: Verdict, // Approved, Rejected, etc.
}
```

3. Workflows

3.1 “Cold Start” (Audit Mode)

  • User runs `trueflow scan`.
  • System computes Merkle Trees for all files in the directory.
  • System queries DB: “Which nodes in this tree are NOT in the Approved Set?”
  • Result: A list of unreviewed Blocks (e.g., “Function X is unreviewed”).
  • User reviews them -> Adds entries to DB.
  • Stats: “Coverage = 85% of tree nodes approved.”
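
The audit query and the coverage figure could look roughly like the following. This is a sketch, not the actual `trueflow scan` implementation: `audit`, `tree_nodes`, and `approved` are hypothetical names, hashes are plain strings, and treating an empty tree as fully covered is an assumption.

```rust
use std::collections::HashSet;

// Returns the node hashes that still need review plus the approved fraction.
// `tree_nodes` (hashes produced by the scan) and `approved` (the Approved Set
// loaded from the review DB) are hypothetical inputs.
fn audit<'a>(tree_nodes: &'a [String], approved: &HashSet<String>) -> (Vec<&'a str>, f64) {
    let unreviewed: Vec<&str> = tree_nodes
        .iter()
        .filter(|hash| !approved.contains(*hash))
        .map(|hash| hash.as_str())
        .collect();
    let coverage = if tree_nodes.is_empty() {
        1.0 // assumption: nothing to review counts as fully covered
    } else {
        (tree_nodes.len() - unreviewed.len()) as f64 / tree_nodes.len() as f64
    };
    (unreviewed, coverage)
}
```

With four nodes of which three are approved, this reports one unreviewed hash and a coverage of 0.75.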

3.2 “Diff Mode” (Incremental)

  • User runs `trueflow diff`.
  • System compares `Merkle(Disk)` against `Merkle(HEAD)`.
  • System identifies the Blocks whose hashes changed.
  • System filters out Blocks that are already approved (even if they moved).
  • System presents only the remaining Semantic Delta.
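
One way to picture the Semantic Delta is as a set difference over Block hashes. The sketch below assumes Block hashes are available as strings for both the working directory and `HEAD`; `semantic_delta` and its parameters are hypothetical names.

```rust
use std::collections::HashSet;

// Blocks present on disk that are neither in HEAD nor already approved.
// Comparison is by content hash only, so a Block that merely moved is not reported.
fn semantic_delta(
    disk: &[String],
    head: &HashSet<String>,
    approved: &HashSet<String>,
) -> Vec<String> {
    disk.iter()
        .filter(|hash| !head.contains(*hash) && !approved.contains(*hash))
        .cloned()
        .collect()
}
```

Since positions never enter the comparison, reordering Blocks within a file or moving an approved Block to another file yields an empty delta.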

4. Implementation Plan

Phase 1: The Core (Merkle Engine)

  • [ ] `block.rs`: Define `Unit` (Line) and `Block` (Node).
  • [ ] `hasher.rs`: Implement Merkle Tree construction from file content.
  • [ ] `store.rs`: Update to index by Content Hash (not just Hunk Hash).

Phase 2: The Slicer

  • [ ] `scanner.rs`: Walk directory, ignore `.git`, build Trees.
  • [ ] `stats` command: Output coverage %.

Phase 3: Identity & Trust

  • [ ] Verify GPG signatures on reviews.
  • [ ] Policy: “Require 2 signatures from Domain X”.
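
A policy such as “Require 2 signatures from Domain X” might reduce to counting distinct approvers per email domain. The sketch below is an assumption about how that could work, not a settled design: `meets_policy` is a hypothetical name, emails are compared case-insensitively, and GPG verification is assumed to have already succeeded for every email passed in.

```rust
use std::collections::HashSet;

// Hypothetical policy check: at least `required` distinct approvers whose
// email ends in `@domain`. Signature verification is assumed to have
// happened before this point.
fn meets_policy(approver_emails: &[&str], domain: &str, required: usize) -> bool {
    let suffix = format!("@{}", domain.to_ascii_lowercase());
    let distinct: HashSet<String> = approver_emails
        .iter()
        .map(|email| email.to_ascii_lowercase())
        .filter(|email| email.ends_with(&suffix))
        .collect();
    distinct.len() >= required
}
```

Deduplicating before counting means the same reviewer approving twice does not satisfy a two-signature policy.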

Phase 4: Emacs UI Update

  • [ ] Render the Tree structure.
  • [ ] Allow drilling down (Review File -> Review Block -> Review Line).

5. FAQ / Design Questions

How do we handle identical blocks (boilerplate)?

  • Issue: Identical code blocks appearing in multiple files produce the same content-addressed review target.
  • Current CLI/API note: The public field name for this review-target identifier is still `fingerprint`, even when it refers to a block/content hash.
  • Decision: By default, they share the same review status (approving one approves all).
  • Override: If they require separate context, the `path_hint` metadata helps the reviewer identify the location. Future iterations could add a `unique_salt` to the review-target identifier if strictly separate reviews are required for identical content.

What counts as “whitespace normalization”?

  • All leading/trailing whitespace on lines is trimmed.
  • Consecutive internal whitespace is collapsed to a single space.
  • Empty lines: whether they are ignored for the body hash is TBD (probably keep them, but normalized).
  • Goal: `int x = 1;` and `int x=1;` should NOT match, but lines that differ only in indentation should match.
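
The trimming and collapsing rules above fit in a one-line helper. This is a sketch with a hypothetical name (`normalize_line`); it relies on `str::split_whitespace`, which splits on Unicode whitespace and drops leading and trailing runs. The open question about empty lines is left to the caller.

```rust
// Trim both ends and collapse each internal run of whitespace to a single space.
fn normalize_line(line: &str) -> String {
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Under this rule `    int x = 1;` and `int x = 1;` normalize to the same string, while `int x=1;` stays distinct because no whitespace is inserted or removed between tokens. One caveat: whitespace inside string literals is collapsed too, which may or may not be acceptable.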

Conflict Resolution

  • Scenario: Two users push to the configured storage branch simultaneously.
  • Strategy: Git merges the JSONL text as a union, so records from both sides are kept.
  • Logical Conflict: User A approves, User B rejects.
  • Resolution: Both records are kept and the UI aggregates them. Candidate policies are “One rejection blocks approval” and “Latest vote wins”. MVP: Latest timestamp wins.
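
The MVP rule (“Latest timestamp wins”) can be sketched as a fold over all records for a target. The `Record` shape below is hypothetical: the `timestamp` field (for example Unix seconds) is an assumption, and `Verdict` is reduced to two variants for brevity.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Approved,
    Rejected,
}

// Hypothetical record shape; `timestamp` is an assumed field.
struct Record {
    target_hash: String,
    verdict: Verdict,
    timestamp: u64,
}

// For each target hash, the record with the latest timestamp decides the verdict.
// On equal timestamps the record seen first is kept.
fn effective_verdicts(records: &[Record]) -> HashMap<&str, Verdict> {
    let mut latest: HashMap<&str, &Record> = HashMap::new();
    for record in records {
        let slot = latest.entry(record.target_hash.as_str()).or_insert(record);
        if record.timestamp > slot.timestamp {
            *slot = record;
        }
    }
    latest
        .into_iter()
        .map(|(hash, record)| (hash, record.verdict))
        .collect()
}
```

Switching to “One rejection blocks approval” would only change the fold: any `Rejected` record for a target would override the rest regardless of timestamp.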

Compaction

  • Issue: `reviews.jsonl` grows indefinitely.
  • Plan: A future `trueflow compact` command will rewrite the branch, squashing history into a snapshot of current valid state + recent history, archiving the rest.

CI Integration

  • MVP: The `trueflow check` binary runs in CI.
  • Logic: Fetches the configured storage branch and computes local hashes of `src/`.
  • Fail: If any block in the PR (`base..HEAD`) is not “approved” in the DB.
  • Trust: Start as informational only. Later, enforce GPG signatures on review records.