A/B Test: CacheBro Impact on AI Coding Agents (OpenCode, Claude, Codex) #585

@kimiko-terraphim

Description

Summary

Design and execute a comprehensive A/B test to measure the impact of CacheBro (MCP-based file caching) on AI coding agent performance across three major CLI tools: OpenCode, Claude Code, and Codex CLI.

Background

CacheBro is an MCP server that caches file reads for AI coding agents, returning diffs or "unchanged" confirmations instead of full file content. The project claims ~26% token savings.

CacheBro Architecture:

  • Language: Rust (v0.2.3)
  • Protocol: MCP (Model Context Protocol)
  • Database: SQLite with SHA-256 hash-based change detection
  • Integration: Drop-in MCP server for Claude Code, Cursor, OpenCode
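The hash-based change detection can be illustrated with a minimal sketch. This is not CacheBro's actual Rust implementation; it is a hypothetical Python analogue of the idea: store a SHA-256 digest per path in SQLite, and on a repeat read return `"unchanged"` when the digest matches (a real server would return a diff for changed files).

```python
import hashlib
import sqlite3

def read_with_cache(conn: sqlite3.Connection, path: str, content: str) -> str:
    """Return "unchanged" if the file's hash matches the cached one,
    otherwise store the new hash and return the content."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    row = conn.execute("SELECT hash FROM files WHERE path = ?", (path,)).fetchone()
    if row and row[0] == digest:
        return "unchanged"  # cache hit: agent re-reads cost almost no tokens
    conn.execute(
        "INSERT INTO files (path, hash) VALUES (?, ?) "
        "ON CONFLICT(path) DO UPDATE SET hash = excluded.hash",
        (path, digest),
    )
    return content  # cache miss: full content (or, in CacheBro, a diff)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, hash TEXT)")
```

The first read of a path is a miss; an identical re-read is a hit.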

Objectives

  1. Validate the claimed 26% token savings across different CLI tools
  2. Measure cost reduction impact
  3. Assess cache hit rates for different coding task types
  4. Ensure quality preservation (no degradation in task completion)
  5. Identify tool-specific benefits from file caching

Test Design

Issues Selection

  • 30 real issues from Terraphim repositories
  • 10 bug fixes, 10 feature implementations, 10 refactoring tasks
  • Sources: terraphim-ai, terraphim-skills, gitea, openclaw-workspace

Experimental Conditions

  • Control: Agents without CacheBro (baseline file reading)
  • Treatment: Agents with CacheBro MCP server enabled
  • Tools: OpenCode, Claude Code, Codex CLI
  • Total Runs: 30 issues × 3 tools × 2 conditions = 180 runs
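The full run matrix can be generated mechanically; a sketch (issue IDs below are placeholders, real runs would enumerate the 30 selected issues):

```python
from itertools import product

issues = [f"issue-{i:02d}" for i in range(1, 31)]          # 30 issues
tools = ["opencode", "claude-code", "codex-cli"]           # 3 tools
conditions = ["control", "treatment"]                      # 2 conditions

# Cartesian product: 30 x 3 x 2 = 180 runs
runs = [
    {"issue": issue, "tool": tool, "condition": cond}
    for issue, tool, cond in product(issues, tools, conditions)
]
print(len(runs))  # 180
```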

Key Metrics

| Metric | Target | Measurement |
| --- | --- | --- |
| Token Savings | 26% | Input + output tokens per session |
| Cost Reduction | 26% | API cost per issue |
| Cache Hit Rate | 60% | Cache hits / total file reads |
| Quality Preservation | 100% | Task completion rate |
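The first two metrics are straightforward per-run ratios. A sketch of how they would be computed from collected counts (the numbers below are illustrative, not results):

```python
def token_savings(control_tokens: int, treatment_tokens: int) -> float:
    """Relative savings of treatment vs. control, e.g. 0.26 for 26%."""
    return (control_tokens - treatment_tokens) / control_tokens

def cache_hit_rate(hits: int, total_reads: int) -> float:
    """Fraction of file reads served from the cache."""
    return hits / total_reads if total_reads else 0.0

print(token_savings(100_000, 74_000))  # 0.26 -> the claimed 26% savings
print(cache_hit_rate(60, 100))         # 0.6  -> the 60% target
```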

Hypotheses

  • H1: CacheBro reduces token usage by ≥20% (approaching claimed 26%)
  • H2: CacheBro reduces API costs by ≥20%
  • H3: CacheBro maintains task completion rate within 5% of baseline
  • H4: Tool benefit varies based on file access patterns

Implementation

Phase 1: Setup (2 days)

  • Install CacheBro on test infrastructure
  • Configure MCP integration for each tool
  • Build test harness for automated execution
  • Validate metrics collection
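For the MCP integration step, tools like Claude Code read MCP servers from a JSON config. A hypothetical fragment is shown below; the server name, binary path, and arguments are assumptions and the exact schema may differ per tool:

```json
{
  "mcpServers": {
    "cachebro": {
      "command": "cachebro",
      "args": ["serve"]
    }
  }
}
```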

Phase 2: Pilot (2 days)

  • Run a pilot subset of 3 issues per category (9 issues, 54 of the 180 runs)
  • Validate statistical approach
  • Tune CacheBro configuration if needed

Phase 3: Full Run (5 days)

  • Execute all 180 test runs
  • Collect token usage, cost, cache statistics
  • Monitor for failures or anomalies

Phase 4: Analysis (2 days)

  • Statistical analysis (Mann-Whitney U, ANOVA)
  • Effect size calculations (Cohen's d)
  • Visualization of results
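The effect-size step can be sketched with synthetic numbers (not results; means and spreads below are placeholders). Cohen's d with a pooled standard deviation; the significance test itself would use `scipy.stats.mannwhitneyu` in the analysis notebook:

```python
import random
import statistics

random.seed(0)
# Synthetic per-run token counts: 90 control runs vs. 90 treatment runs
control = [random.gauss(100_000, 10_000) for _ in range(90)]
treatment = [random.gauss(74_000, 10_000) for _ in range(90)]

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

d = cohens_d(control, treatment)
print(f"Cohen's d = {d:.2f}")  # large effect expected if savings are real
```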

Phase 5: Reporting (2 days)

  • Final report with findings
  • CacheBro tuning recommendations per tool
  • Publish results to team

Deliverables

  1. Raw data from 180 test runs (JSON)
  2. CacheBro SQLite databases from each session
  3. Statistical analysis notebook (Jupyter)
  4. Visualization dashboard
  5. Final report with recommendations

Risks & Mitigations

| Risk | Likelihood | Mitigation |
| --- | --- | --- |
| MCP integration issues | Medium | Test each tool's MCP support early |
| Low cache hit rate | Low | Select file-heavy issues |
| Tool crashes | Medium | Implement retry logic |
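The retry mitigation can be sketched as a bounded wrapper with exponential backoff; `run_once` is a hypothetical callable that invokes one tool run and raises on crash:

```python
import time

def run_with_retry(run_once, max_attempts=3, base_delay=1.0):
    """Re-run a crashed tool invocation up to max_attempts times,
    backing off exponentially between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```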

Timeline

Total: 13 days

  • Setup: 2 days
  • Pilot: 2 days
  • Full Run: 5 days
  • Analysis: 2 days
  • Reporting: 2 days

/cc @AlexMikhalev
/label ~experiment ~performance ~cachebro
/milestone %Q1-2026
