Visual debugging tool for patch generation traces
The Trace Viewer is a Streamlit-based UI for analyzing LLM interactions, failures, and performance metrics captured during patch generation.
```bash
pip install -e ".[observability]"
```

This installs:

- `streamlit` - Web UI framework
- `plotly` - Interactive charts (future use)
- `pandas` - Data analysis (future use)
If you don't have traces yet, run PatchPro with agentic mode:
```bash
# From patchpro-bot-agent directory
patchpro analyze-pr --base main --head HEAD --with-llm
```

This creates `.patchpro/traces/`:

- `traces.db` - SQLite database (queryable)
- `*.json` - Individual trace files (human-readable; see the loading sketch below)
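Since the `*.json` files are plain JSON, you can skim them without launching the viewer. A minimal sketch, assuming field names like `status`, `rule_id`, and `cost_usd` (open one real trace file to confirm the exact schema):

```python
import json
from pathlib import Path

# Minimal sketch: skim raw trace files without the UI.
# Field names are assumptions based on what the viewer displays.
trace_dir = Path(".patchpro/traces")

for trace_file in sorted(trace_dir.glob("*.json")):
    trace = json.loads(trace_file.read_text())
    print(trace_file.name, trace.get("status"), trace.get("rule_id"), trace.get("cost_usd"))
```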
```bash
# From patchpro-bot-agent directory
streamlit run trace_viewer.py
```

This opens a browser at http://localhost:8501.
The UI shows:
- Summary Metrics: Total traces, success rate, avg cost, avg latency
- Filters: Rule ID, status, strategy, text search
- Trace Cards: Expandable cards with full details for each attempt
Expandable Card Header:
```
🟢 F401 - example.py:15 - Attempt 1
```

- Status: 🟢 success, 🔴 failed, ⚠️ exhausted_retries
- Rule ID: Ruff/Semgrep rule being fixed
- File and line number
- Retry attempt number
Card Details:
- Metadata: Strategy, model, file type, complexity, tokens, cost, latency
- Finding: The original issue message
- Prompt: System + user prompts sent to LLM (collapsed)
- LLM Response: Raw LLM output (collapsed)
- Generated Patch: The unified diff patch (if generated)
- Validation Errors: Git apply errors (if failed)
- Previous Errors: Errors from earlier attempts (if retry)
Top Banner:
- Total traces logged
- Success rate (% patches that validated)
- Average cost per patch
- Average latency per patch
- Total cost across all attempts
- Average retry attempt number
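The same banner numbers can be recomputed straight from `traces.db`. A rough sketch, assuming a `traces` table with `status`, `cost_usd`, and `latency_ms` columns (verify the real schema with `.schema` in the sqlite3 CLI first):

```python
import sqlite3

# Sketch: recompute the summary banner from the SQLite database.
# Table and column names are assumptions; adjust to the actual schema.
conn = sqlite3.connect(".patchpro/traces/traces.db")
total, success_rate, avg_cost, avg_latency = conn.execute(
    """
    SELECT COUNT(*),
           AVG(CASE WHEN status = 'success' THEN 1.0 ELSE 0.0 END),
           AVG(cost_usd),
           AVG(latency_ms)
    FROM traces
    """
).fetchone()
conn.close()

print(f"traces={total}  success={success_rate:.0%}  "
      f"avg_cost=${avg_cost:.4f}  avg_latency={avg_latency:.0f}ms")
```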
Search/Filter By:
- Rule ID: Focus on specific rule types (F401, D100, etc.)
- Status: success, failed, exhausted_retries
- Strategy: generate_single_patch, generate_batch_patch
- Text Search: Find by message text or file path
Scenario: You see a low success rate in the summary metrics.
Steps:
- Filter by Status = failed
- Open the failed traces
- Look at the Validation Errors section
- Common patterns:
  - "patch does not apply" → Wrong line numbers
  - "malformed patch" → LLM output formatting issue
  - "unexpected end of file" → Multi-line string corruption
Action: Use error patterns to improve prompts or add post-processing.
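To reproduce a validation failure outside the pipeline, or to prototype a post-processing check, you can dry-run a patch yourself. A sketch (the pipeline's own validation step may differ):

```python
import subprocess
import tempfile

def patch_applies(patch_text: str, repo_dir: str = ".") -> tuple[bool, str]:
    """Dry-run a generated patch with `git apply --check`.

    Returns (ok, stderr) so the error text can be compared against the
    Validation Errors shown in the trace card.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".diff", delete=False) as f:
        f.write(patch_text)
        patch_path = f.name
    result = subprocess.run(
        ["git", "apply", "--check", patch_path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()
```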
Scenario: Want to see if retries actually help.
Steps:
- Search for same file/line with different attempt numbers
- Compare LLM Response between attempts
- Check Previous Errors section in retry attempts
- See if LLM learned from feedback
What to look for:
- Does attempt 2/3 fix issues from attempt 1?
- Are errors repeated (LLM stuck)?
- Does retry cost justify success rate improvement?
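You can also do this comparison in bulk from the raw trace files. A sketch, assuming each trace records `rule_id`, `file`, `line`, `attempt`, and `status` (adjust to the actual schema):

```python
import json
from collections import defaultdict
from pathlib import Path

# Sketch: group traces by finding and check whether later attempts succeeded.
by_finding = defaultdict(list)
for path in Path(".patchpro/traces").glob("*.json"):
    trace = json.loads(path.read_text())
    key = (trace.get("rule_id"), trace.get("file"), trace.get("line"))
    by_finding[key].append((trace.get("attempt"), trace.get("status")))

for key, attempts in by_finding.items():
    if len(attempts) > 1:  # only findings that were retried
        print(key, sorted(attempts))  # e.g. [(1, 'failed'), (3, 'success')]
```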
Scenario: Want to optimize token usage.
Steps:
- Look at Avg Cost in summary
- Open traces with highest cost
- Check Tokens Used and Prompt length
- Identify whether certain rules get unusually verbose prompts
Action: Shorten prompts for high-volume rules.
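A quick way to find those rules is to sum cost per rule across the trace files. A sketch (field names assumed, as before):

```python
import json
from collections import Counter
from pathlib import Path

# Sketch: total spend per rule, to spot rules worth a shorter prompt.
cost_by_rule = Counter()
for path in Path(".patchpro/traces").glob("*.json"):
    trace = json.loads(path.read_text())
    cost_by_rule[trace.get("rule_id")] += trace.get("cost_usd") or 0.0

for rule, cost in cost_by_rule.most_common(10):
    print(f"{rule}: ${cost:.4f}")
```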
Scenario: Should you use batch or single patch mode?
Steps:
- Filter by Strategy = generate_batch_patch → check success rate
- Filter by Strategy = generate_single_patch → check success rate
- Compare costs, latency, and success rate
Current expectation: Batch likely fails more (0% in some cases).
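The same comparison can be scripted against `traces.db`. A sketch assuming `strategy`, `status`, and `cost_usd` columns:

```python
import sqlite3

# Sketch: success rate and average cost per strategy.
conn = sqlite3.connect(".patchpro/traces/traces.db")
rows = conn.execute(
    """
    SELECT strategy,
           COUNT(*),
           AVG(CASE WHEN status = 'success' THEN 1.0 ELSE 0.0 END),
           AVG(cost_usd)
    FROM traces
    GROUP BY strategy
    """
).fetchall()
conn.close()

for strategy, n, success_rate, avg_cost in rows:
    print(f"{strategy}: n={n} success={success_rate:.0%} avg_cost=${avg_cost:.4f}")
```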
Scenario: Building fine-tuning dataset.
Steps:
- Filter by Status = success
- Filter by Rule ID = F401 (or your target rule)
- Open traces with clean patches
- Click "Save as Good Example" (feature coming soon)
Future: Saved examples export to fine-tuning JSON format.
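Until that feature lands, you can roll a rough export yourself. A sketch, assuming each trace stores `system_prompt`, `prompt`, and `patch` fields and targeting a chat-style JSONL format (the built-in export may use a different schema):

```python
import json
from pathlib import Path

# Sketch: turn successful F401 traces into chat-style fine-tuning examples.
with open("finetune_examples.jsonl", "w") as out:
    for path in Path(".patchpro/traces").glob("*.json"):
        trace = json.loads(path.read_text())
        if trace.get("status") != "success" or trace.get("rule_id") != "F401":
            continue
        example = {
            "messages": [
                {"role": "system", "content": trace.get("system_prompt", "")},
                {"role": "user", "content": trace.get("prompt", "")},
                {"role": "assistant", "content": trace.get("patch", "")},
            ]
        }
        out.write(json.dumps(example) + "\n")
```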
Goal: Understand why batch patches fail and fix them.
```
# Filter for batch patch failures
Filter: Strategy = generate_batch_patch, Status = failed
```

Open the first 10 failed batch traces. Look for patterns:
Hypothesis 1: Wrong line numbers in multi-hunk diffs
- Check patches: Are the `@@` hunk headers correct?
- Check validation errors: "patch does not apply to line X"
- Pattern: Second/third hunk has wrong line numbers
Hypothesis 2: LLM corrupts file content
- Check patches: Are unchanged lines modified?
- Check validation errors: "unexpected content at line X"
- Pattern: LLM hallucinates code that wasn't there
Hypothesis 3: Prompt too complex
- Check prompts: How many findings in one request?
- Pattern: Batch of 5+ findings → higher failure rate
Based on hypothesis, implement fix:
- Hypothesis 1 Fix: Add post-processing to recalculate hunk headers (a sketch follows this list)
- Hypothesis 2 Fix: Add instruction "DO NOT modify unchanged lines"
- Hypothesis 3 Fix: Limit batch size to 3 findings max
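For reference, the Hypothesis 1 fix boils down to recounting each hunk's lines and rewriting its header. A sketch (not the pipeline's actual post-processor):

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@(.*)$")

def fix_hunk_counts(patch: str) -> str:
    """Recompute hunk line counts from the hunk body, keeping the start lines.

    Sketch of the Hypothesis 1 fix: a miscounted `@@ -a,b +c,d @@` header
    no longer makes `git apply` reject an otherwise-valid patch.
    """
    lines = patch.splitlines()
    fixed = []
    i = 0
    while i < len(lines):
        match = HUNK_RE.match(lines[i])
        if not match:
            fixed.append(lines[i])
            i += 1
            continue
        # Collect the hunk body: everything up to the next hunk or file header.
        body = []
        j = i + 1
        while j < len(lines) and not HUNK_RE.match(lines[j]) and not lines[j].startswith("--- "):
            body.append(lines[j])
            j += 1
        old_count = sum(1 for line in body if line.startswith((" ", "-")))
        new_count = sum(1 for line in body if line.startswith((" ", "+")))
        fixed.append(f"@@ -{match.group(1)},{old_count} +{match.group(2)},{new_count} @@{match.group(3)}")
        fixed.extend(body)
        i = j
    return "\n".join(fixed) + "\n"
```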
```bash
# Re-run with fix
patchpro analyze-pr --base main --head HEAD --with-llm
```

```
# Compare success rates
Old batch success rate: 0%
New batch success rate: ??%
```

If the success rate improved but is still below the target (70%), repeat the process.
If you have traces from CI workflow run:
```bash
# 1. Download artifact from GitHub Actions
# (Currently: artifact upload has path issue, but traces ARE created)

# 2. Extract to local directory
unzip patchpro-traces.zip -d /path/to/traces

# 3. Launch viewer with custom path
streamlit run trace_viewer.py -- --trace-dir /path/to/traces
```

Live example: Workflow run 18263485405 in patchpro-demo-repo
What to look for:
- Navigate to Actions → run 18263485405
- Check logs for "Agentic mode: True"
- See debug output listing trace files:
  - `.patchpro/traces/F841_example.py_9_1_*.json`
  - `.patchpro/traces/F841_example.py_9_3_*.json`
- Notice `attempt_1` and `attempt_3` for the same finding → retry worked!
To view locally (once artifact upload fixed):
```bash
# Download artifact, then:
streamlit run trace_viewer.py -- --trace-dir ./downloaded-traces
```

Keyboard shortcuts:

- `R` - Rerun app (refresh data)
- `Ctrl+C` in terminal - Stop server
- Browser refresh - Reload page
Cause: Haven't run PatchPro with agentic mode yet.
Fix:

```bash
patchpro analyze-pr --base main --head HEAD --with-llm
```

Cause: Streamlit caches data.
Fix: Press R to rerun app, or refresh browser.
Cause: Observability dependencies not installed.
Fix:

```bash
pip install -e ".[observability]"
```

Cause: Traces in custom directory.

Fix:

```bash
streamlit run trace_viewer.py -- --trace-dir /path/to/traces
```

Planned features:

- Automatic clustering of similar failures
- "Top 5 failure modes" section
- Pattern recognition using embeddings
- Interactive charts (Plotly)
- Cost trends over time
- Cost by rule category
- Latency distribution histograms
- Export selected traces to fine-tuning JSON
- One-click dataset curation
- Human labeling interface
- Agreement scoring with LLM-as-judge
Don't dive into individual traces immediately. Check:
- What's the overall success rate? (Below 50% → systemic issue)
- What's the retry rate? (High → validation often fails)
- What's the cost? (High → prompts too verbose)
Don't look at all traces. Focus on:
- First: Failed traces for most common rule
- Second: Successful traces for same rule (compare)
- Third: Exhausted retries (hardest cases)
One failed trace = edge case. Ten failed traces with same error = systemic issue.
Most bugs are in the LLM's interpretation of the prompt:
- Is the prompt clear?
- Does it include enough context?
- Does the response follow instructions?
If retry fails again, did the LLM:
- Ignore previous error feedback?
- Misunderstand the error?
- Make the same mistake differently?
- PATH_TO_MVP.md: Overall roadmap and Phase 2 plan
- TELEMETRY_PR_TEST_PLAN.md: How telemetry was tested in CI
- DEMO_EVALUATION_GUIDE.md: How to show traces to judges/stakeholders
How do I share traces with the team?
→ Commit `.patchpro/traces/*.json` to git (or zip and share)
Can I query traces programmatically?
→ Yes! Use the `telemetry.PatchTracer.query_traces()` API
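If you'd rather not depend on internal APIs, the SQLite database works for ad-hoc analysis too (column names assumed, as above):

```python
import sqlite3
import pandas as pd

# Sketch: pull all traces into a DataFrame for ad-hoc slicing.
conn = sqlite3.connect(".patchpro/traces/traces.db")
df = pd.read_sql("SELECT * FROM traces", conn)
conn.close()

print(df.groupby("rule_id")["status"].value_counts())
```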
Can I use this in CI?
→ Not yet (Streamlit needs an interactive browser), but you can query the SQLite database in CI scripts

Where's the clustering feature?
→ Coming in Phase 2.2 (current focus: manual exploration)
Last Updated: 2025-10-06
Status: Phase 2.1 Complete
Next: Phase 2.2 (Failure Clustering)