Fix broken dashboard panels (Active Sessions, Tool Success Rate, API Error Rate) + add usage panels #11
Open
404pilo wants to merge 1 commit into
Three panels query data that never resolves against a real stack:

- "Active Sessions" uses increase(claude_code_session_count_total), but session.count is a one-shot counter; the collector's Prometheus exporter drops it after the default ~5m metric_expiration, so the panel reads 0. Derive active sessions from the continuously-updated active_time metric instead, and raise the exporter's metric_expiration to 2h so one-shot counters (session / lines_of_code / commit / PR / code_edit_decision) survive between updates.
- "Tool Success Rate" and "API Error Rate" pipe Loki events through `| json`, but Claude Code log bodies are the literal event name and every field is a structured-metadata label. `| json` throws JSONParserErr and silently zeroes both panels. Filter labels directly (`| success="true"`, `sum by (status_code)`).

Adds six panels: Cache Hit Ratio, Cost by query_source (main vs subagent vs auxiliary), Token Spend by Model, Active Time (user vs cli), Tool Decision accept/reject, and API Latency P95 by model.

All queries verified against a live OTel Collector + Prometheus + Loki stack receiving real Claude Code telemetry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
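The `| json` failure described in the commit message can be reproduced outside Loki. The sketch below is only an illustration in Python, not LogQL: the record layout and the tool names (`Read`, `Bash`, `Edit`) are hypothetical stand-ins for events whose body is the literal event name and whose fields live in labels. It shows that parsing the body as JSON fails on every record, while reading the `success` label directly still yields a rate.

```python
import json

# Hypothetical records: the body is the literal event name, and every field
# is carried as a label rather than inside the body.
records = [
    {"body": "claude_code.tool_result", "labels": {"success": "true", "tool_name": "Read"}},
    {"body": "claude_code.tool_result", "labels": {"success": "false", "tool_name": "Bash"}},
    {"body": "claude_code.tool_result", "labels": {"success": "true", "tool_name": "Edit"}},
]


def body_parses_as_json(body: str) -> bool:
    """Return True only if the log body is valid JSON."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False


def success_rate(events) -> float:
    """Share of events whose `success` label equals the string "true"."""
    total = len(events)
    ok = sum(1 for e in events if e["labels"].get("success") == "true")
    return ok / total if total else 0.0


# No body is valid JSON, so a JSON parsing stage has nothing to extract.
parsed = [body_parses_as_json(r["body"]) for r in records]

# Filtering on the label directly does not depend on the body at all.
rate = success_rate(records)
```

The same reasoning is why the fix drops the parsing stage and filters on labels such as `success` and `status_code` instead.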
Summary
Three panels in the bundled Grafana dashboard never resolve against a real stack. This PR fixes their root causes and adds six high-value panels. Every query was verified against a live OTel Collector + Prometheus + Loki stack receiving real Claude Code telemetry.
Bugs fixed
1. "Active Sessions" always shows 0
The panel uses `increase(claude_code_session_count_total[1h])`, but `session.count` is a one-shot counter: it ticks once at session start and never updates. The Collector's Prometheus exporter drops idle series after the default `metric_expiration` (~5m), so the series disappears and `increase()` has nothing to compute.

Fix: derive active sessions from the continuously-updated `active_time` metric:

…and raise the exporter's `metric_expiration` to `2h` so one-shot counters (`session`, `lines_of_code`, `commit`, `pull_request`, `code_edit_decision`) survive between updates. Without this, "Lines of Code", "Code Changes", and "Development Activity" also flatline once idle.

2. "Tool Success Rate" and "API Error Rate" show "No data"

Both pipe Loki events through `| json`. But Claude Code's OTLP log events put the event name in the log body (e.g. `claude_code.tool_result`) and every field (`success`, `tool_name`, `status_code`, …) in structured-metadata labels. `| json` tries to parse the body, throws `JSONParserErr` on every line, and silently zeroes the panels.

Fix: filter the labels directly (`| success="true"`, `sum by (status_code) (…)`), with no `| json` stage.

New panels
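- Cache Hit Ratio: `cacheRead / (cacheRead + input)`; cost efficiency at a glance
- Cost by `query_source` (main vs subagent vs auxiliary)
- Token Spend by Model
- Active Time (user vs cli)
- Tool Decision accept/reject: `decision` label on `claude_code.tool_decision`
- API Latency P95 by model

Test plan

- `claude-code-dashboard.json` is valid JSON (29 panels)
- … Prometheus (`:9090`) and Loki (`:3100`)
- … `metric_expiration: 2h`

🤖 Generated with Claude Code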
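To make the `metric_expiration` change from bug 1 concrete, here is a toy model in Python. It is a sketch under a simplifying assumption (a series is exported only while its last update is within the expiration window) and is not the Collector's actual implementation; the function name and the 30-minute scrape time are hypothetical.

```python
from datetime import timedelta


def visible_at_scrape(last_update: timedelta,
                      scrape_time: timedelta,
                      expiration: timedelta) -> bool:
    """Toy rule: a series is exported only if it was updated within `expiration`."""
    return scrape_time - last_update <= expiration


# A one-shot counter is written once, at session start, and never again.
session_start = timedelta(minutes=0)

# Hypothetical scrape half an hour into the session.
scrape = timedelta(minutes=30)

# With a ~5m window the series is already gone by the time of the scrape.
with_default = visible_at_scrape(session_start, scrape, timedelta(minutes=5))

# With a 2h window the same series is still exported.
with_two_hours = visible_at_scrape(session_start, scrape, timedelta(hours=2))
```

Under this model a continuously updated metric stays visible with either setting, which is why the panel is rebuilt on `active_time` while the longer expiration protects the one-shot counters.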