feat(gastown): emit reconciler metrics to Analytics Engine and add Grafana dashboard panels #1372

Open
jrf0110 wants to merge 1 commit into convoy/reconciler-phase-5-debug-endpoints-grafa/4763028e/head from convoy/reconciler-phase-5-debug-endpoints-grafa/4763028e/gt/maple/97e04046

jrf0110 commented Mar 21, 2026

Summary

  • Extended writeEvent() in analytics.util.ts to support the double3 through double10 fields, enabling reconciler metrics emission without breaking existing callers (new fields default to 0).
  • Added reconciler_tick event emission after each alarm tick in Town.do.ts, carrying all 9 ReconcilerMetrics fields: the eight numeric metrics (wallClockMs, eventsDrained, actionsEmitted, sideEffectsAttempted/Succeeded/Failed, invariantViolations, pendingEventCount) as doubles, plus actionsByType serialized as JSON in blob10.
  • Added a "Reconciler" row to the Grafana dashboard with 6 panels: Events drained (timeseries), Actions by type (stacked bar), Side effects (3-series timeseries), Invariant violations (stat with >0 threshold), Wall clock time (timeseries with >500ms threshold), Pending event queue depth (gauge with >50 threshold).
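A minimal sketch of the defaulting behaviour described above, assuming a writeEvent(dataset, event) signature, an event object with optional double1 through double10 keys, the event name in blob1 (as the dashboard's blob1 = 'reconciler_tick' filter implies) and the label in blob10. AnalyticsEvent, AnalyticsEngineDatasetLike and DOUBLE_KEYS are hypothetical stand-ins; the real helper in analytics.util.ts may be shaped differently and fills other blobs too. Only the writeDataPoint({ blobs, doubles }) call mirrors the Workers Analytics Engine binding.

```typescript
// Hypothetical stand-ins for the Workers Analytics Engine binding types.
interface AnalyticsEngineDataPoint {
  indexes?: string[];
  blobs?: string[];
  doubles?: number[];
}

interface AnalyticsEngineDatasetLike {
  writeDataPoint(point: AnalyticsEngineDataPoint): void;
}

type DoubleKey = `double${1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10}`;

// Hypothetical event shape: every doubleN is optional.
type AnalyticsEvent = {
  event: string; // written to blob1, e.g. 'reconciler_tick'
  label?: string; // written to blob10, e.g. actionsByType JSON
} & Partial<Record<DoubleKey, number>>;

const DOUBLE_KEYS: DoubleKey[] = Array.from(
  { length: 10 },
  (_, i) => `double${i + 1}` as DoubleKey,
);

export function writeEvent(dataset: AnalyticsEngineDatasetLike, evt: AnalyticsEvent): void {
  // blob2..blob9 hold other fields in the real helper; left empty in this sketch.
  const blobs = new Array<string>(10).fill('');
  blobs[0] = evt.event;
  blobs[9] = evt.label ?? '';
  dataset.writeDataPoint({
    blobs,
    // Unset doubles default to 0, so callers written for double1/double2
    // still produce a full, position-stable 10-entry array.
    doubles: DOUBLE_KEYS.map((key) => evt[key] ?? 0),
  });
}
```

Because every doubleN falls back to 0, a caller that only sets double1 and double2 still writes a 10-entry doubles array, so each column position keeps the same meaning for every event type.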

Verification

  • pnpm typecheck — passes (all workspace projects clean)
  • Code review: field mapping between emitEvent call and Grafana SQL queries verified consistent
  • No duplicate panel IDs in Grafana dashboard JSON
  • Backwards compatibility: existing writeEvent callers unaffected (new double fields default to 0)

Visual Changes

N/A

Reviewer Notes

  • The writeEvent() doubles array grew from 2 to 10 entries. Analytics Engine supports up to 20 doubles per data point, so this is well within the limit.
  • actionsByType is stored as a JSON string in blob10 (via the label field) and parsed in Grafana using JSONExtractKeysAndValues(blob10, 'Float64').
  • The Grafana queries use SUM(doubleN * _sample_interval) / SUM(_sample_interval) for weighted averages and SUM(doubleN * _sample_interval) for totals of counted metrics, which is the appropriate aggregation for sampled AE data.
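The aggregation rule in the last note can be checked with a small sketch. Each stored Analytics Engine row carries a _sample_interval saying roughly how many original events it stands for; the hypothetical SampledRow type below uses value for any doubleN column and sampleInterval for _sample_interval. The function names are illustrative, and weightedAverage returning null for an empty range is a choice made here rather than a statement about the SQL engine.

```typescript
// Hypothetical view of one stored Analytics Engine row.
interface SampledRow {
  value: number; // any doubleN column
  sampleInterval: number; // _sample_interval: original events this row stands for
}

// SUM(doubleN * _sample_interval): estimated total over the original events.
export function estimatedTotal(rows: SampledRow[]): number {
  return rows.reduce((acc, r) => acc + r.value * r.sampleInterval, 0);
}

// SUM(_sample_interval): estimated number of original events.
export function estimatedCount(rows: SampledRow[]): number {
  return rows.reduce((acc, r) => acc + r.sampleInterval, 0);
}

// SUM(doubleN * _sample_interval) / SUM(_sample_interval): estimated mean per event.
// Returns null instead of dividing by zero when the range has no rows.
export function weightedAverage(rows: SampledRow[]): number | null {
  const count = estimatedCount(rows);
  return count === 0 ? null : estimatedTotal(rows) / count;
}
```

With rows [{ value: 10, sampleInterval: 1 }, { value: 4, sampleInterval: 3 }], the estimated total is 22 over 4 original events, a weighted average of 5.5; a plain average of the two stored rows would give 7 and overweight the unsampled row.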

feat(gastown): emit reconciler metrics to Analytics Engine and add Grafana dashboard panels

- Extend writeEvent() to support double3-double10 fields for reconciler metrics
- Emit reconciler_tick event after each alarm tick with all 9 metrics
- Add Reconciler row to Grafana dashboard with 6 panels:
  1. Events drained per tick (timeseries)
  2. Actions emitted per tick by type (stacked bar)
  3. Side effects attempted/succeeded/failed (timeseries)
  4. Invariant violations (stat with >0 alert threshold)
  5. Reconciler wall clock time (timeseries with >500ms threshold)
  6. Pending event queue depth (gauge with >50 threshold)
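For the stacked "actions by type" panel, each tick's actionsByType map travels as JSON in blob10 and is split back into (type, count) pairs by JSONExtractKeysAndValues(blob10, 'Float64') on the Grafana side. The sketch below is a hypothetical TypeScript analogue of that round trip: encodeActionsByType is an assumed helper name, and sumActionsByType weights each count by _sample_interval (the number of original events a stored row represents) on the assumption that this panel aggregates like the other count panels. Malformed blob10 values are simply skipped here; how the SQL function treats them is not modelled.

```typescript
// Hypothetical per-tick breakdown, e.g. { dispatch: 2, retry: 1 }.
type ActionsByType = Record<string, number>;

// Assumed helper: serialize the breakdown into the string written to blob10.
export function encodeActionsByType(actions: ActionsByType): string {
  return JSON.stringify(actions);
}

// Rough analogue of expanding JSONExtractKeysAndValues(blob10, 'Float64')
// per row and summing count * _sample_interval per action type.
export function sumActionsByType(
  rows: { blob10: string; sampleInterval: number }[],
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(row.blob10);
    } catch {
      continue; // skip rows whose blob10 is not valid JSON
    }
    if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
      continue; // only plain JSON objects carry a type -> count map
    }
    for (const [type, count] of Object.entries(parsed)) {
      if (typeof count !== 'number') continue;
      totals.set(type, (totals.get(type) ?? 0) + count * row.sampleInterval);
    }
  }
  return totals;
}
```

For example (with made-up action type names), a tick recording { dispatch: 2, retry: 1 } at sample interval 1 and one recording { dispatch: 1 } at sample interval 4 add up to an estimated 6 dispatch actions and 1 retry.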
"interval": "",
"intervalFactor": 1,
"nullifySparse": false,
"query": "SELECT SUM(double8 * _sample_interval) / SUM(_sample_interval) AS pending_events FROM gastown_events WHERE $timeFilter AND blob1 = 'reconciler_tick' ORDER BY timestamp DESC LIMIT 1",

WARNING: This gauge is showing a time-window average, not the current queue depth

SUM(double8 * _sample_interval) / SUM(_sample_interval) collapses every reconciler_tick in the selected range into a single weighted average, so the panel will not show the latest backlog value. For a queue-depth gauge we need the most recent double8 sample instead, and rawSql should be updated to match.
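The difference the reviewer points out can be reproduced with a short sketch over hypothetical QueueDepthSample rows (pendingEvents standing for double8, sampleInterval for _sample_interval). windowAverage mirrors the panel's current expression; the trailing ORDER BY timestamp DESC LIMIT 1 cannot pick out the latest tick, because SUM(...) has already collapsed the range into a single row. latestSample returns the newest tick's value, which is what a gauge should display. A query along the lines of SELECT double8 AS pending_events FROM gastown_events WHERE $timeFilter AND blob1 = 'reconciler_tick' ORDER BY timestamp DESC LIMIT 1 expresses the same idea in SQL, though it has not been checked against the panel's datasource.

```typescript
// Hypothetical row shape for reconciler_tick samples read back from the dataset.
interface QueueDepthSample {
  timestamp: number; // epoch milliseconds
  pendingEvents: number; // the double8 value queried by the panel
  sampleInterval: number; // the _sample_interval column
}

// What the panel currently computes:
// SUM(double8 * _sample_interval) / SUM(_sample_interval) over the whole range.
export function windowAverage(rows: QueueDepthSample[]): number | null {
  let weighted = 0;
  let weight = 0;
  for (const r of rows) {
    weighted += r.pendingEvents * r.sampleInterval;
    weight += r.sampleInterval;
  }
  return weight === 0 ? null : weighted / weight;
}

// What a queue-depth gauge should show: the value from the newest tick.
// No _sample_interval weighting: a depth is a point-in-time reading,
// not a count that sampling scales down.
export function latestSample(rows: QueueDepthSample[]): number | null {
  let latest: QueueDepthSample | null = null;
  for (const r of rows) {
    if (latest === null || r.timestamp > latest.timestamp) {
      latest = r;
    }
  }
  return latest === null ? null : latest.pendingEvents;
}
```

With depths 100, 10 and 4 at successive ticks, windowAverage reports 38 while latestSample reports 4, so the gauge would show 38 for a queue that currently holds 4 events.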

kilo-code-bot (bot) commented Mar 21, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
|---|---|---|
| cloudflare-gastown/gastown-grafana-dash-1.json | 3163 | Pending queue depth gauge averages the selected time range instead of showing the latest sample |

Other Observations (not in diff)

None.

Files Reviewed (3 files)
  • cloudflare-gastown/gastown-grafana-dash-1.json - 1 issue
  • cloudflare-gastown/src/dos/Town.do.ts - 0 issues
  • cloudflare-gastown/src/util/analytics.util.ts - 0 issues

Reviewed by gpt-5.4-20260305 · 550,264 tokens