feat: monitor package — production decision replay + shared viewer#61

Merged
Chouffe merged 60 commits into main from arthur/monitor-viewer-import
Jun 12, 2026

@Chouffe Chouffe commented Jun 12, 2026

Collaborator

Summary

Adds monitor/, a sixth package answering "what did the temporal model decide in production, and why?" — plus the viewer integration to explore it. Design: docs/specs/2026-06-12-monitor-design.md.

monitor/ (new package, no core/torch dependency)

  • temporal-monitor import — pulls scored sequences from alert-api (GET-only + login) into a DVC-tracked store with full provenance (recorded score, temporal_model_version, temporal_api_version, per-detection bucket_key + verbatim bbox). Incremental and idempotent; --all-orgs scans the global sequence-id space (admin token) so every organization imports despite the org-scoped listing endpoint; --exclude-org filters (e.g. the CI camera org).
  • temporal-monitor replay (the dvc.yaml stage) — re-runs each sequence through the exact pinned release that scored it (pyronear/temporal-model-api:<tag>, model.zip baked in) + throwaway MinIO, reconstructing the production call byte-for-byte (frame window + ROI ported line-for-line from pyro-api's validation worker, fuzz-verified over 200k cases). Writes the eval-viewer reporting contract under one alert-api source.
  • Consistency machinery: recorded-vs-replayed score check (1e-5 cross-hardware tolerance); on mismatch, an automatic window probe identifies the exact earlier frame window production scored (matched_window_frames). Live result on June 12 data: 91/91 production scores reproduced (23 exact, 52 window drift — production scores early and freezes on validation).
  • Trigger enrichment (--trigger-image): a second pass through a newer serving build (same model.zip) fills trigger_frame_index/first_crossing_frame for releases predating compute_trigger, merged only when it reproduces the pinned score (91/91 agreed).
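
The merge rule for trigger enrichment can be sketched as follows. This is a minimal illustration, not the package's code: it assumes rows are plain dicts, that the replayed score is stored under `replayed_probability`, and that "reproduces the pinned score" means agreement within the same 1e-5 tolerance. The helper names `scores_match` and `merge_trigger_fields` are hypothetical.

```python
from __future__ import annotations

# Cross-hardware tolerance quoted in the PR description.
SCORE_TOLERANCE = 1e-5


def scores_match(a: float, b: float, tol: float = SCORE_TOLERANCE) -> bool:
    """Return True when two scores agree within the tolerance (assumed inclusive)."""
    return abs(a - b) <= tol


def merge_trigger_fields(pinned_row: dict, enriched_row: dict) -> dict:
    """Copy trigger fields from the newer build's row into the pinned row.

    Hypothetical helper: the fields are merged only when the newer serving
    build reproduced the pinned release's score; otherwise the pinned row is
    returned unchanged (as a copy).
    """
    merged = dict(pinned_row)
    if scores_match(pinned_row["replayed_probability"],
                    enriched_row["replayed_probability"]):
        for key in ("trigger_frame_index", "first_crossing_frame"):
            # Only fill gaps; never overwrite a value the pinned release produced.
            if merged.get(key) is None:
                merged[key] = enriched_row.get(key)
    return merged
```

Gating the merge on score agreement keeps the pinned release as the source of truth: the newer build only contributes metadata when it demonstrably scored the same input the same way.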

viewer/ (shared, additive)

  • Monitor mode (auto-detected from row fields): production verdict as the row (green/gray tint), started/organization sortable columns, org filter cascading into the camera filter, MonitorCards rail (day span, kept/discarded counts + %, clickable per-org breakdown, replay explainer), replay diagnostics in the detail pane, source picker hidden for single-source data.
  • Eval rendering is pinned byte-identical by tests; regression-checked live against freshly regenerated train/val artifacts.

Repo wiring

Root Makefile PACKAGES, CI matrix, README table, .envrc convention (+ gitignore), DVC remote at s3://pyro-vision-rd/dvc/temporal-model/monitor/ (store + reports pushed for credential-less dvc pull consumption).

Test plan

  • monitor: 66 tests (offline: mocked HTTP, fake docker stack), ruff check + format
  • viewer: 39 vitest tests, eslint, prettier, tsc
  • repo-wide make lint green
  • Live e2e on production data: import (9 orgs incl. sdis-12) → replay (91 replayed / 0 unexplained) → DATA_ROOT=../monitor npm run dev
  • Eval viewer regression check against regenerated train/val artifacts (no regressions; surfaced a pre-existing dataset duplicate: pyronear-sdis-07_brison_226_2023-07-14T12-40-59 exists under both train labels — follow-up for data curation, untouched here)

Try it (load the predictions, ~2 min)

No alert-api credentials and no Docker needed — the store and the replayed
predictions are on the DVC remote (same S3 access as the train/eval remotes):

git checkout arthur/monitor-viewer-import
cd monitor && make install        # uv sync
uv run dvc pull                   # fetches the sequence store + reporting tree
cd ../viewer && npm ci
DATA_ROOT=../monitor npm run dev  # open http://localhost:3000

You'll see the June 12 production day: 91 scored sequences across 9
organizations (sdis-07, sdis-40, sdis-77, sdis-12, ...), each row carrying
production's verdict and score. Click a row to see the replayed tubes, YOLO
boxes, stabilized crops and trigger frame; the detail pane shows the replay
diagnostics (replay prob, match status, the window production scored). The
left rail has the kept/discarded stats and a clickable per-org breakdown.

To import fresh sequences yourself (alert-api credentials required) see
monitor/README.md: cp .envrc.example .envrc, then
make import ARGS="--all-orgs --exclude-org pyroadmins" and uv run dvc repro
(Docker required for the replay).

Follow-ups (pyro-api side, discussed)

  • Admin listing on /sequences/all/fromdate would replace the id-scan import
  • Persisting the scored frame keys would make replays exact without window probing

Chouffe added 30 commits June 12, 2026 09:18
When import_platform encounters zero new sequences, the store dir was
not created, causing subsequent 'dvc add data/01_raw/sequences' to fail.
Now mkdir happens at the start of import_platform, ensuring the dir exists
even for empty runs. Added .gitkeep files to track data/ layout in git.
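
A minimal sketch of the fix described in this commit, assuming a simplified signature (the real `import_platform` arguments are not shown in this PR, so `import_platform_sketch`, `store_dir`, and `new_sequences` are hypothetical names):

```python
from __future__ import annotations

from pathlib import Path


def import_platform_sketch(store_dir: Path, new_sequences: list[dict]) -> int:
    """Create the store directory up front, then write any new sequences.

    Creating the directory before looking at the sequences means an empty
    import still leaves a directory behind for a later `dvc add` to track.
    """
    # exist_ok=True keeps repeated runs idempotent; parents=True builds the
    # data/ layout on a fresh checkout.
    store_dir.mkdir(parents=True, exist_ok=True)
    for seq in new_sequences:
        # Placeholder for the real per-sequence write.
        (store_dir / str(seq["id"])).mkdir(exist_ok=True)
    return len(new_sequences)
```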
When replay_matches is false, probe ascending distinct-frame windows
(n=MIN_FRAMES..total) to find whether the mismatch is window drift
(production scored early and stopped; later frames shifted our window).
The first window whose score is within SCORE_TOLERANCE of the recorded
value sets matched_window_frames on the row; if no window matches,
the field stays null (genuine drift). Summary gains window_drift counter.
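
The probe loop described in this commit might look roughly like the sketch below. It assumes each window is the first `n` distinct frames of the sequence and that scoring is exposed as a callable; `probe_matching_window`, `score_fn`, and the `MIN_FRAMES` value of 2 are illustrative assumptions, not the package's actual API or constants.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

SCORE_TOLERANCE = 1e-5  # tolerance quoted in the PR description
MIN_FRAMES = 2          # hypothetical value; the real constant is not shown here


def probe_matching_window(
    frames: Sequence[str],
    recorded_score: float,
    score_fn: Callable[[Sequence[str]], float],
    min_frames: int = MIN_FRAMES,
    tol: float = SCORE_TOLERANCE,
) -> Optional[int]:
    """Return the size of the first window that reproduces the recorded score.

    Windows are tried in ascending size (min_frames .. len(frames)). None is
    returned when no window matches, which corresponds to leaving
    matched_window_frames null on the row.
    """
    for n in range(min_frames, len(frames) + 1):
        if abs(score_fn(frames[:n]) - recorded_score) <= tol:
            return n
    return None
```

Trying the smallest windows first matches the stated hypothesis that production scored early and then stopped, so the first hit is the earliest window consistent with the recorded value.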
Chouffe added 27 commits June 12, 2026 14:35
All orgs now land in one reporting tree at data/08_reporting/alert-api/vit_dinov2_finetune/;
organization_name on each row carries the raw org name for display as a viewer column.
Add monitorMode detection (rows.some replayed_probability) threaded to FilterBar and SequenceTable;
organization column shown only in monitorMode (sortable); correctness column and outcome/GT filters hidden in monitorMode.
Add MonitorCards component for monitor mode stats (kept/discarded counts
and per-org breakdown) in ControlRail; cascade camera dropdown to show
only cameras matching the selected organization.

Chouffe commented Jun 12, 2026

Collaborator Author

Deferred follow-ups from the multi-lens review (none blocking):

monitor/

  • replay.py's reports dict-of-one is vestigial since the single-source collapse — could flatten to one OrgReport.
  • Per-sequence replay caching keyed on (api_version, window, roi) would make dvc repro incremental (today the whole stage re-runs; the model is deterministic so this is sound).
  • DockerHub images are pulled by mutable tag, not digest — acceptable for a diagnostic tool, pin by digest if replay output ever becomes authoritative.
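
One possible shape for the cache key mentioned in the caching follow-up, as a sketch only: it assumes the window is represented as an ordered list of frame bucket keys and the ROI as a list of floats, neither of which is specified in this PR. `replay_cache_key` is a hypothetical helper.

```python
from __future__ import annotations

import hashlib
import json
from typing import Sequence


def replay_cache_key(
    api_version: str,
    window: Sequence[str],
    roi: Sequence[float],
) -> str:
    """Derive a stable hex digest from (api_version, window, roi).

    The inputs are serialized as canonical JSON (sorted keys, no extra
    whitespace) so the same logical inputs always hash to the same key.
    """
    payload = json.dumps(
        {"api_version": api_version, "window": list(window), "roi": list(roi)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the model is described as deterministic, a replay whose key already exists in the cache could be skipped, and any change to the release, the frame window, or the ROI would produce a new key and force a re-run.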

viewer/

  • DetailPanel discriminates monitor rows per-row (replayed_probability !== undefined) while the table does it per-source — latent inconsistency if a source ever mixes shapes; unify on a passed monitorMode prop.

pyro-api (discussed with the team)

  • Admin branch in fetch_sequences_from_date would replace monitor's id-space scan with one clean listing call.
  • Persisting the scored frame bucket_keys alongside the score would make replays exact and remove the window-probe heuristic.

data

  • pyronear-sdis-07_brison_226_2023-07-14T12-40-59 exists in the train dataset under BOTH labels (fp and wildfire) — contradictory ground truth that also double-counts in eval metrics and collides eval artifact keys.

@Chouffe Chouffe requested a review from MateoLostanlen June 12, 2026 14:34
@Chouffe Chouffe merged commit 3897440 into main Jun 12, 2026
7 checks passed