feat: monitor package — production decision replay + shared viewer by Chouffe · Pull Request #61 · pyronear/temporal-model

Chouffe · 2026-06-12T13:53:35Z

Summary

Adds monitor/, a sixth package answering "what did the temporal model decide in production, and why?" — plus the viewer integration to explore it. Design: docs/specs/2026-06-12-monitor-design.md.

monitor/ (new package, no core/torch dependency)

temporal-monitor import — pulls scored sequences from alert-api (GET-only + login) into a DVC-tracked store with full provenance (recorded score, temporal_model_version, temporal_api_version, per-detection bucket_key + verbatim bbox). Incremental and idempotent; --all-orgs scans the global sequence-id space (admin token) so every organization imports despite the org-scoped listing endpoint; --exclude-org filters (e.g. the CI camera org).
temporal-monitor replay (the dvc.yaml stage) — re-runs each sequence through the exact pinned release that scored it (pyronear/temporal-model-api:<tag>, model.zip baked in) + throwaway MinIO, reconstructing the production call byte-for-byte (frame window + ROI ported line-for-line from pyro-api's validation worker, fuzz-verified over 200k cases). Writes the eval-viewer reporting contract under one alert-api source.
Consistency machinery: recorded-vs-replayed score check (1e-5 cross-hardware tolerance); on mismatch, an automatic window probe identifies the exact earlier frame window production scored (matched_window_frames). Live result on June 12 data: 91/91 production scores reproduced (23 exact, 52 window drift — production scores early and freezes on validation).
Trigger enrichment (--trigger-image): a second pass through a newer serving build (same model.zip) fills trigger_frame_index/first_crossing_frame for releases predating compute_trigger, merged only when it reproduces the pinned score (91/91 agreed).
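
The score check and the guarded trigger merge described above can be sketched as follows. This is a minimal illustration, not the package's code: the function names and the `probability` key are hypothetical, and only the 1e-5 tolerance and the two trigger field names come from this description.

```python
import math
from typing import Optional

SCORE_TOLERANCE = 1e-5  # cross-hardware tolerance quoted in the summary


def scores_match(recorded: Optional[float], replayed: Optional[float],
                 tol: float = SCORE_TOLERANCE) -> bool:
    """True when the replayed score reproduces the recorded one within tol."""
    if recorded is None or replayed is None:
        return False  # a missing score can never count as reproduced
    return math.isclose(recorded, replayed, rel_tol=0.0, abs_tol=tol)


def merge_trigger(row: dict, enriched: dict, pinned_score: Optional[float]) -> dict:
    """Copy trigger fields from the enrichment pass only if it agrees on the score.

    `enriched["probability"]` is a hypothetical key for the newer build's score.
    """
    if not scores_match(pinned_score, enriched.get("probability")):
        return row  # disagreement: keep the pinned release's row untouched
    merged = dict(row)
    for key in ("trigger_frame_index", "first_crossing_frame"):
        merged[key] = enriched.get(key)
    return merged
```

Using an absolute tolerance only (rel_tol=0.0) keeps the comparison independent of the score's magnitude, which seems appropriate for probabilities in [0, 1].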

viewer/ (shared, additive)

Monitor mode (auto-detected from row fields): production verdict as the row (green/gray tint), started/organization sortable columns, org filter cascading into the camera filter, MonitorCards rail (day span, kept/discarded counts + %, clickable per-org breakdown, replay explainer), replay diagnostics in the detail pane, source picker hidden for single-source data.
Eval rendering is pinned byte-identical by tests; regression-checked live against freshly regenerated train/val artifacts.

Repo wiring

Root Makefile PACKAGES, CI matrix, README table, .envrc convention (+ gitignore), DVC remote at s3://pyro-vision-rd/dvc/temporal-model/monitor/ (store + reports pushed for credential-less dvc pull consumption).

Test plan

monitor: 66 tests (offline: mocked HTTP, fake docker stack), ruff check + format
viewer: 39 vitest tests, eslint, prettier, tsc
repo-wide make lint green
Live e2e on production data: import (9 orgs incl. sdis-12) → replay (91 replayed / 0 unexplained) → DATA_ROOT=../monitor npm run dev
Eval viewer regression check against regenerated train/val artifacts (no regressions; surfaced a pre-existing dataset duplicate: pyronear-sdis-07_brison_226_2023-07-14T12-40-59 exists under both train labels — follow-up for data curation, untouched here)

Try it (load the predictions, ~2 min)

No alert-api credentials and no Docker needed — the store and the replayed
predictions are on the DVC remote (same S3 access as the train/eval remotes):

git checkout arthur/monitor-viewer-import
cd monitor && make install        # uv sync
uv run dvc pull                   # fetches the sequence store + reporting tree
cd ../viewer && npm ci
DATA_ROOT=../monitor npm run dev  # open http://localhost:3000

You'll see the June 12 production day: 91 scored sequences across 9
organizations (sdis-07, sdis-40, sdis-77, sdis-12, ...), each row carrying
production's verdict and score. Click a row to see the replayed tubes, YOLO
boxes, stabilized crops and trigger frame; the detail pane shows the replay
diagnostics (replay prob, match status, the window production scored). The
left rail has the kept/discarded stats and a clickable per-org breakdown.

To import fresh sequences yourself (alert-api credentials required) see
monitor/README.md — cp .envrc.example .envrc, then
make import ARGS="--all-orgs --exclude-org pyroadmins" and uv run dvc repro
(Docker required for the replay).
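
For readers wondering what "incremental and idempotent" with --exclude-org amounts to, here is a minimal sketch. It assumes each candidate is a dict with hypothetical `id` and `organization_name` keys and that the store can report the ids it already holds; `select_new_sequences` is an illustrative name, not the package's API.

```python
from typing import Iterable


def select_new_sequences(
    candidates: Iterable[dict],
    stored_ids: set[int],
    exclude_orgs: Iterable[str] = (),
    force: bool = False,
) -> list[dict]:
    """Return the candidates an incremental import would still need to fetch."""
    excluded = set(exclude_orgs)
    seen: set[int] = set()
    selected = []
    for seq in candidates:
        if seq["organization_name"] in excluded:
            continue  # filtered out, e.g. the CI camera org
        if seq["id"] in seen:
            continue  # the same id listed twice is fetched once
        if not force and seq["id"] in stored_ids:
            continue  # already in the store: re-running is a no-op
        seen.add(seq["id"])
        selected.append(seq)
    return selected
```

Running the selection twice against an updated store returns an empty list the second time, which is the idempotence property; `force=True` stands in for a re-import flag.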

Follow-ups (pyro-api side, discussed)

Admin listing on /sequences/all/fromdate would replace the id-scan import
Persisting the scored frame keys would make replays exact without window probing

When import_platform encountered zero new sequences, the store dir was not created, causing the subsequent 'dvc add data/01_raw/sequences' to fail. mkdir now happens at the start of import_platform, so the dir exists even for empty runs. Added .gitkeep files to track the data/ layout in git.
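
The shape of that fix, as a minimal sketch: `import_sequences` and its one-JSON-file-per-sequence layout are hypothetical stand-ins, since the real import_platform signature is not shown here.

```python
import json
from pathlib import Path


def import_sequences(store_dir: Path, new_sequences: list[dict]) -> int:
    """Write each new sequence into the store and return how many were written."""
    # Create the store up front so it exists even when nothing is imported;
    # exist_ok=True keeps repeated runs harmless.
    store_dir.mkdir(parents=True, exist_ok=True)
    for seq in new_sequences:
        (store_dir / f"{seq['id']}.json").write_text(json.dumps(seq))
    return len(new_sequences)
```

With the mkdir inside the loop (or implied by the first write), an empty run would leave no directory behind, which is the failure mode described above.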

When replay_matches is false, probe ascending distinct-frame windows (n=MIN_FRAMES..total) to determine whether the mismatch is window drift (production scored early and stopped; later frames shifted our window). The first window whose score is within SCORE_TOLERANCE of the recorded value sets matched_window_frames on the row; if no window matches, the field stays null (genuine drift). The summary gains a window_drift counter.
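
A minimal sketch of the probe loop, under stated assumptions: `score_window(n)` is a hypothetical callback that replays the window production would have built when only n distinct frames existed, and the MIN_FRAMES value below is a placeholder rather than the package's constant.

```python
from typing import Callable, Optional

MIN_FRAMES = 2          # placeholder value; the real constant is not shown here
SCORE_TOLERANCE = 1e-5  # tolerance quoted in the PR summary


def probe_window(
    total_frames: int,
    recorded: float,
    score_window: Callable[[int], float],
    min_frames: int = MIN_FRAMES,
    tol: float = SCORE_TOLERANCE,
) -> Optional[int]:
    """Return the smallest n whose window reproduces the recorded score, else None."""
    for n in range(min_frames, total_frames + 1):  # ascending: n = min_frames..total
        if abs(score_window(n) - recorded) <= tol:
            return n  # becomes matched_window_frames on the row
    return None  # no window matched: the field stays null
```

Probing in ascending order returns the earliest window that explains the recorded score, which fits the "production scored early and stopped" reading of window drift.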

Add monitorMode detection (rows.some replayed_probability) threaded to FilterBar and SequenceTable; organization column shown only in monitorMode (sortable); correctness column and outcome/GT filters hidden in monitorMode.

Chouffe · 2026-06-12T14:29:08Z

Deferred follow-ups from the multi-lens review (none blocking):

monitor/

replay.py's reports dict-of-one is vestigial since the single-source collapse — could flatten to one OrgReport.
Per-sequence replay caching keyed on (api_version, window, roi) would make dvc repro incremental (today the whole stage re-runs; the model is deterministic so this is sound).
DockerHub images are pulled by mutable tag, not digest. That is acceptable for a diagnostic tool; pin by digest if replay output ever becomes authoritative.
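
The caching follow-up above could key each replay on a stable hash of (api_version, window, roi). A minimal sketch, assuming the window is a list of frame bucket keys and the ROI is a tuple of floats; `replay_cache_key` and both representations are hypothetical.

```python
import hashlib
import json
from typing import Sequence


def replay_cache_key(api_version: str, window: Sequence[str],
                     roi: Sequence[float]) -> str:
    """Deterministic key: identical replay inputs always hash to the same value."""
    payload = json.dumps(
        {"api_version": api_version, "window": list(window), "roi": list(roi)},
        sort_keys=True,          # fixed key order so the serialization is stable
        separators=(",", ":"),   # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the model is deterministic, a hit on this key means the stored replay result can be reused; any change to the release tag, the frame window, or the ROI yields a different key and forces a re-run.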

viewer/

DetailPanel discriminates monitor rows per-row (replayed_probability !== undefined) while the table does it per-source — latent inconsistency if a source ever mixes shapes; unify on a passed monitorMode prop.

pyro-api (discussed with the team)

Admin branch in fetch_sequences_from_date would replace monitor's id-space scan with one clean listing call.
Persisting the scored frame bucket_keys alongside the score would make replays exact and remove the window-probe heuristic.

data

pyronear-sdis-07_brison_226_2023-07-14T12-40-59 exists in the train dataset under BOTH labels (fp and wildfire) — contradictory ground truth that also double-counts in eval metrics and collides eval artifact keys.
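
A check for this kind of duplicate could look like the sketch below. It assumes a hypothetical layout of one directory per label (for example fp/ and wildfire/) with one sub-directory per sequence; the actual dataset layout may differ.

```python
from collections import defaultdict
from pathlib import Path


def conflicting_labels(split_dir: Path) -> dict[str, set[str]]:
    """Map each sequence name found under more than one label to those labels."""
    labels_by_name: dict[str, set[str]] = defaultdict(set)
    for label_dir in split_dir.iterdir():
        if not label_dir.is_dir():
            continue
        for seq_dir in label_dir.iterdir():
            if seq_dir.is_dir():
                labels_by_name[seq_dir.name].add(label_dir.name)
    # Keep only names that appear under two or more labels.
    return {name: labels for name, labels in labels_by_name.items() if len(labels) > 1}
```

An empty result means every sequence name maps to a single label; any entry is a candidate for the data-curation follow-up.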

Chouffe added 30 commits June 12, 2026 09:18

docs(specs): design for the monitor package (production replay + viewer)

2826372

docs(specs): DVC-track the monitor store and replay artifacts

1573ca3

docs(plans): implementation plan for the monitor package

f1affaf

feat(monitor): scaffold the monitor package

4446c33

feat(monitor): sequence store with replay provenance

f577b8a

fix(monitor): keep accented characters readable in slugs

05595ff

feat(monitor): alert-api client with full-detection pagination

4a9257b

docs(monitor): clarify pagination cap and org scoping; test empty page

3f0b756

feat(monitor): incremental alert-api import command

9f91bc3

test(monitor): cover --force re-import; clarify pagination docs

a166a42

feat(monitor): port pyro-api frame/ROI reconstruction

ed356a3

feat(monitor): stabilized-window geometry port

8e52336

feat(monitor): eval-viewer contract writers

95ff7ce

fix(monitor): slugify results source to match the reporting tree dir

36b5cd6

docs(plans): slugify results/view source to match the tree dir

2f96cf9

feat(monitor): pinned-release replay stack

03eb099

fix(monitor): start api only after the frames bucket exists

e130031

feat(monitor): version-pinned replay orchestration

0439fd2

fix(monitor): isolate unhealthy-stack failures to their version group

69e1beb

docs: add stack_unhealthy to the replay drop reasons

2ba5bac

feat(monitor): dvc pipeline (replay stage) and workflow docs

3e4042f

feat(viewer): show monitor provenance columns when present

318fe82

fix(monitor): untrack the replay output dir (dvc-managed, not git)

6c2d716

chore(monitor): first imported store + replay lock (seq 47364)

6c8e80c

refactor(monitor): call the caller alert-api, not platform

6809855

chore(monitor): refresh store with alert-api naming (8 sequences, June 1-12)

d5b51fa

perf(monitor): skip the redundant final window probe

fac2441

chore(monitor): replay artifacts with window-drift probe results

23f7f1d

Chouffe added 27 commits June 12, 2026 14:35

chore(monitor): june-12 all-orgs store and fully-explained replay artifacts

cdaf2d7

feat(viewer): tint unlabeled rows by keep/discard verdict

ba399dc

feat(monitor): results rows carry production's verdict; replay is diagnostic

9707e9c

feat(viewer): demote replay diagnostics to the detail pane

db75c24

feat(monitor): single alert-api source with organizations as row data

6b8a7c6

All orgs now land in one reporting tree at data/08_reporting/alert-api/vit_dinov2_finetune/; organization_name on each row carries the raw org name for display as a viewer column.

feat(viewer): started column, no ground-truth column, no slider in monitor mode

b0304fb

chore(monitor): consolidated alert-api reporting tree

23501dc

feat(viewer): organization filter, styled dropdowns, no correctness pane in monitor mode

bf47a99

feat(viewer): hide the empty trigger-frame stat for monitor rows

6a07b1b

feat(monitor): trigger enrichment via a newer serving image, guarded by score agreement

0f3dced

feat(viewer): monitor stats rail and org-scoped camera filter

90a1e2a

Add MonitorCards component for monitor mode stats (kept/discarded counts and per-org breakdown) in ControlRail; cascade camera dropdown to show only cameras matching the selected organization.

chore(monitor): trigger-enriched replay artifacts

5206d31

feat(viewer): monitoring day span and keep/discard percentages

36c27c3

feat(viewer): kept percentage in the per-organization breakdown

f266a14

feat(viewer): hide the source picker when only one source exists

8c8ca33

feat(viewer): explain the replay mechanism in the monitor rail

6b4cd50

feat(viewer): move the replay explainer below the org breakdown, open by default

c5ed636

feat(viewer): click an org row to filter by that organization

d4ab0ac

style(viewer): org-row hover spans the card width

3bf6175

fix(monitor): guard null recorded scores and malformed image tags; harden the id scan

4209369

docs: sync spec/plan/readme with the shipped monitor design

ecbf4cd

a11y(viewer): aria-pressed on the org filter toggles

220d3b5

docs(plans): drop the executed monitor implementation plan

b082d46

docs(monitor): make the dvc-pull consumer path explicit in the README

6f0291f

fix(monitor): retry frame downloads and isolate per-sequence import failures

c345fa2

chore(monitor): delta import incl. sdis-12 and full re-replay

e5b49dc

Chouffe requested a review from MateoLostanlen June 12, 2026 14:34

Chouffe merged commit 3897440 into main Jun 12, 2026
7 checks passed

Chouffe merged 60 commits into main from arthur/monitor-viewer-import
