feat: monitor package — production decision replay + shared viewer#61
Merged
When import_platform encountered zero new sequences, the store dir was never created, so the subsequent 'dvc add data/01_raw/sequences' failed. The mkdir now happens at the start of import_platform, so the dir exists even for empty runs. Added .gitkeep files to track the data/ layout in git.
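A minimal sketch of the fix described above, not the package's actual code: the signature of import_platform and the one-directory-per-sequence layout are assumptions made for illustration. The point is only that the mkdir runs before any early exit.

```python
from pathlib import Path


def import_platform(store_dir: Path, new_sequence_ids: list[str]) -> int:
    """Hypothetical stand-in for the real import_platform."""
    # Create the store directory first, so it exists even when there is
    # nothing new to import and a later `dvc add` has a path to track.
    store_dir.mkdir(parents=True, exist_ok=True)

    for seq_id in new_sequence_ids:
        # Assumed layout: one sub-directory per imported sequence.
        (store_dir / seq_id).mkdir(exist_ok=True)

    return len(new_sequence_ids)
```

With exist_ok=True the call is safe to repeat, which keeps the import idempotent across runs.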
When replay_matches is false, probe ascending distinct-frame windows (n=MIN_FRAMES..total) to find whether the mismatch is window drift (production scored early and stopped; later frames shifted our window). The first window whose score is within SCORE_TOLERANCE of the recorded value sets matched_window_frames on the row; if no window matches, the field stays null (genuine drift). Summary gains window_drift counter.
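The probe can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: the constant values are placeholders, score_fn is a hypothetical stand-in for a replay call against the pinned release, the frame list is assumed to be already deduplicated, and each window is assumed to be the first n frames.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")

MIN_FRAMES = 2          # assumed value; the real constant is not shown here
SCORE_TOLERANCE = 1e-3  # assumed value; the real constant is not shown here


def find_matched_window(
    distinct_frames: Sequence[Frame],
    recorded_score: float,
    score_fn: Callable[[Sequence[Frame]], float],
) -> Optional[int]:
    """Return the first window size that reproduces the recorded score.

    Windows are probed in ascending size, n = MIN_FRAMES..total. None means
    no window matched, i.e. the mismatch is not explained by window drift.
    """
    for n in range(MIN_FRAMES, len(distinct_frames) + 1):
        replayed = score_fn(distinct_frames[:n])
        if abs(replayed - recorded_score) <= SCORE_TOLERANCE:
            return n  # would be stored as matched_window_frames on the row
    return None
```

Probing in ascending order means the smallest matching window wins, which fits the case where production scored early and stopped.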
All orgs now land in one reporting tree at data/08_reporting/alert-api/vit_dinov2_finetune/; organization_name on each row carries the raw org name for display as a viewer column.
Add monitorMode detection (rows.some replayed_probability) threaded to FilterBar and SequenceTable; organization column shown only in monitorMode (sortable); correctness column and outcome/GT filters hidden in monitorMode.
…ane in monitor mode
…by score agreement
Add MonitorCards component for monitor mode stats (kept/discarded counts and per-org breakdown) in ControlRail; cascade camera dropdown to show only cameras matching the selected organization.
Deferred follow-ups from the multi-lens review (none blocking): monitor/
viewer/
pyro-api (discussed with the team)
data
Summary
Adds monitor/, a sixth package answering "what did the temporal model decide in production, and why?", plus the viewer integration to explore it. Design: docs/specs/2026-06-12-monitor-design.md.

monitor/ (new package, no core/torch dependency)
- temporal-monitor import: pulls scored sequences from alert-api (GET-only + login) into a DVC-tracked store with full provenance (recorded score, temporal_model_version, temporal_api_version, per-detection bucket_key + verbatim bbox). Incremental and idempotent; --all-orgs scans the global sequence-id space (admin token) so every organization imports despite the org-scoped listing endpoint; --exclude-org filters (e.g. the CI camera org).
- temporal-monitor replay (the dvc.yaml stage): re-runs each sequence through the exact pinned release that scored it (pyronear/temporal-model-api:<tag>, model.zip baked in) + throwaway MinIO, reconstructing the production call byte-for-byte (frame window + ROI ported line-for-line from pyro-api's validation worker, fuzz-verified over 200k cases). Writes the eval-viewer reporting contract under one alert-api source.
- matched_window_frames). Live result on June 12 data: 91/91 production scores reproduced (23 exact, 52 window drift: production scores early and freezes on validation).
- --trigger-image): a second pass through a newer serving build (same model.zip) fills trigger_frame_index/first_crossing_frame for releases predating compute_trigger, merged only when it reproduces the pinned score (91/91 agreed).

viewer/ (shared, additive)
started/organization sortable columns, org filter cascading into the camera filter, MonitorCards rail (day span, kept/discarded counts + %, clickable per-org breakdown, replay explainer), replay diagnostics in the detail pane, source picker hidden for single-source data.

Repo wiring
Root Makefile PACKAGES, CI matrix, README table, .envrc convention (+ gitignore), DVC remote at s3://pyro-vision-rd/dvc/temporal-model/monitor/ (store + reports pushed for credential-less dvc pull consumption).

Test plan
- monitor: 66 tests (offline: mocked HTTP, fake docker stack), ruff check + format
- viewer: 39 vitest tests, eslint, prettier, tsc
- make lint green
- DATA_ROOT=../monitor npm run dev
- pyronear-sdis-07_brison_226_2023-07-14T12-40-59 exists under both train labels; follow-up for data curation, untouched here)

Try it (load the predictions, ~2 min)
No alert-api credentials and no Docker needed — the store and the replayed
predictions are on the DVC remote (same S3 access as the train/eval remotes):
You'll see the June 12 production day: 91 scored sequences across 9
organizations (sdis-07, sdis-40, sdis-77, sdis-12, ...), each row carrying
production's verdict and score. Click a row to see the replayed tubes, YOLO
boxes, stabilized crops and trigger frame; the detail pane shows the replay
diagnostics (replay prob, match status, the window production scored). The
left rail has the kept/discarded stats and a clickable per-org breakdown.
To import fresh sequences yourself (alert-api credentials required), see
monitor/README.md: cp .envrc.example .envrc, then make import ARGS="--all-orgs --exclude-org pyroadmins" and uv run dvc repro (Docker required for the replay).
Follow-ups (pyro-api side, discussed)
/sequences/all/fromdatewould replace the id-scan import