feat(api): accept caller-supplied per-frame detections on /predict #47
Open
Chouffe wants to merge 13 commits into
…d-bboxes # Conflicts: # api/README.md # api/src/temporal_model/api/app.py # api/src/temporal_model/api/model_runner.py # api/src/temporal_model/api/schemas.py # api/tests/test_app.py # api/tests/test_model_runner.py # api/tests/test_schemas.py
Summary
- … `/predict`; today the API re-runs its bundled YOLO over every frame. This adds an optional `detections` field to `POST /predict` so callers can supply those boxes and skip the in-API detector pass (~600 ms/request on CPU).
- When `detections` is present, the bundled detector and its cache are bypassed entirely (no read, no write). The supplied boxes are converted to internal xywhn `Detection`s and fed through the existing `predict(frame_detections=...)` injection seam. Tube building, ROI filtering, cropping, classification, and calibration run unchanged, so the calibrator sees genuine per-tube `mean_conf`, `log_len`, and `n_tubes` from the real per-frame boxes. Core is untouched.
- `docs/specs/2026-06-11-api-supplied-detections-design.md`. Documented (unvalidated) risk: calibration was fit on bundled-detector boxes; edge-detector boxes may shift it. To be validated at alert-api integration time.

The two paths
flowchart LR
    A["POST /predict"] --> B{"detections<br/>in request?"}
    B -- "no (today's behavior)" --> C["detection cache<br/>(read + write)"]
    C --> D["bundled YOLO<br/>on cache misses"]
    D --> E["tube building"]
    B -- "yes" --> F["convert xyxyn → xywhn<br/>(class_id=0, cache bypassed)"]
    F --> E
    E --> G["ROI filter<br/>(roi_xyxyn, optional)"]
    G --> H["crop + stabilize"]
    H --> I["ViT classifier"]
    I --> J["calibrator"]
    J --> K["{ is_smoke, probability }"]

Intended deployment flow:
sequenceDiagram
    participant RPi as RPi (pyro-engine)
    participant P as alert-api
    participant API as temporal-model API
    RPi->>P: alert + per-frame bboxes (xyxyn + conf)
    P->>API: POST /predict { frames, detections }
    Note over API: YOLO skipped, tubes built<br/>from the supplied boxes
    API-->>P: { is_smoke, probability }

Request format
Today's call: detector path (unchanged)

Omit `detections` (or send `null`) and the API behaves exactly as before: frames are fetched from S3 and the bundled YOLO runs on every frame (with the detection cache). `bucket` and `roi_xyxyn` remain optional, as before. With `?verbose=true`, the response reports `"detections_source": "detector"` and the `detector` stage shows up in profiling.

New call: caller-supplied detections (detector bypassed)
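As a rough illustration of the request shape on this path, the sketch below builds a payload with one `detections` entry per frame. The frame keys and the per-box field names (`xyxyn`, `confidence`) are placeholders assumed for illustration only; the schema in `api/src/temporal_model/api/schemas.py` is authoritative.

```python
import json

# Hypothetical frame keys; the real values are whatever the API fetches from S3.
frames = ["seq/frame_000.jpg", "seq/frame_001.jpg", "seq/frame_002.jpg"]

# One list per frame, index-aligned with `frames`.
# The field names `xyxyn` and `confidence` are assumed, not taken from the schema.
detections = [
    [{"xyxyn": [0.25, 0.25, 0.75, 0.5], "confidence": 0.5}],
    [],  # the detector ran on this frame and saw nothing
    [{"xyxyn": [0.25, 0.25, 0.75, 0.5], "confidence": 0.625}],
]

# Caller-supplied path: the detector is bypassed.
payload = {"frames": frames, "detections": detections}

# Leaving `detections` out gives the unchanged detector-path request.
detector_path_payload = {"frames": frames}

body = json.dumps(payload)
```

Serializing `payload` with `json.dumps` gives a body that could be sent to `POST /predict`; the example stops short of making a network call.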
One entry per frame, index-aligned with `frames`. An explicit `[]` means "the detector ran on this frame and saw nothing" (becomes a gap for tube building); `null` entries and partial coverage are rejected. `detections` composes with `bucket` and `roi_xyxyn`; omitting it (or sending `null`) gives exactly today's behavior. The response shape is unchanged:

{ "is_smoke": true, "probability": 0.952, "model": { "name": "vit_dinov2_finetune", "version": "0.1.0" } }

With `?verbose=true`, the `details` block now carries provenance: `details.preprocessing.detections_source` is `"request"` when the boxes came from the caller, `"detector"` when the bundled YOLO produced them.

Validation (`400 invalid_request`): length mismatch with `frames`, `null`/non-list entries, coords outside [0, 1], inverted or zero-area boxes (also catches accidental xywhn input, fail-closed), confidence outside [0, 1], missing fields.

Relationship to #46
#46 explores the same goal (skip the in-API detector) with a simpler contract: one static `bbox_xyxyn` + one `bbox_confidence` stamped on every frame. That shape loses exactly the information the downstream stages need: the calibrator's features are `[logit, log_len, mean_conf, n_tubes]` (`core/logistic_calibrator.py:106`). A forced single box pins `mean_conf` to one constant (defaulting to 1.0, far above real YOLO confidence distributions), `log_len` to the full sequence length, and `n_tubes` to 1, so the returned probability comes from feature values the regressor never saw during fitting.

This PR keeps the per-frame boxes and confidences instead, and the exact-equivalence check below shows that shape preserves the model's behavior bit-for-bit. The plumbing from #46 (injection seam, cache bypass, validation reuse) follows the same approach here.
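The feature collapse described above can be seen in a toy sketch. This is not the project's tube builder or calibrator: it assumes at most one box per frame, treats a tube as a maximal run of consecutive frames that have a box, and assumes `log_len` is the natural log of the tube's frame count. The confidence values are invented for illustration.

```python
import math
from statistics import mean


def toy_tube_features(conf_per_frame):
    """Toy stand-in for tube building (assumption, not the real algorithm).

    conf_per_frame holds one confidence per frame, or None for a frame
    without a box. A tube is a maximal run of consecutive frames with a box.
    """
    tubes, run = [], []
    for conf in conf_per_frame:
        if conf is None:
            if run:  # a gap closes the current tube
                tubes.append(run)
                run = []
        else:
            run.append(conf)
    if run:
        tubes.append(run)
    return {
        "n_tubes": len(tubes),
        "per_tube": [
            {"mean_conf": mean(t), "log_len": math.log(len(t))} for t in tubes
        ],
    }


# One static box with confidence 1.0 stamped on all 7 frames (the #46 shape).
stamped = toy_tube_features([1.0] * 7)

# Per-frame detections with a gap (illustrative values only).
per_frame = toy_tube_features([0.41, 0.38, None, None, 0.62, 0.55, 0.47])
```

In this sketch `stamped` always yields a single tube spanning the whole sequence with `mean_conf` fixed at 1.0, whereas `per_frame` yields two tubes whose lengths and mean confidences vary with the input.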
Test Plan
- `make -C api lint && make -C api test`: 150 passed, 1 skipped
- `make -C core test`: 245 passed, core has zero diffs
- Exact-equivalence check: ran `BboxTubeTemporalModel.detect()`, converted its boxes to `xyxyn`, and sent them as `detections`. The response is identical to the in-API detector path (probability equal to all 16 digits, full verbose payload deep-equal; only `detections_source` and timings differ)
- … `scratch/annot_seq_9711` (7 real frames):
  - `is_smoke=true`, p=0.870, 3 tubes, `detector` stage 612 ms, `detections_source="detector"`
  - `is_smoke=true`, p=0.952, 2 tubes, no `detector` stage in profiling, `detections_source="request"`
  - … `detections`: `is_smoke=false`, p=0.0, 0 tubes
  - … `400 invalid_request`; `detections: null` falls back to the detector path
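The validation and conversion rules exercised by these checks can be sketched roughly as follows. This is a minimal stand-in, not the code in `schemas.py` or `model_runner.py`: the `BoxXywhn` type, the function names, and the `(x1, y1, x2, y2, conf)` tuple layout are all hypothetical.

```python
from typing import NamedTuple


class BoxXywhn(NamedTuple):
    """Hypothetical internal box: normalized center, width, height, confidence."""
    cx: float
    cy: float
    w: float
    h: float
    conf: float


def validate_and_convert(x1, y1, x2, y2, conf) -> BoxXywhn:
    """Check one normalized xyxy box and convert it to xywhn."""
    if not all(0.0 <= v <= 1.0 for v in (x1, y1, x2, y2)):
        raise ValueError("coords must lie in [0, 1]")
    # Also rejects many accidental xywhn inputs, whose w/h tend to be
    # smaller than cx/cy and therefore look inverted.
    if x2 <= x1 or y2 <= y1:
        raise ValueError("inverted or zero-area box")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return BoxXywhn((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1, conf)


def convert_detections(frames, detections):
    """Validate index alignment, then convert every frame's boxes."""
    if not isinstance(detections, list) or len(detections) != len(frames):
        raise ValueError("detections must have one entry per frame")
    converted = []
    for entry in detections:
        if not isinstance(entry, list):
            raise ValueError("each entry must be a list (use [] for no boxes)")
        converted.append([validate_and_convert(*box) for box in entry])
    return converted


boxes = convert_detections(
    ["f0", "f1"],
    [[(0.25, 0.25, 0.75, 0.5, 0.5)], []],
)
```

An empty per-frame list passes through as an empty list, which matches the "detector saw nothing" meaning of `[]`; a `None` entry or a length mismatch raises, mirroring the `400 invalid_request` cases.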