
feat: cluster sequences into groups + propagation on validation #132

Merged
MateoLostanlen merged 22 commits into main from feat/sequence-groups on May 11, 2026

@MateoLostanlen (Member)

Why

Most platform sequences are recurring views of the same camera/azimuth/region. An R&D study on 857 sequences shows that ~46% of annotation clicks can be avoided by clustering sequences into groups and reusing one annotator's labels across each group.

What

  • sequence_groups table keyed on (camera_id, azimuth) with a frozen representative_bbox. At most one label per group (smoke OR false positive, enforced by CHECK). Groups carry an is_validated flag.
  • POST /sequence_groups/assign + make assign-groups: best-IoU match (>0.3) on the camera/azimuth bucket. New sequences joining an already-labeled group inherit its label automatically (upgrading the READY_TO_ANNOTATE placeholder created by import.py to SEQ_ANNOTATION_DONE). Chained into make pull-sequences.
  • Propagation hook: when a per-sequence annotation reaches SEQ_ANNOTATION_DONE and the sequence is in a validated group, its labels fan out to the group's other unlocked members. The hook refuses to silently flip a conflicting group label.
  • Frontend: new "Sequence groups" entry at the top of the left sidebar. List page filters to unlabeled by default. Group review page lets you remove outliers, validate the group, and click through to the per-sequence annotation flow; each thumbnail overlays the sequence's predicted bbox plus the group's reference region so grouping mistakes are visible.

Out of scope / known follow-ups

  • The bulk-annotate endpoint and propagation commit per sequence rather than in a single transaction (retries are idempotent because already-processed sequences are skipped by stage).
  • `assign-groups` is single-threaded by contract; there is no DB lock.
  • All new endpoints use `get_current_user`; RBAC split is a separate PR.
  • Export-time merging of labels for images shared by multiple sequences.

A sequence group models a recurring real-world entity at one camera angle
(a persistent fire, a recurring antenna FP, ...). A group carries at most
one label (smoke OR false positive, never both, enforced by a CHECK
constraint), and the representative_bbox lives on the group itself so the
group remains self-defining even if all its members are eventually pruned.

Sequences gain a nullable sequence_group_id FK with ON DELETE SET NULL.
Membership is set by the assign-groups job, not on import.
GET /sequence_groups/{id} returns the group and its members (a lightweight
projection with a has_annotation flag for the UI's "already-annotated" hint).

POST /sequence_groups/assign is the single-threaded, idempotent batch
that turns unassigned sequences into group memberships:

- compute representative_bbox from the sequence's first 10 detections
  (median of algo_predictions, ignoring others_bboxes — matches the
  current auto-annotation flow)
- best-IoU match against existing groups for the same (camera_id,
  azimuth), threshold > 0.5 (stricter than within-sequence clustering
  since a wrong match auto-applies inherited labels)
- if no match: create a new group; otherwise join the existing one and,
  if the group already has a label, create a SequenceAnnotation with
  inherited labels in stage SEQ_ANNOTATION_DONE
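
A minimal Python sketch of the matching step above, assuming each detection contributes a single (x1, y1, x2, y2) box and that candidate groups are already loaded in memory. `Group`, `representative_bbox` and `best_match` are hypothetical names, not the PR's code, and `box_iou` is a plain reimplementation rather than `services.annotation_generation.box_iou`. The threshold is a parameter because a later commit lowers it from 0.5 to 0.3.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box format


def representative_bbox(detection_boxes: Sequence[Box], limit: int = 10) -> Box:
    """Coordinate-wise median over the first `limit` detections' boxes."""
    boxes = list(detection_boxes)[:limit]
    if not boxes:
        raise ValueError("sequence has no detections")
    return tuple(median(b[i] for b in boxes) for i in range(4))  # type: ignore[return-value]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Group:  # hypothetical in-memory stand-in for a sequence_groups row
    id: int
    camera_id: int
    azimuth: int
    bbox: Box


def best_match(
    groups: list[Group], camera_id: int, azimuth: int, bbox: Box, threshold: float = 0.5
) -> Optional[Group]:
    """Best-IoU group in the same (camera_id, azimuth) bucket, or None if no IoU > threshold."""
    best, best_iou = None, threshold
    for group in groups:
        if group.camera_id != camera_id or group.azimuth != azimuth:
            continue  # only compare within the same bucket
        iou = box_iou(group.bbox, bbox)
        if iou > best_iou:  # strict: must beat the threshold and any earlier candidate
            best, best_iou = group, iou
    return best
```

When `best_match` returns None the caller would create a new group; otherwise it joins the returned one and, if that group is labeled, writes the inherited annotation.
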

The bulk-annotate endpoint applies one label (smoke OR false positive,
never both) to many sequences in a single request. It skips sequences
whose annotation is past SEQ_ANNOTATION_DONE so reviewed work isn't
clobbered, and rejects the request with 409 when the target group already
carries a different label unless the caller passes force=true.

When group_id is provided, the endpoint also writes the label onto the
group itself so future joiners inherit it via assign-groups. Returns
per-sequence applied/skipped status with reasons.
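
The decision logic of that endpoint can be sketched in plain Python, leaving out the database writes. The locked set below is an assumption (UNDER_ANNOTATION plus the stages assumed to come after SEQ_ANNOTATION_DONE) and may not match the real `_BULK_LOCKED_STAGES`; `bulk_annotate`, `BulkResult` and `GroupLabelConflict` are hypothetical names, and labels are plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: UNDER_ANNOTATION plus the stages past SEQ_ANNOTATION_DONE.
BULK_LOCKED_STAGES = frozenset({"UNDER_ANNOTATION", "IN_REVIEW", "NEEDS_MANUAL", "ANNOTATED"})


class GroupLabelConflict(Exception):
    """Raised where the real endpoint would answer HTTP 409."""


@dataclass
class BulkResult:
    applied: list[int] = field(default_factory=list)
    skipped: dict[int, str] = field(default_factory=dict)  # sequence_id -> reason


def bulk_annotate(
    label: str,
    stages: dict[int, Optional[str]],  # sequence_id -> current stage (None = no annotation)
    group_label: Optional[str] = None,
    force: bool = False,
) -> tuple[BulkResult, Optional[str]]:
    """Return per-sequence status and the label the group should carry afterwards."""
    if group_label is not None and group_label != label and not force:
        raise GroupLabelConflict(f"group already labeled {group_label!r}; pass force=true")
    result = BulkResult()
    for seq_id, stage in stages.items():
        if stage in BULK_LOCKED_STAGES:
            result.skipped[seq_id] = f"locked stage {stage}"
        else:
            result.applied.append(seq_id)  # the real code would write the annotation here
    # Only touch the group label if the label actually reached at least one member.
    new_group_label = label if result.applied else group_label
    return result, new_group_label
```

Returning the group label separately mirrors the later review fix: a label that reached no member is never written onto the group.
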

…ences

Thin wrapper that auths against the local annotation API and calls
POST /sequence_groups/assign once. Single-threaded by contract — the
endpoint is the only writer of sequence_groups, so we never have two
ingesters racing to create the same group.

make pull-sequences now runs assign-groups automatically after each
import so newly-pulled sequences are immediately clustered (and
inherit labels from existing groups when applicable).

New route /sequence-groups/:id/annotate. Thumbnail grid + per-member
checkbox + label form. Defaults to selecting every member that doesn't
already have an annotation, so the annotator just unchecks outliers and
applies. The label form is a radio (smoke vs false positive) plus a
single dropdown to pick the enum value, mirroring the API constraint
that exactly one of the two is set.

Submit posts to /annotations/sequences/bulk with the explicit
sequence_ids list and group_id, then surfaces the per-sequence
applied/skipped status returned by the backend. Conflict overwrite is
gated on an "Overwrite group's existing label" checkbox surfaced only
when the group is already labeled.

- assign-groups creates a new group from an unmatched sequence
- bulk-annotate writes the label both onto the sequences and onto the
  group (so future joiners inherit it)
- bulk-annotate rejects a conflicting label on an already-labeled group
  with 409, accepts it with force=true
- request validation rejects payloads with neither or both labels

- migration header: align Revises comment with the actual down_revision
  (a1b2c3d4e5f6, the others_bboxes migration)
- shorten the XOR check constraint to its equivalent positive form
  (smoke_type IS NULL OR false_positive_type IS NULL)
- delete the unused SequenceGroupLabelUpdate schema and the unused
  module-level logger
- merge the two duplicated label-rewriting helpers into
  apply_label_to_sequences_bbox in services/annotation_generation.py
- add UNDER_ANNOTATION to the bulk-annotate locked stages so active
  human work isn't clobbered
- only write the label onto the group when at least one sequence was
  actually applied; otherwise the group would carry a label that
  never reached any current member
- frontend bulk-error display reads the API's detail field with the
  Error.message fallback (axios rejects with { detail })
- test docstring: remove the unsupported claim about cross-sequence
  inheritance coverage (tested end-to-end via the make pipeline)
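
The shortened constraint only forbids having both columns set, so it encodes "at most one label" rather than a strict XOR; the "neither" case is left to request validation, as the test list above notes. The check below uses Python's built-in sqlite3 as a stand-in for the real database, with a cut-down hypothetical table. The truth-table loop assumes the longer original form was `NOT (smoke_type IS NOT NULL AND false_positive_type IS NOT NULL)` and confirms it is the De Morgan equivalent of the positive form.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sequence_groups (
        id INTEGER PRIMARY KEY,
        smoke_type TEXT,
        false_positive_type TEXT,
        CHECK (smoke_type IS NULL OR false_positive_type IS NULL)
    )
    """
)


def try_insert(smoke_type, false_positive_type) -> bool:
    """Return True if the CHECK constraint accepts the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sequence_groups (smoke_type, false_positive_type) VALUES (?, ?)",
                (smoke_type, false_positive_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# De Morgan: NOT (a AND b) == (NOT a) OR (NOT b) for every NULL / non-NULL combination.
for smoke, fp in product([None, "wildfire"], [None, "antenna"]):
    long_form = not (smoke is not None and fp is not None)
    positive_form = smoke is None or fp is None
    assert long_form == positive_form
```
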

- new GET /sequence_groups/ paginated endpoint with member_count and
  ?labeled=true|false filter, ordered by created_at desc
- GET /sequence_groups/{id} now returns each member's first detection id
  and its algo_predictions, so the UI can render a thumbnail with bbox
  overlays without an extra round-trip per member
- new "Sequence groups" entry in the left sidebar pointing at
  /sequence-groups (the index page) so groups are discoverable from the
  navigation
- new SequenceGroupsListPage: paginated table of all groups with
  members count, label state, and a filter (all / labeled / unlabeled)
- SequenceGroupAnnotatePage now overlays two bboxes on each thumbnail:
  - the sequence's own tracked predictions (red, solid)
  - the group's reference region (yellow, dashed) — same on every
    thumbnail so the annotator can eyeball whether each member really
    overlaps the group
- thumbnails consume the new first_detection_id + algo_predictions
  inlined in the group response, removing the previous N+1 query

The list query returns rows containing the JSONB representative_bbox
(a dict), which is not hashable. fastapi-pagination then refuses to
deduplicate the rows and raises NonHashableRowsException — surfaced
to the UI as 'Failed to load groups'. Disabling row deduplication is
safe here since we already group by SequenceGroup.id.
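
The failure comes down to plain Python hashing: a row tuple that contains a dict cannot be hashed, so any set- or dict-based deduplication raises TypeError. A minimal sketch with made-up row values and hypothetical bbox keys; it does not reproduce fastapi-pagination's internals, only the hashing rule they run into.

```python
# Hypothetical list-query rows: (group id, representative_bbox from JSONB, member_count).
rows = [
    (1, {"xyxyn": [0.1, 0.2, 0.3, 0.4], "confidence": 0.8}, 3),
    (2, {"xyxyn": [0.5, 0.5, 0.6, 0.7], "confidence": 0.6}, 2),
]


def dedupe_by_hash(rows):
    """Hash-based dedup that preserves order; needs every row to be hashable."""
    return list(dict.fromkeys(rows))


def ids_are_unique(rows) -> bool:
    """GROUP BY SequenceGroup.id already guarantees this, so dedup adds nothing."""
    ids = [row[0] for row in rows]
    return len(ids) == len(set(ids))
```
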

Pivot the group review UX: the per-group page no longer applies labels
itself. The annotator clicks through to the regular per-sequence
annotation page; if the group has been marked "validated", the labels
they save there fan out to every other unannotated member of the group.

- new sequence_groups.is_validated column (separate migration)
- new PATCH /sequence_groups/{id} to flip is_validated
- new DELETE /sequence_groups/{group_id}/members/{seq_id} to remove an
  outlier from a group without deleting the sequence
- propagation hook in POST/PATCH /annotations/sequences: when an
  annotation reaches SEQ_ANNOTATION_DONE and the seq belongs to a
  validated group, derive a single label from the annotation
  (most-common smoke type or fp type), copy it onto the group, and
  re-generate matching annotations for the other members
- skip locked stages (UNDER_ANNOTATION, SEQ_ANNOTATION_DONE,
  IN_REVIEW, NEEDS_MANUAL, ANNOTATED) so manual work isn't clobbered
- frontend: drop the label form on /sequence-groups/:id/annotate;
  thumbnails are now clickable links to /sequences/:id/annotate, each
  has an X to remove that member, and the header gets a
  Validate / Unvalidate toggle. List page surfaces is_validated.
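
The propagation rule can be sketched as two pure functions, assuming an annotation exposes its per-bbox smoke types and false-positive types as lists and that a label is a (kind, value) pair; both shapes are assumptions, as are the names `derive_group_label` and `propagate`. Checking smoke before false positives is also an assumption. The conflict branch follows the later behaviour in this PR (warn instead of flipping the group label).

```python
from collections import Counter
from typing import Optional

Label = tuple[str, str]  # ("smoke" | "false_positive", enum value); assumed shape

# Locked stages as listed in the commit message above.
LOCKED_STAGES = frozenset(
    {"UNDER_ANNOTATION", "SEQ_ANNOTATION_DONE", "IN_REVIEW", "NEEDS_MANUAL", "ANNOTATED"}
)


def derive_group_label(smoke_types: list[str], fp_types: list[str]) -> Optional[Label]:
    """Collapse a per-bbox annotation into one label via the most common type."""
    if smoke_types:
        return ("smoke", Counter(smoke_types).most_common(1)[0][0])
    if fp_types:
        return ("false_positive", Counter(fp_types).most_common(1)[0][0])
    return None


def propagate(
    group: dict, source_id: int, label: Optional[Label], member_stages: dict[int, Optional[str]]
) -> tuple[list[int], Optional[str]]:
    """Return (member ids to re-annotate, warning or None)."""
    if not group["is_validated"] or label is None:
        return [], None  # unvalidated group: no fan-out
    if group["label"] is not None and group["label"] != label:
        # Refuse to flip the group label; the per-sequence save still stands.
        return [], f"group {group['id']} already labeled {group['label']}; propagation skipped"
    group["label"] = label
    targets = [
        seq_id
        for seq_id, stage in member_stages.items()
        if seq_id != source_id and stage not in LOCKED_STAGES
    ]
    return targets, None
```
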

…placeholder

import.py's Step 3 creates an empty SequenceAnnotation in stage
READY_TO_ANNOTATE for every imported sequence. assign-groups previously
treated 'annotation already exists' as 'skip', so a fresh sequence that
joined a labeled group never picked up the label automatically.

Now the inheritance path checks the existing annotation's stage:
- READY_TO_ANNOTATE (the placeholder) -> update it in place with the
  inherited labels and bump to SEQ_ANNOTATION_DONE
- anything past that (UNDER_ANNOTATION, SEQ_ANNOTATION_DONE+) -> skip;
  the human / review pipeline owns it
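
The stage check above reduces to a small decision table. A sketch with a hypothetical `inheritance_action` helper that returns what assign-groups would do for a sequence that has just joined a group; the "create" branch is the no-annotation case from the earlier assign-groups commit.

```python
from typing import Optional

PLACEHOLDER_STAGE = "READY_TO_ANNOTATE"  # the empty annotation created by import.py's Step 3


def inheritance_action(group_has_label: bool, existing_stage: Optional[str]) -> str:
    """Decide what assign-groups does with a sequence that just joined a group."""
    if not group_has_label:
        return "none"  # nothing to inherit; membership is still recorded
    if existing_stage is None:
        return "create"  # new annotation with inherited labels, stage SEQ_ANNOTATION_DONE
    if existing_stage == PLACEHOLDER_STAGE:
        return "update"  # fill the placeholder in place and bump it to SEQ_ANNOTATION_DONE
    return "skip"  # UNDER_ANNOTATION, SEQ_ANNOTATION_DONE and later belong to humans / review
```
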

R&D predicted that 0.5 would be too strict and miss most natural smoke
drift. Live data confirms it: at 0.5 only 14% of new sequences joined
an existing group; at 0.3 the rate is 30% (matching the R&D estimate
of ~46% click savings).
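
A worked example of why 0.5 misses drift: two equal-size boxes offset by half their width share 50 of 150 units of area, so their IoU is 1/3. Such a sequence joins the existing group at 0.3 but starts a new group at 0.5. The boxes are made-up values in arbitrary units.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


group_box = (0.0, 0.0, 10.0, 10.0)
drifted_box = (5.0, 0.0, 15.0, 10.0)  # same size, shifted right by half its width
iou = box_iou(group_box, drifted_box)  # 50 / (100 + 100 - 50) = 1/3
joins_at_0_5 = iou > 0.5
joins_at_0_3 = iou > 0.3
```
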

The /sequence_groups/ list page is for finding groups worth bulk-annotating;
size-1 groups can't benefit from validation + propagation.
Filter them out at the SQL level (HAVING count >= 2 + inner-join on
the member-count subquery) so they don't pollute the page.

The single-group GET endpoint and the assign-groups job are unaffected;
singleton groups still exist in the DB and a future joiner can promote
them to a multi-member group.
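
A plain-SQL sketch of the singleton filter, run through Python's sqlite3 against a cut-down hypothetical schema. The real query is presumably built with SQLAlchemy against the production database, so only the shape carries over: the member-count subquery keeps groups with at least two members, and the inner join drops every other group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence_groups (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
    CREATE TABLE sequences (
        id INTEGER PRIMARY KEY,
        sequence_group_id INTEGER REFERENCES sequence_groups(id) ON DELETE SET NULL
    );
    INSERT INTO sequence_groups VALUES (1, '2026-05-01'), (2, '2026-05-02'), (3, '2026-05-03');
    INSERT INTO sequences VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 3), (15, 3);
    """
)

LIST_QUERY = """
SELECT g.id, mc.member_count
FROM sequence_groups AS g
JOIN (
    SELECT sequence_group_id, COUNT(*) AS member_count
    FROM sequences
    WHERE sequence_group_id IS NOT NULL
    GROUP BY sequence_group_id
    HAVING COUNT(*) >= 2
) AS mc ON mc.sequence_group_id = g.id
ORDER BY g.created_at DESC
"""

rows = conn.execute(LIST_QUERY).fetchall()  # group 2 is a singleton and is filtered out
```
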

- assign-groups docstring: clarify that every unassigned sequence is
  assigned to a group (not just unannotated ones); only the inheritance
  step is gated on annotation stage
- frontend thumbnail: switch from object-cover to object-contain so the
  bbox overlays don't get pushed off the visible image when an image's
  aspect ratio differs from 16:9
- drop the unused bulkAnnotateSequences client method, the
  Bulk* TS types, and the SEQUENCE_ANNOTATIONS_BULK constant — the
  per-sequence propagation path replaced this surface for the UI;
  the backend endpoint stays as a programmatic primitive
- list page caption: replace the misleading 'bulk-annotate' wording
  with the actual flow (annotate one member → propagation if validated)
- update model.py docstring to reference the current IoU > 0.3
  threshold instead of the previous 0.5

…bnails

- Move 'Sequence groups' to the top of the left sidebar (above
  Sequences/Detections) since it's now the primary workflow entry
- Default the groups list filter to 'unlabeled' — labeled groups are
  done work, the point of opening this page is to find the next thing
  to validate + annotate
- Increase thumbnail size: 1/2/3 columns instead of 2/3/5; thumbnails
  now occupy enough area to actually see what's in the image and judge
  whether the bbox overlay matches
- bulk-annotate: rewrite the 409 conflict message to spell out that
  force=true only overwrites the group's label and re-propagates it to
  unlocked members, not to annotations past SEQ_ANNOTATION_DONE
- group propagation: refuse to silently flip an existing group label
  when a member's annotation implies a different one; log a warning and
  leave the group alone instead (the per-seq annotation still saves)
- GET /sequence_groups/{id}: replace the has_annotation boolean by the
  raw annotation_processing_stage so the UI can distinguish import.py's
  READY_TO_ANNOTATE placeholder from real human work
- first-detection subquery: switch to row_number() with a deterministic
  tie-breaker so members can't duplicate on equal recorded_at
- dedupe _bbox_iou by reusing services.annotation_generation.box_iou
- move the previously inline SequenceAnnotationUpdate import to the top
  of sequence_groups.py
- frontend: revert thumbnails to object-cover (pyro images are 16:9,
  cover matches container exactly) and route the annotated indicator
  through the new processing-stage signal so READY_TO_ANNOTATE no
  longer shows as "annotated"
- 'remove from group' now sticks: new Sequence.is_group_excluded
  boolean (migration d4e5f6a7b8c9). DELETE /sequence_groups/{id}/members
  sets it; assign_groups filters it out so the next import doesn't
  silently re-attach a sequence the annotator pruned.
- Group-propagation conflicts surface to the caller: when fan-out is
  skipped because the group already carries a different label, the
  per-sequence annotation response now includes group_propagation_warning
  with a human-readable reason. UI can show a toast; the annotation
  itself still saves.
- Drop the misleading 'in a single transaction' phrasing on the bulk
  endpoint summary. Wording is now 'per-sequence commits' so callers
  know retries are idempotent but the loop is not atomic.
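
A sketch of the tie-breaking issue behind the first-detection fix, using Python's sqlite3 as a stand-in (window functions need SQLite 3.25 or newer) and a hypothetical `detections` table. The join on MIN(recorded_at) is shown only for contrast and is not claimed to be the earlier query: it returns two rows for a sequence whose first two detections share a timestamp, while ROW_NUMBER() with `id` as a secondary sort key returns exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE detections (id INTEGER PRIMARY KEY, sequence_id INTEGER, recorded_at TEXT);
    INSERT INTO detections VALUES
        (1, 10, '2026-05-01T10:00:00'),
        (2, 10, '2026-05-01T10:00:00'),
        (3, 10, '2026-05-01T10:01:00'),
        (4, 11, '2026-05-01T11:00:00');
    """
)

# Contrast: matching on the minimum timestamp duplicates sequence 10 (ids 1 and 2 tie).
MIN_JOIN = """
SELECT d.sequence_id, d.id
FROM detections AS d
JOIN (
    SELECT sequence_id, MIN(recorded_at) AS first_at
    FROM detections
    GROUP BY sequence_id
) AS m ON m.sequence_id = d.sequence_id AND m.first_at = d.recorded_at
"""

# row_number() with id as a deterministic tie-breaker: exactly one row per sequence.
FIRST_DETECTION = """
SELECT sequence_id, id AS first_detection_id
FROM (
    SELECT id, sequence_id,
           ROW_NUMBER() OVER (
               PARTITION BY sequence_id ORDER BY recorded_at ASC, id ASC
           ) AS rn
    FROM detections
) AS ranked
WHERE rn = 1
ORDER BY sequence_id
"""

naive_rows = conn.execute(MIN_JOIN).fetchall()
first_rows = conn.execute(FIRST_DETECTION).fetchall()
```
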

Backend already returned the warning on /annotations/sequences POST + PATCH
since the previous fix; the React save handler was ignoring the mutation
result and always showed 'Annotation saved successfully'. Annotators in a
validated group with a conflicting label would never see that propagation
was skipped.

- Add group_propagation_warning to the SequenceAnnotation TypeScript type
- In AnnotationInterface's update mutation onSuccess, when the saved
  annotation carries a non-null group_propagation_warning, show an info
  toast with the backend message in addition to the success toast

The previous attempt called showToastNotification twice and let the
1-second auto-advance run anyway, so:
  - useToastNotifications holds only one message — the second call
    overwrote the first.
  - The setTimeout either replaced the warning with a 'Moving to next'
    info toast or navigated away from the page entirely.

Now: when the backend returns group_propagation_warning we render a
sticky amber banner at the top of the page and skip the auto-advance.
The banner carries the backend message, an 'Open group' link
(/sequence-groups/<id>/annotate) when the sequence belongs to a group,
plus a Dismiss button. The annotator can reconcile the conflict before
moving on.

- Reset the sticky banner whenever sequenceId changes so it doesn't
  bleed from one sequence onto the next, and clear it on a subsequent
  non-warning save so a resolved conflict disappears immediately.
- Move the banner from a fixed top-0 z-50 element (which overlaid the
  fixed AnnotationHeader) to a sticky top-20 z-30 element inside the
  body content. It now sits just below the header and follows the
  user as they scroll, without obscuring header controls.
- Add Sequence.sequence_group_id to the frontend TypeScript type so
  the 'Open group' link no longer needs an unsafe in-keyword cast.
- Swap the raw <a href> for react-router-dom Link so navigating away
  from a conflict doesn't trigger a full app reload.

Higher priority:
- Test the propagation hook explicitly. Four new cases cover:
  unvalidated group (no-op), validated + no conflict (group label set
  and fan-out happens), validated + conflict (warning returned, group
  untouched, no fan-out), and validated + locked member (reviewed
  member is left alone).
- Add a recovery path for manual exclusion: new endpoint
  POST /sequence_groups/members/{sequence_id}/re-include clears
  is_group_excluded so an accidentally-removed sequence can be put
  back into the pool. The remove confirm dialog now also warns that
  the exclusion is sticky.

Smaller:
- Make the inconsistent bare return in _propagate_to_group_if_validated
  an explicit return None, matching the function's annotated return
  type.
- Clamp confidence in the computed representative_bbox to [0, 1] so a
  malformed detection can't make a group fail RepresentativeBbox
  validation on the next read.
- Cross-reference the locked-stage set between
  ANNOTATED_STAGES (frontend) and _BULK_LOCKED_STAGES (backend) so
  future changes touch both lists together.
- Make the empty-state copy on the groups list page actually useful
  for the default 'unlabeled' filter — mentions that singletons are
  hidden by design and points at make assign-groups.
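
A minimal sketch of the confidence clamp, assuming the group's confidence is a median over its detections' confidences and that RepresentativeBbox validation requires 0 <= confidence <= 1; both are assumptions, and `representative_confidence` / `is_valid_confidence` are hypothetical names.

```python
from statistics import median


def representative_confidence(confidences: list[float]) -> float:
    """Median confidence, clamped to [0, 1] so a malformed detection can't break validation."""
    if not confidences:
        raise ValueError("no detections")
    return min(1.0, max(0.0, median(confidences)))


def is_valid_confidence(value: float) -> bool:
    """Stand-in for the assumed RepresentativeBbox rule."""
    return 0.0 <= value <= 1.0
```
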
MateoLostanlen merged commit a0dc40b into main on May 11, 2026
5 checks passed