
Happy-path E2E visual test (full UI flow) + Quantize page + Predict compare #8

Merged
mrjunos merged 1 commit into main from e2e-visual-test on May 26, 2026

mrjunos commented on May 25, 2026


What

One happy-path E2E visual test that launches the real Streamlit UI and drives the whole pipeline end-to-end, recording the run to a .webm:

Tray capture → crop/segment → Train → Evaluate → Quantize/Export → Predict (compare float vs INT8)

It runs on a tiny committed real-crop dataset (≈320 KB), so it needs no DVC, and doubles as the CI gate: a new e2e job installs Chromium, runs the test, and uploads the recording as an artifact. The existing fast job now runs pytest -m "not e2e".

Product changes required by the flow

  • New Quantize/Export page (page_quantize.py), separate from Train as requested: exports the float ONNX and INT8 models, shows their file sizes, the size reduction, and float/INT8 parity, and offers a dynamic | static | none quantization-mode selector.
  • Predict compare mode: runs a model's float and INT8 ONNX exports side by side on the same bean and reports top-1 agreement and per-model latency.
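
The compare mode comes down to feeding one input to both exports and comparing their top-1 classes and timings. A minimal stdlib sketch, assuming each model is wrapped as a callable returning a list of class scores (with onnxruntime, such a callable would wrap `InferenceSession.run`); `compare_models`, `CompareResult`, and the `repeats` parameter are hypothetical names, not the page's actual code:

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Scores = Sequence[float]
Model = Callable[[object], Scores]


@dataclass
class CompareResult:
    float_top1: int
    int8_top1: int
    agree: bool      # top-1 agreement between the two exports
    float_ms: float  # mean latency per call, milliseconds
    int8_ms: float


def argmax(scores: Scores) -> int:
    # Index of the highest score; max() keeps the first index on ties.
    return max(range(len(scores)), key=lambda i: scores[i])


def timed(model: Model, x: object, repeats: int) -> Tuple[Scores, float]:
    # Call the model `repeats` times; return the last output and the mean latency in ms.
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    start = time.perf_counter()
    for _ in range(repeats):
        out = model(x)
    return out, (time.perf_counter() - start) * 1000 / repeats


def compare_models(float_model: Model, int8_model: Model, x: object, repeats: int = 5) -> CompareResult:
    # Feed the same input to both models so agreement and latency are comparable.
    f_out, f_ms = timed(float_model, x, repeats)
    q_out, q_ms = timed(int8_model, x, repeats)
    f_top, q_top = argmax(f_out), argmax(q_out)
    return CompareResult(f_top, q_top, f_top == q_top, f_ms, q_ms)


# Usage with stand-in models (a real page would wrap onnxruntime sessions instead).
result = compare_models(lambda x: [0.1, 0.7, 0.2], lambda x: [0.12, 0.66, 0.22], x=None)
print(result.agree, result.float_top1, result.int8_top1)  # True 1 1
```

Averaging over a few calls smooths timer noise; a fairer comparison would also make one untimed warm-up call per session so that first-run initialization is not counted as latency.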

Bugs the E2E surfaced (and fixed)

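  • Tray Save never worked: the segmentation results and the Save button were rendered only inside the branch of a transient Process button. Clicking Save triggered a rerun in which Process was no longer pressed (st.button returns True only on the rerun caused by its own click), so the script returned early and the save never executed. The segmentation result is now persisted in session_state.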
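  • Train could hang on "Training…": the "Done" status was keyed on OS-process liveness, while auto-refresh stopped on the metrics done event. If the done event arrived before the training process exited, refreshing stopped while the status still read "Training…", leaving the page stuck. The status is now keyed on the done event.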
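
The Tray bug and its fix both follow from Streamlit's rerun model: every widget interaction reruns the script top to bottom, and `st.button` returns `True` only on the rerun its own click triggered. This sketch simulates reruns without Streamlit; a plain dict stands in for `st.session_state`, and `render_buggy` / `render_fixed` and the state keys are hypothetical:

```python
from typing import Optional

# Each call below is one Streamlit rerun. `clicked` names the button whose click
# triggered this rerun (st.button returns True only on that rerun), and `state`
# is a plain dict standing in for st.session_state.


def render_buggy(state: dict, clicked: Optional[str]) -> str:
    if clicked != "process":
        return "early return"  # the rerun caused by clicking Save ends up here
    state["result"] = "segmented tray"
    # The Save button only exists on the rerun where Process was pressed,
    # and Save cannot also be the clicked button on that same rerun.
    if clicked == "save":
        state["saved"] = state["result"]
        return "saved"
    return "showing result + Save button"


def render_fixed(state: dict, clicked: Optional[str]) -> str:
    if clicked == "process":
        state["result"] = "segmented tray"  # persist across reruns
    if "result" not in state:
        return "nothing to show"
    if clicked == "save":
        state["saved"] = state["result"]
        return "saved"
    return "showing result + Save button"


buggy: dict = {}
render_buggy(buggy, "process")  # -> "showing result + Save button"
render_buggy(buggy, "save")     # -> "early return"; nothing is saved
fixed: dict = {}
render_fixed(fixed, "process")  # -> "showing result + Save button"
render_fixed(fixed, "save")     # -> "saved"
```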

Harness / fixtures

  • tests/e2e/build_fixture.py — samples the committed mini-dataset from local data/processed/ (run once locally; reuses canonical taxonomy indices).
  • tests/e2e/synth_tray.py — composites real crops into an ArUco tray photo (segmentation verified 8/8 wells).
  • tests/e2e/harness.py — builds an isolated sandbox (ALMENDRA_ROOT + cwd) with fast/offline config tweaks (pretrained off, cpu, num_workers 0, dynamic INT8) and launches almendra ui.
  • e2e extra (Playwright), pytest e2e marker, make e2e / make test targets.
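
The sandbox idea (a fresh directory serving as both `ALMENDRA_ROOT` and the working directory, plus config overrides) can be sketched with the standard library. Only `ALMENDRA_ROOT` and `almendra ui` come from this PR; the `config.json` file name, its keys, the `data/processed` copy target, and the `build_sandbox` / `wait_for_port` helpers are assumptions. The sketch prepares the launch but does not start the server:

```python
import json
import os
import shutil
import socket
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List

# Hypothetical keys mirroring the fast/offline tweaks listed above.
FAST_OFFLINE_OVERRIDES = {
    "pretrained": False,         # no weight downloads
    "device": "cpu",
    "num_workers": 0,            # no DataLoader worker processes
    "quantize_mode": "dynamic",  # calibration-free INT8
}


@dataclass
class Sandbox:
    root: Path
    env: Dict[str, str]
    cmd: List[str]


def build_sandbox(fixture_dir: Path) -> Sandbox:
    root = Path(tempfile.mkdtemp(prefix="almendra-e2e-"))
    # Copy the committed mini-dataset so the run never touches the real data/ tree.
    shutil.copytree(fixture_dir, root / "data" / "processed")
    (root / "config.json").write_text(json.dumps(FAST_OFFLINE_OVERRIDES, indent=2))
    env = {**os.environ, "ALMENDRA_ROOT": str(root)}
    # Launch later with subprocess.Popen(sandbox.cmd, cwd=sandbox.root, env=sandbox.env).
    return Sandbox(root=root, env=env, cmd=["almendra", "ui"])


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> None:
    # Poll until the UI server accepts TCP connections, so the browser never races startup.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return
        except OSError:
            time.sleep(0.2)
    raise TimeoutError(f"{host}:{port} not reachable within {timeout}s")
```

After launching, `wait_for_port("localhost", 8501)` would block until the UI is reachable before the browser opens the page (8501 is Streamlit's default port; the harness may configure another).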

Verification

  • make test → 68 passed (e2e deselected); ruff check . + format clean.
  • make e2e → passes in ~28 s locally; .webm written under tests/e2e/recordings/ (gitignored).
  • Quantize mode: the gate drives dynamic INT8 (reliable, calibration-free → deterministic green gate); the page still offers static for real use.
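
Why dynamic mode can be calibration-free: weights are quantized ahead of time from their own value range, and activation ranges are computed on the fly at inference, so no calibration dataset is involved; static mode instead fixes activation scales from calibration samples. A minimal sketch of symmetric per-tensor int8 quantization, the kind of arithmetic applied to weights; it illustrates the idea and is not ONNX Runtime's implementation:

```python
from typing import List, Sequence, Tuple


def quantize_symmetric_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    # One scale for the whole tensor: the largest |w| maps to 127.
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.01, 0.0]
q, scale = quantize_symmetric_int8(weights)  # q == [50, -127, 1, 0]
restored = dequantize(q, scale)
# Every weight comes back within half a step (scale / 2) of its original value.
assert all(abs(w - r) <= scale / 2 + 1e-12 for w, r in zip(weights, restored))
```

The output depends only on the weights, which is consistent with the note above that the calibration-free mode gives a deterministic gate.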

🤖 Generated with Claude Code

Adds one full-browser E2E test that drives the real Streamlit UI through the
whole pipeline (Tray capture → crop → Train → Evaluate → Quantize → Predict)
on a tiny committed real-crop dataset, recording the run to a .webm. It doubles
as the CI gate (new `e2e` job; the fast job now runs `-m "not e2e"`).

Product changes required by the flow:
- New Quantize/Export page (separate from Train): float ONNX + INT8 with
  sizes/parity, mode selector (dynamic/static/none).
- Predict page: compare float vs INT8 side by side (top-1 agreement + latency).
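
The size figures reduce to two file sizes and a ratio. A minimal sketch, assuming both exports are plain files on disk; `size_reduction`, `size_report`, and `max_abs_diff` are hypothetical helpers, and reading "parity" as the largest absolute output difference on one input is an assumption (the page may use a different metric, such as top-1 agreement):

```python
import os
from typing import Dict, Sequence


def size_reduction(float_bytes: int, int8_bytes: int) -> float:
    # Fraction of the float model's size saved by the INT8 export (0.75 = 4x smaller).
    if float_bytes <= 0:
        raise ValueError("float model size must be positive")
    return 1 - int8_bytes / float_bytes


def size_report(float_path: str, int8_path: str) -> Dict[str, float]:
    float_bytes = os.path.getsize(float_path)
    int8_bytes = os.path.getsize(int8_path)
    return {
        "float_mb": float_bytes / 1e6,
        "int8_mb": int8_bytes / 1e6,
        "reduction": size_reduction(float_bytes, int8_bytes),
    }


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    # Largest element-wise gap between the two models' outputs on the same input.
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)
```

Because INT8 stores each quantized weight in one byte versus four for float32, `reduction` approaches 0.75 for weight-dominated models; tensors left in float and graph metadata pull it lower.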

Bugs the E2E surfaced and fixed:
- Tray: Process results + Save button were gated on a transient button, so
  clicking Save reran with Process unclicked and the save never executed. Now
  the segmentation result is persisted in session_state.
- Train: "Done" was keyed on OS process liveness while auto-refresh stopped on
  the metrics `done` event, leaving the page stuck on "Training…". Now keyed on
  the `done` event.
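
A minimal sketch of deriving both the status and the auto-refresh decision from the same signal. It assumes metrics are appended as JSON lines carrying an `event` field; that format and all helper names here are hypothetical, since the commit only says the page is keyed on the metrics `done` event:

```python
import json
from pathlib import Path
from typing import Iterable, List


def parse_events(lines: Iterable[str]) -> List[dict]:
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # a half-written last line is normal while training appends
        if isinstance(obj, dict):
            events.append(obj)
    return events


def training_status(lines: Iterable[str]) -> str:
    events = parse_events(lines)
    if any(e.get("event") == "done" for e in events):
        return "Done"
    return "Training…" if events else "Not started"


def should_auto_refresh(lines: Iterable[str]) -> bool:
    # Same predicate as the status: keep refreshing exactly until "Done" is shown.
    return training_status(lines) != "Done"


def read_status(metrics_path: Path) -> str:
    if not metrics_path.exists():
        return "Not started"
    return training_status(metrics_path.read_text().splitlines())
```

Since the status and the refresh decision share one predicate, the page cannot stop refreshing while it still shows "Training…", regardless of when the OS process exits.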

Harness/fixture:
- tests/e2e/build_fixture.py samples a tiny real-crop dataset (committed, ~320 KB)
  so CI runs without DVC.
- tests/e2e/synth_tray.py composites real crops into an ArUco tray photo.
- tests/e2e/harness.py builds an isolated sandbox (ALMENDRA_ROOT + cwd) with
  fast/offline config tweaks and launches `almendra ui`.
- `e2e` extra (playwright), pytest `e2e` marker, Makefile `e2e`/`test` targets.
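
Sampling a small, reproducible subset per class is what keeps a committed fixture tiny while still covering every class. A minimal sketch with a fixed seed, assuming a `src/<class_name>/*.png` layout; `select_per_class`, `build_fixture`, and the default of 4 images per class are assumptions, not the script's actual interface:

```python
import random
import shutil
from pathlib import Path
from typing import Dict, List


def select_per_class(files_by_class: Dict[str, List[str]], per_class: int = 4, seed: int = 0) -> Dict[str, List[str]]:
    # Fixed seed + sorted inputs -> the same subset on every rebuild.
    rng = random.Random(seed)
    picked = {}
    for cls in sorted(files_by_class):
        files = sorted(files_by_class[cls])
        picked[cls] = sorted(rng.sample(files, min(per_class, len(files))))
    return picked


def build_fixture(src: Path, dst: Path, per_class: int = 4, seed: int = 0) -> Dict[str, int]:
    # Assumed layout: src/<class_name>/*.png. Class folder names are kept unchanged
    # so the fixture's labels can be resolved against the same canonical taxonomy.
    files_by_class = {
        d.name: [p.name for p in d.glob("*.png")] for d in src.iterdir() if d.is_dir()
    }
    picked = select_per_class(files_by_class, per_class, seed)
    for cls, names in picked.items():
        (dst / cls).mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.copy2(src / cls / name, dst / cls / name)
    return {cls: len(names) for cls, names in picked.items()}
```

Sorting file names before sampling matters: directory listing order is not guaranteed, so sampling unsorted lists with the same seed could pick different files on different machines.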

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mrjunos merged commit deaa4d1 into main on May 26, 2026
2 checks passed
mrjunos deleted the e2e-visual-test branch on May 26, 2026 at 13:40