Happy-path E2E visual test (full UI flow) + Quantize page + Predict compare by mrjunos · Pull Request #8 · mrjunos/almendra

mrjunos · 2026-05-25T22:35:50Z

What

One happy-path E2E visual test that launches the real Streamlit UI and drives the whole pipeline end-to-end, recording the run to a .webm:

Tray capture → crop/segment → Train → Evaluate → Quantize/Export → Predict (compare float vs INT8)

It runs on a tiny committed real-crop dataset (≈320 KB), so it needs no DVC, and doubles as the CI gate: a new e2e job installs Chromium, runs the test, and uploads the recording as an artifact. The existing fast job now runs pytest -m "not e2e".
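For readers unfamiliar with marker-based selection, here is a minimal sketch of one way an `e2e` marker could be registered so that `pytest -m "not e2e"` deselects the browser test. The PR does not show where the marker is actually declared (it may live in the project's pytest configuration instead), so the `conftest.py` placement and the description string are assumptions.

```python
# conftest.py (hypothetical placement; the PR does not show where the marker lives)


def pytest_configure(config):
    """Register the `e2e` marker so marker expressions can select on it."""
    # addinivalue_line appends one entry to the "markers" ini option,
    # which avoids unknown-marker warnings for @pytest.mark.e2e tests.
    config.addinivalue_line(
        "markers",
        "e2e: full-browser test that drives the Streamlit UI (slow, needs Chromium)",
    )
```

With a marker like this in place, a test decorated with `@pytest.mark.e2e` is skipped by `pytest -m "not e2e"` and selected by `pytest -m e2e`.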

Product changes the flow needed

New Quantize/Export page (page_quantize.py), separate from Train as requested: exports float ONNX + INT8, shows sizes / size-reduction / parity, with a dynamic | static | none mode selector.
Predict compare mode: run a model's float and INT8 ONNX side by side on the same bean (top-1 agreement + per-model latency).
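The compare mode described above can be sketched as follows. This is an illustration only, not the page's code: `compare`, `timed_top1`, and the two predict callables are hypothetical names, and each callable stands in for an ONNX inference call that returns one score per class.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical type: a function mapping one input sample to per-class scores.
Predict = Callable[[object], Sequence[float]]


def top1(scores: Sequence[float]) -> int:
    """Index of the highest score (first index wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def timed_top1(predict: Predict, sample: object) -> Tuple[int, float]:
    """Run one prediction; return (top-1 class index, latency in milliseconds)."""
    start = time.perf_counter()
    scores = predict(sample)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return top1(scores), latency_ms


def compare(float_predict: Predict, int8_predict: Predict, sample: object) -> Dict[str, object]:
    """Run both models on the same sample and report agreement and latency."""
    f_idx, f_ms = timed_top1(float_predict, sample)
    q_idx, q_ms = timed_top1(int8_predict, sample)
    return {
        "float_top1": f_idx,
        "int8_top1": q_idx,
        "agree": f_idx == q_idx,
        "float_ms": f_ms,
        "int8_ms": q_ms,
    }
```

A single timed call is noisy, so a real comparison would likely average latency over several runs after a warm-up.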

Bugs the E2E surfaced (and fixed)

Tray Save never worked: segmentation results and the Save button were gated behind a transient Process button, so clicking Save reran with Process unclicked → early return → the save never executed. Now the result is persisted in session_state.
Train could hang on "Training…": the "Done" status was keyed on OS-process liveness while auto-refresh stopped on the metrics done event — a race that left the page stuck. Now keyed on the done event.
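The Tray Save bug follows from how Streamlit reruns the script on every interaction: `st.button` returns `True` only on the rerun triggered by its own click, while values stored in `st.session_state` persist across reruns. The sketch below simulates those reruns in plain Python so it can run without Streamlit. `render_tray_page`, the `"segmentation"` key, and the placeholder result are all hypothetical, not the project's code.

```python
from typing import Dict, List


def render_tray_page(
    session_state: Dict[str, object],
    process_clicked: bool,
    save_clicked: bool,
    saved: List[object],
) -> None:
    """Simulate one script rerun of a hypothetical Tray page."""
    if process_clicked:
        # Persist the result so it survives the rerun caused by the next click.
        session_state["segmentation"] = ["well-1", "well-2"]  # placeholder result

    # Gate on the persisted result, not on the transient Process click.
    result = session_state.get("segmentation")
    if result is None:
        return  # nothing has been processed yet

    if save_clicked:
        saved.append(result)


# Two consecutive reruns: first the Process click, then the Save click.
state: Dict[str, object] = {}
saved: List[object] = []
render_tray_page(state, process_clicked=True, save_clicked=False, saved=saved)
render_tray_page(state, process_clicked=False, save_clicked=True, saved=saved)
```

If the early return were keyed on `process_clicked` instead, the second rerun would exit before reaching the save branch, which is the failure the PR describes.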

Harness / fixtures

tests/e2e/build_fixture.py — samples the committed mini-dataset from local data/processed/ (run once locally; reuses canonical taxonomy indices).
tests/e2e/synth_tray.py — composites real crops into an ArUco tray photo (segmentation verified 8/8 wells).
tests/e2e/harness.py — builds an isolated sandbox (ALMENDRA_ROOT + cwd) with fast/offline config tweaks (pretrained off, cpu, num_workers 0, dynamic INT8) and launches almendra ui.
e2e extra (Playwright), pytest e2e marker, make e2e / make test targets.
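As a rough illustration of the sandbox idea in `tests/e2e/harness.py`, the sketch below creates a throwaway root directory, copies fixture files into it, and prepares an environment with `ALMENDRA_ROOT` pointing at it. The helper name `make_sandbox`, the directory prefix, and the fixture layout are assumptions; the real harness also applies config tweaks that are not reproduced here.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Mapping, Tuple


def make_sandbox(fixture_files: Mapping[str, bytes]) -> Tuple[Path, Dict[str, str]]:
    """Create an isolated root directory and an environment that points at it."""
    root = Path(tempfile.mkdtemp(prefix="almendra-e2e-"))

    # Materialize fixture files under the sandbox root (paths are relative).
    for rel_path, data in fixture_files.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    # Copy the current environment and override only the project root.
    env = dict(os.environ)
    env["ALMENDRA_ROOT"] = str(root)
    return root, env


# The UI could then be launched inside the sandbox, for example:
#   subprocess.Popen(["almendra", "ui"], cwd=root, env=env)
# (how the real harness starts and stops the process is not shown in the PR).
```

Setting both the environment variable and the working directory keeps the run from reading or writing the developer's real data directories.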

Verification

make test → 68 passed (e2e deselected); ruff check . + format clean.
make e2e → passes in ~28 s locally; .webm written under tests/e2e/recordings/ (gitignored).
Quantize mode: the gate drives dynamic INT8 (reliable, calibration-free → deterministic green gate); the page still offers static for real use.

🤖 Generated with Claude Code

Adds one full-browser E2E test that drives the real Streamlit UI through the whole pipeline (Tray capture → crop → Train → Evaluate → Quantize → Predict) on a tiny committed real-crop dataset, recording the run to a .webm. It doubles as the CI gate (new `e2e` job; the fast job now runs `-m "not e2e"`).

Product changes the flow needed:
- New Quantize/Export page (separate from Train): float ONNX + INT8 with sizes/parity, mode selector (dynamic/static/none).
- Predict page: compare float vs INT8 side by side (top-1 agreement + latency).

Bugs the E2E surfaced and fixed:
- Tray: Process results + Save button were gated on a transient button, so clicking Save reran with Process unclicked and the save never executed. Now the segmentation result is persisted in session_state.
- Train: "Done" was keyed on OS process liveness while auto-refresh stopped on the metrics `done` event, leaving the page stuck on "Training…". Now keyed on the `done` event.

Harness/fixture:
- tests/e2e/build_fixture.py samples a tiny real-crop dataset (committed, ~320K) so CI runs without DVC.
- tests/e2e/synth_tray.py composites real crops into an ArUco tray photo.
- tests/e2e/harness.py builds an isolated sandbox (ALMENDRA_ROOT + cwd) with fast/offline config tweaks and launches `almendra ui`.
- `e2e` extra (playwright), pytest `e2e` marker, Makefile `e2e`/`test` targets.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mrjunos merged commit deaa4d1 into main May 26, 2026
2 checks passed

mrjunos deleted the e2e-visual-test branch May 26, 2026 13:40

mrjunos merged 1 commit into main from e2e-visual-test
