Happy-path E2E visual test (full UI flow) + Quantize page + Predict compare #8
Merged
Adds one full-browser E2E test that drives the real Streamlit UI through the whole pipeline (Tray capture → crop → Train → Evaluate → Quantize → Predict) on a tiny committed real-crop dataset, recording the run to a .webm. It doubles as the CI gate (new `e2e` job; the fast job now runs `-m "not e2e"`).

Product changes the flow needed:
- New Quantize/Export page (separate from Train): float ONNX + INT8 with sizes/parity, mode selector (dynamic/static/none).
- Predict page: compare float vs INT8 side by side (top-1 agreement + latency).

Bugs the E2E surfaced and fixed:
- Tray: Process results + Save button were gated on a transient button, so clicking Save reran with Process unclicked and the save never executed. Now the segmentation result is persisted in session_state.
- Train: "Done" was keyed on OS process liveness while auto-refresh stopped on the metrics `done` event, leaving the page stuck on "Training…". Now keyed on the `done` event.

Harness/fixture:
- tests/e2e/build_fixture.py samples a tiny real-crop dataset (committed, ~320K) so CI runs without DVC.
- tests/e2e/synth_tray.py composites real crops into an ArUco tray photo.
- tests/e2e/harness.py builds an isolated sandbox (ALMENDRA_ROOT + cwd) with fast/offline config tweaks and launches `almendra ui`.
- `e2e` extra (playwright), pytest `e2e` marker, Makefile `e2e`/`test` targets.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
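To illustrate the Tray bug described in the commit message above: Streamlit reruns the page script on every interaction, and a button reads as true only on the rerun triggered by its own click. The following is a toy, pure-Python model of that behaviour, not the project's code and not Streamlit itself. `tray_page_buggy`, `tray_page_fixed`, the `segment` stand-in, and the `"seg_result"` key are all hypothetical names chosen for the sketch.

```python
"""Toy model of Streamlit reruns: each call to a page function is one script run.

`clicked` is the single button whose click triggered this rerun (or None).
All names here are hypothetical and exist only for illustration.
"""


def segment(photo):
    # Stand-in for the real tray segmentation step.
    return f"crops-of-{photo}"


def tray_page_buggy(clicked, session_state, saved):
    # The result only exists on the rerun where "Process" was just clicked.
    if clicked != "Process":
        return  # early return: a rerun caused by "Save" exits here
    result = segment("tray.jpg")
    if clicked == "Save":  # never true on this path, so nothing is saved
        saved.append(result)


def tray_page_fixed(clicked, session_state, saved):
    # Persist the result so it survives the rerun triggered by "Save".
    if clicked == "Process":
        session_state["seg_result"] = segment("tray.jpg")
    result = session_state.get("seg_result")
    if result is None:
        return
    if clicked == "Save":
        saved.append(result)


def simulate(page):
    """Replay a user clicking Process, then Save, and return what got saved."""
    session_state, saved = {}, []
    for clicked in ("Process", "Save"):
        page(clicked, session_state, saved)
    return saved
```

In this model `simulate(tray_page_buggy)` saves nothing, while `simulate(tray_page_fixed)` saves the segmentation result, which mirrors the before/after behaviour the E2E test exercised.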
What
One happy-path E2E visual test that launches the real Streamlit UI and drives the whole pipeline end-to-end, recording the run to a `.webm`:

Tray capture → crop/segment → Train → Evaluate → Quantize/Export → Predict (compare float vs INT8)

It runs on a tiny committed real-crop dataset (≈320 KB), so it needs no DVC, and doubles as the CI gate: a new `e2e` job installs Chromium, runs the test, and uploads the recording as an artifact. The existing fast job now runs `pytest -m "not e2e"`.

Product changes the flow needed
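- New Quantize/Export page (`page_quantize.py`), separate from Train as requested: exports float ONNX + INT8, shows sizes / size-reduction / parity, with a `dynamic | static | none` mode selector.
- Predict page: compares float vs INT8 side by side (top-1 agreement + latency).

Bugs the E2E surfaced (and fixed)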
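- Tray: the results and the Save button were gated on the transient `Process` button, so clicking Save reran with Process unclicked → early return → the save never executed. Now the result is persisted in `session_state`.
- Train: "Done" was keyed on OS process liveness while auto-refresh stopped on the metrics `done` event, a race that left the page stuck. Now keyed on the `done` event.

Harness / fixtures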
- `tests/e2e/build_fixture.py`: samples the committed mini-dataset from local `data/processed/` (run once locally; reuses canonical taxonomy indices).
- `tests/e2e/synth_tray.py`: composites real crops into an ArUco tray photo (segmentation verified 8/8 wells).
- `tests/e2e/harness.py`: builds an isolated sandbox (`ALMENDRA_ROOT` + cwd) with fast/offline config tweaks (pretrained off, cpu, num_workers 0, dynamic INT8) and launches `almendra ui`.
- `e2e` extra (Playwright), pytest `e2e` marker, `make e2e` / `make test` targets.

Verification
- `make test` → 68 passed (e2e deselected); `ruff check .` + format clean.
- `make e2e` → passes in ~28 s locally; `.webm` written under `tests/e2e/recordings/` (gitignored).

🤖 Generated with Claude Code
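For reviewers, the Train fix listed under the bugs above can be sketched roughly as follows. This assumes, hypothetically, that the trainer appends JSON lines to a metrics log and ends with a record containing `"event": "done"`. The record shape and the `training_finished` and `page_status` helpers are illustrative assumptions, not the project's actual format or API.

```python
import json


def training_finished(metrics_lines):
    """Return True once a `done` event appears among the metrics log lines.

    Assumed (hypothetical) format: one JSON object per line, with the final
    record carrying {"event": "done"}.
    """
    for line in metrics_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written last line
        if isinstance(record, dict) and record.get("event") == "done":
            return True
    return False


def page_status(metrics_lines):
    # Status and auto-refresh both derive from the same signal, so they
    # cannot disagree the way "process alive" and "done event" could.
    return "Done" if training_finished(metrics_lines) else "Training…"
```

Keying both the status label and the refresh loop on one signal removes the window in which the `done` event has been written but the OS process has not yet exited.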