mrjunos · May 20, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -14,11 +14,12 @@ jobs:
       - name: Install uv
         uses: astral-sh/setup-uv@v6
 
-      # core + dev + train + export — installs torch/sklearn/onnx so the model,
-      # metrics and pipeline modules are exercised. The `data` extra (roboflow,
-      # dvc) is omitted: tests do not touch dataset downloads.
-      - name: Install (core + dev + train + export + capture)
-        run: uv sync --extra dev --extra train --extra export --extra capture
+      # core + dev + train + export + capture + ui — installs torch/sklearn/onnx
+      # so the model, metrics and pipeline modules are exercised, plus streamlit
+      # so the UI smoke tests actually run. The `data` extra (roboflow, dvc) is
+      # omitted: tests do not touch dataset downloads.
+      - name: Install (core + dev + train + export + capture + ui)
+        run: uv sync --extra dev --extra train --extra export --extra capture --extra ui
 
       - name: Ruff lint
         run: uv run ruff check .

diff --git a/Makefile b/Makefile
@@ -3,7 +3,7 @@
 # created by uv. Run `make help` for the full list.
 
 .DEFAULT_GOAL := help
-.PHONY: help setup setup-min lint format test info train eval export bench sweep data ingest clean
+.PHONY: help setup setup-min lint format test info train eval export bench sweep data ingest ui clean
 
 SWEEP_BACKBONES ?= mobilenet_v3_small,mobilenet_v3_large,efficientnet_b0
 SWEEP_EPOCHS ?= 20
@@ -53,6 +53,9 @@ bench: ## Benchmark inference latency / throughput
 sweep: ## Backbone sweep (override SWEEP_BACKBONES, SWEEP_EPOCHS)
 	uv run almendra sweep --backbones $(SWEEP_BACKBONES) --epochs $(SWEEP_EPOCHS) $(ARGS)
 
+ui: ## Launch the local Streamlit UI (Phase 6)
+	uv run almendra ui $(ARGS)
+
 clean: ## Remove build artifacts, caches and run outputs
 	rm -rf outputs mlruns .pytest_cache .ruff_cache dist build
 	find . -type d -name __pycache__ -exec rm -rf {} +
diff --git a/README.md b/README.md
@@ -11,11 +11,12 @@ hardware-agnostic export/benchmark toolchain, and a documented physical capture
 protocol. The model is the focus — reliable and fast — but it must stay easy to
 re-train as better data arrives.
 
-> **Status: Phase 2 — multi-view model.** The full pipeline (`ingest → train →
-> eval → export → bench`) runs on public data — single-view baseline **0.92 test
-> macro-F1** — and the multi-view model is trained and shown to be view-count
-> robust. See [`docs/research-log.md`](docs/research-log.md) for live progress
-> and [the roadmap](#roadmap) below.
+> **Status: Phase 6 — local UI.** The full pipeline (`ingest → train → eval →
+> export → bench`) runs on public data; Phase 5's Pareto sweep picked
+> **MobileNetV3-Large + static INT8** as the current deploy choice (0.86 macro-F1,
+> 3.6 MB, ~430 beans/s on a single CPU thread); and a local Streamlit UI now
+> wraps the whole toolkit. See [`docs/research-log.md`](docs/research-log.md)
+> for the full log.
 
 ## The idea
 
@@ -59,6 +60,93 @@ make export              # export to ONNX (+ INT8) with a parity check
 make bench               # benchmark inference latency
 ```
 
+## Local UI
+
+A Streamlit app wraps the whole pipeline behind a bilingual ES/EN interface —
+tray capture, training (with live charts), evaluation, prediction and settings.
+A non-technical user can run almendra end-to-end without touching the CLI.
+
+![almendra Home page in Spanish](docs/images/ui-home.png)
+
+### Launch
+
+```bash
+uv sync --extra ui --extra train --extra export --extra capture
+make ui
+# equivalent: uv run almendra ui
+```
+
+The app opens at <http://localhost:8501>. Flags:
+
+```bash
+uv run almendra ui --port 8888       # use a different port
+uv run almendra ui --headless        # don't auto-open a browser (SSH / CI)
+```
+
+| Extra installed | Page it unlocks |
+|---|---|
+| `ui` | the app itself (Streamlit + Plotly) |
+| `train` | Train + Evaluate + mis-classified gallery (PyTorch) |
+| `export` | Predict (ONNX Runtime) |
+| `capture` | Tray Capture (OpenCV) |
+
+Skipping an extra is fine — the page that depends on it shows a clear error
+instead of crashing. Install later and reload.
+
+### What's in the app
+
+1. **🏠 Inicio / Home** — dataset stats, recent runs, a health panel, and an
+   inline wizard that walks first-time users through Ingest → Train → Eval.
+2. **📷 Bandeja / Tray Capture** — drag-and-drop tray photos, see the original
+   next to the rectified+overlay preview, save per-bean crops to
+   `data/raw/proprietary_tray/sessions/<id>/`.
+3. **🧠 Entrenar / Train** — pick a backbone and the key knobs (advanced
+   controls live behind an expander), launch training as a subprocess, and
+   watch `train_loss` + `val_macro_f1` update **in real time** as each epoch
+   completes.
+
+   ![Train page](docs/images/ui-train.png)
+
+4. **📊 Evaluar / Evaluate** — pick a checkpoint and split, run it, see
+   accuracy / macro-F1 / missed-defect-rate, per-class breakdown, confusion
+   matrix heatmap, and a gallery of mis-classified beans.
+5. **🚀 Predecir / Predict** — upload a single-bean photo, get the predicted
+   class, confidence, Top-3, and an accept/reject verdict from the canonical
+   taxonomy. Uses the most recent ONNX for speed (prefers INT8).
+6. **⚙️ Ajustes / Settings** — browse the canonical taxonomy, the YAML data
+   sources, and the current Hydra config.
+
+### End-to-end test in 5 minutes
+
+```bash
+# 1. install everything the UI exercises
+uv sync --extra ui --extra train --extra export --extra capture
+
+# 2. (optional) ingest the public Robusta baseline so Train/Evaluate have data
+export ROBOFLOW_API_KEY=...    # see your Roboflow workspace
+make data && make ingest
+
+# 3. launch the UI
+make ui
+```
+
+Then in the browser:
+
+1. **Inicio** — confirm the health panel shows Python/PyTorch/Taxonomy green;
+   the manifest icon flips to ✅ once `data/processed/manifest.jsonl` exists.
+2. **Entrenar** — backbone `mobilenet_v3_small`, **3 épocas** (for a smoke
+   test), **Iniciar entrenamiento**. The Plotly chart should start updating
+   within a couple of seconds of the first epoch landing.
+3. **Evaluar** — pick the run you just trained, leave `split = test`,
+   **Ejecutar**. You get the headline metrics + confusion matrix + error
+   gallery.
+4. **Predecir** — from a terminal, `uv run almendra export --checkpoint
+   outputs/ui-<timestamp>/best.pt`. Refresh the Predict page, pick the ONNX
+   from the dropdown, upload any single-bean image.
+
+See [`docs/ui.md`](docs/ui.md) for the deeper troubleshooting guide
+(stuck subprocesses, port conflicts, missing extras).
+
 ## Repository layout
 
 | Path | Purpose |
@@ -87,10 +175,10 @@ answer, tracked in [`docs/research-log.md`](docs/research-log.md):
 - **Phase 0** — Scaffolding ✓
 - **Phase 1** — Data pipeline + single-view public baseline ✓
 - **Phase 2** — Multi-view fusion model ✓
-- **Phase 3** — Physical capture protocol + proprietary Arabica data *(current)*
+- **Phase 3** — Physical capture protocol + proprietary Arabica data *(blocked on data)*
 - **Phase 4** — Multi-spectral illumination (UV, transillumination)
-- **Phase 5** — Speed: backbone sweep, INT8, hardware benchmark
-- **Phase 6** — Deployment reference + sorting-machine spec
+- **Phase 5** — Speed: backbone sweep, INT8, hardware benchmark ✓
+- **Phase 6** — Local Streamlit UI for the whole toolkit ✓
 - *Parallel research track* — NIR / hyperspectral internal-defect inspection
 
 ## Data & licensing

diff --git a/docs/images/ui-home.png b/docs/images/ui-home.png
diff --git a/docs/images/ui-train.png b/docs/images/ui-train.png
diff --git a/docs/research-log.md b/docs/research-log.md
@@ -45,6 +45,39 @@ deployment tooling). Headline findings:
 
 ## Log
 
+### 2026-05-21 — Phase 6: local Streamlit UI
+
+- **New optional-deps group `ui`** (`streamlit`, `plotly`) and an `almendra ui`
+  CLI subcommand that execs `streamlit run` on `src/almendra/ui/app.py`.
+  `make ui` is the matching shortcut.
+- **Six pages** behind a sidebar radio: Home (dataset stats + recent runs +
+  inline wizard), Tray Capture, Train, Evaluate, Predict, Settings.
+- **Bilingual ES/EN from day one** — every visible string lives in a central
+  dict (`ui/components/i18n.py`) and pages read it through `t("key", lang)`. A
+  sidebar radio toggles the language; adding a third language is a translation
+  job, not a rewrite. Default: Spanish.
+- **Live training charts** — `train.loop` writes one JSONL line per epoch to
+  `outputs/<run>/live_metrics.jsonl` (controlled by env var
+  `ALMENDRA_LIVE_METRICS` so the CLI use case is untouched). The Train page
+  launches training as a subprocess, polls the file every ~2 s, and re-renders
+  a two-line Plotly chart (train_loss + val_macro_f1).
+- **Decoupled, file-based contract** — the UI is stateless across reruns;
+  everything it shows (runs, checkpoints, ONNX, metrics) is discovered from
+  disk under `outputs/`. Anything that writes the same JSONL schema works with
+  the UI.
+- **Inline wizard** on Home with three "press to go" buttons that walk
+  Ingest → Train → Eval with sensible defaults. Advanced controls (gated
+  fusion, view-dropout, augmentation toggles) live behind an `Advanced`
+  expander on the Train page so they don't intimidate first-time users.
+- **Tests** — `streamlit.testing.v1.AppTest` smoke-tests render every page in
+  ES *and* EN (12 cases) without exceptions; the i18n dict is checked for
+  complete coverage; the live-metrics JSONL writer/reader has its own unit
+  test. All 45 tests in the suites that don't require torch/onnxruntime pass.
+- **Scope split** — Phase 6.0 (this PR) ships the six pages, the CLI/Make
+  entrypoints and tests. Phase 6.1 will add: a dedicated **Labelling** page
+  with hotkeys + IAA reporting, an **Export & Bench** page, a model-package
+  zip exporter, and a "Demo mode" using the public Roboflow data.
+
 ### 2026-05-20 — Phase 5: backbone sweep + static INT8 PTQ
 
 - Static INT8 PTQ implemented (`quantize_int8_static` in `src/almendra/export/exporter.py`):
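
Reviewer sketch (outside the patch): the research-log entry above describes a file-based contract in which training appends one JSON line per epoch to `outputs/<run>/live_metrics.jsonl` and the Train page polls that file. A minimal Python sketch of such a writer/reader pair might look like the following. The function names and the `epoch` key are assumptions made for illustration; only `train_loss` and `val_macro_f1` are named in the entry, and the real `train.loop` code is not shown in this diff.

```python
import json
from pathlib import Path
from typing import Dict, List


def append_epoch_metrics(path: Path, epoch: int, train_loss: float, val_macro_f1: float) -> None:
    """Append one JSON object per epoch as a single line (hypothetical helper)."""
    # "epoch" is an assumed key; the two metric names come from the log entry.
    record = {"epoch": epoch, "train_loss": train_loss, "val_macro_f1": val_macro_f1}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_epoch_metrics(path: Path) -> List[Dict]:
    """Return every complete record; tolerate a missing file or a half-written line."""
    if not path.exists():
        return []
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # The writer may be mid-append; skip and pick the line up on the next poll.
            continue
    return records
```

Skipping an undecodable trailing line is one way to let a poller read while the writer is still appending; the actual implementation may handle this differently.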

diff --git a/docs/ui.md b/docs/ui.md
@@ -0,0 +1,134 @@
+# Local UI (Phase 6)
+
+The `almendra ui` command launches a local Streamlit app that wraps the whole
+toolkit — tray capture, training, evaluation, prediction and settings — so a
+non-technical user can run the pipeline end-to-end without touching the CLI.
+
+The UI is **bilingual ES/EN** with a sidebar toggle (Spanish default).
+
+## 1 — Install
+
+```bash
+# Minimum extras for a full end-to-end run:
+uv sync --extra ui --extra train --extra export --extra capture
+```
+
+| Extra | Why you need it |
+|---|---|
+| `ui` | Streamlit + Plotly (the app itself) |
+| `train` | PyTorch + torchvision (Train, Evaluate, mis-classified gallery) |
+| `export` | ONNX Runtime (Predict page) |
+| `capture` | OpenCV (Tray Capture page) |
+
+If you skip an extra, the page that depends on it shows a clear error instead
+of crashing. You can come back and install it later.
+
+## 2 — Launch
+
+```bash
+make ui
+# equivalent: uv run almendra ui
+```
+
+This execs `streamlit run` against `src/almendra/ui/app.py`. By default the
+app opens at <http://localhost:8501> and your browser auto-opens.
+
+Flags:
+
+```bash
+uv run almendra ui --port 8888          # use a different port
+uv run almendra ui --headless           # don't auto-open a browser (SSH)
+```
+
+The first launch may take a few seconds — Streamlit warms up its caches.
+
+## 3 — End-to-end test flow
+
+The pages are designed to be exercised in order. The **inline wizard on Home**
+gives you a fast-path button for each step.
+
+### 3a — Have public data already?
+
+If you've already run `make data && make ingest` (Roboflow Robusta dataset),
+the manifest at `data/processed/manifest.jsonl` will show up on Home and you
+can skip straight to **Train**.
+
+### 3b — Cold start (proprietary tray photos)
+
+1. **🏠 Home** — confirm the health panel says Python/PyTorch/Taxonomy are all
+   green. Manifest will show ❌ if you haven't ingested anything yet — fine.
+2. **📷 Tray Capture** — drag in *Side A* (required) and *Side B* (optional)
+   tray photos. Set:
+   - **Rows / Cols** — wells in your tray.
+   - **Flip** — `mirror_cols` if you flipped the tray horizontally, `mirror_rows`
+     if vertically. `identity` if no flip.
+   - Leave **Margin frac** and **Well frac** at defaults to start.
+   Hit **Procesar / Process photos**. You should see the original photo next
+   to a rectified+overlay view (green squares = occupied wells, red = empty).
+   If markers aren't detected the page tells you so — check the corners are
+   sharp and in-frame.
+   Enter a session ID (defaults to a timestamp) and hit **Save crops**. Crops
+   land in `data/raw/proprietary_tray/sessions/<id>/`.
+3. *(out of UI for now)* — convert the saved session into a manifest entry.
+   The proprietary tray ingester is a Phase 3 task; until then, public-data
+   `almendra ingest` is the path that gives the Train page something to chew on.
+4. **🧠 Train** — pick a backbone (start with `mobilenet_v3_small` — fastest),
+   set **Épocas / Epochs** to 3 for a smoke test, press **Iniciar / Start**.
+   The progress bar fills and the Plotly chart updates in real time as each
+   epoch completes. The **Best macro-F1** metric tracks the best checkpoint
+   saved. Press **Detener / Stop** if you want to kill the run early.
+5. **📊 Evaluate** — pick the run you just trained from the dropdown, leave
+   `split = test`, press **Ejecutar / Run**. You get headline accuracy /
+   macro-F1 / missed-defect-rate cards, a per-class table, a confusion-matrix
+   heatmap, and a gallery of mis-classified beans.
+6. **🚀 Predict** — works once a run has been **exported**. From a terminal:
+   ```bash
+   uv run almendra export --checkpoint outputs/ui-<timestamp>/best.pt
+   ```
+   Then refresh the Predict page, pick the ONNX file from the dropdown, upload
+   a single bean photo, and check the predicted class + Top-3 + accept/reject
+   verdict.
+7. **⚙️ Settings** — read-only view of the canonical taxonomy, data sources
+   and current Hydra config. Useful for sanity-checking the project paths.
+
+### 3c — Sanity-check checklist
+
+Use this to make sure the UI is *actually* doing what it should:
+
+- [ ] Language toggle in the sidebar instantly swaps every visible string.
+- [ ] Home health panel shows ✅ for Taxonomy and the manifest icon flips
+      between ✅/❌ depending on whether `data/processed/manifest.jsonl` exists.
+- [ ] On Train, the live chart **starts appearing within ~2 s of the first
+      epoch finishing** — confirms the JSONL tail is working.
+- [ ] Stopping training mid-run kills the subprocess (check `ps` or `pgrep -f
+      almendra.cli`).
+- [ ] On Evaluate, mis-classified gallery shows real bean thumbnails (not just
+      captions) when the manifest has accessible image paths.
+- [ ] On Predict, the page lists every ONNX under `outputs/*/model*.onnx` and
+      defaults to the most recently modified INT8 if present.
+- [ ] On Settings, every YAML under `data/sources/` is browsable.
+
+## Troubleshooting
+
+- **"OpenCV is not installed"** on the Tray Capture page → `uv sync --extra
+  capture`, then click the page again.
+- **"onnxruntime is not installed"** on Predict → `uv sync --extra export`.
+- **The Train chart never updates** → check `outputs/ui-<timestamp>/live_metrics.jsonl`
+  exists and grows; if the file isn't being written, the subprocess didn't
+  inherit the `ALMENDRA_LIVE_METRICS` env var (file a bug).
+- **Port already in use** → `uv run almendra ui --port 8888`.
+- **Stuck training subprocess after closing the tab** → `pkill -f
+  "almendra.cli train"`. The UI's Stop button uses SIGTERM on the process
+  group, but if you close the browser before pressing Stop the subprocess
+  keeps running. This is intentional — long runs should survive a tab close.
+
+## What's *not* in v1
+
+These ship in **Phase 6.1**, not this PR:
+
+- A dedicated **Labelling** page with keyboard hotkeys and inter-annotator
+  agreement reporting.
+- An **Export & Bench** page (currently you drop to the CLI for both).
+- A **model-package zip exporter** (ONNX + INT8 + model card + manifest
+  snapshot).
+- A **demo mode** using the public Roboflow data baseline.
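
Reviewer sketch (outside the patch): the checklist in `docs/ui.md` says the Predict page lists every ONNX under `outputs/*/model*.onnx` and defaults to the most recently modified INT8 if present. One way that selection could be written is sketched below. The function name is hypothetical, and detecting INT8 by an `int8` substring in the file name is an assumption; the diff does not show how the UI actually identifies quantized models.

```python
from pathlib import Path
from typing import Optional


def pick_default_onnx(outputs_dir: Path) -> Optional[Path]:
    """Pick a default model: newest INT8 file if any, else the newest ONNX file."""
    # Newest first, using the file modification time.
    candidates = sorted(
        outputs_dir.glob("*/model*.onnx"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    if not candidates:
        return None
    # Assumption: quantized exports carry "int8" somewhere in the file name.
    int8 = [p for p in candidates if "int8" in p.name.lower()]
    return int8[0] if int8 else candidates[0]
```

Because the list is sorted before filtering, an older INT8 file still wins over a newer FP32 file, which matches the "prefers INT8" wording in the README.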
diff --git a/pyproject.toml b/pyproject.toml
@@ -57,6 +57,11 @@ capture = [
     # ArUco marker detection for the gridded-tray auto-segmentation pipeline.
     "opencv-contrib-python-headless>=4.9",
 ]
+ui = [
+    # Local Streamlit UI (Phase 6) — runs the toolkit end-to-end for non-CLI users.
+    "streamlit>=1.40",
+    "plotly>=5.20",
+]
 dev = [
     "pytest>=8.0",
     "ruff>=0.5",
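
Reviewer sketch (outside the patch): both the README and `docs/ui.md` state that a page whose extra is missing shows a clear error instead of crashing. A rough sketch of how such a check could be done without importing the heavy packages is shown here. The mapping from extra to module name and the function name are assumptions for illustration (the module names are the usual import names for Streamlit, PyTorch, ONNX Runtime and OpenCV); the UI's real guard is not part of this diff.

```python
from importlib.util import find_spec

# Extra name -> top-level module the extra is expected to provide (assumed mapping).
EXTRA_MODULES = {
    "ui": "streamlit",
    "train": "torch",
    "export": "onnxruntime",
    "capture": "cv2",
}


def missing_extras(extra_modules=EXTRA_MODULES):
    """Return the extras whose module cannot be located, without importing it."""
    return [
        extra
        for extra, module in extra_modules.items()
        if find_spec(module) is None
    ]
```

A page could call `missing_extras()` on load and render an install hint such as `uv sync --extra capture` for each name returned.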