Merged
11 changes: 6 additions & 5 deletions .github/workflows/ci.yml
@@ -14,11 +14,12 @@ jobs:
- name: Install uv
uses: astral-sh/setup-uv@v6

# core + dev + train + export: installs torch/sklearn/onnx so the model,
# metrics and pipeline modules are exercised. The `data` extra (roboflow,
# dvc) is omitted: tests do not touch dataset downloads.
- name: Install (core + dev + train + export + capture)
run: uv sync --extra dev --extra train --extra export --extra capture
# core + dev + train + export + capture + ui: installs torch/sklearn/onnx
# so the model, metrics and pipeline modules are exercised, plus streamlit
# so the UI smoke tests actually run. The `data` extra (roboflow, dvc) is
# omitted: tests do not touch dataset downloads.
- name: Install (core + dev + train + export + capture + ui)
run: uv sync --extra dev --extra train --extra export --extra capture --extra ui

- name: Ruff lint
run: uv run ruff check .
5 changes: 4 additions & 1 deletion Makefile
@@ -3,7 +3,7 @@
# created by uv. Run `make help` for the full list.

.DEFAULT_GOAL := help
.PHONY: help setup setup-min lint format test info train eval export bench sweep data ingest clean
.PHONY: help setup setup-min lint format test info train eval export bench sweep data ingest ui clean

SWEEP_BACKBONES ?= mobilenet_v3_small,mobilenet_v3_large,efficientnet_b0
SWEEP_EPOCHS ?= 20
@@ -53,6 +53,9 @@ bench: ## Benchmark inference latency / throughput
sweep: ## Backbone sweep (override SWEEP_BACKBONES, SWEEP_EPOCHS)
uv run almendra sweep --backbones $(SWEEP_BACKBONES) --epochs $(SWEEP_EPOCHS) $(ARGS)

ui: ## Launch the local Streamlit UI (Phase 6)
uv run almendra ui $(ARGS)

clean: ## Remove build artifacts, caches and run outputs
rm -rf outputs mlruns .pytest_cache .ruff_cache dist build
find . -type d -name __pycache__ -exec rm -rf {} +
104 changes: 96 additions & 8 deletions README.md
@@ -11,11 +11,12 @@ hardware-agnostic export/benchmark toolchain, and a documented physical capture
protocol. The model is the focus (reliable and fast), but it must stay easy to
re-train as better data arrives.

> **Status: Phase 2 - multi-view model.** The full pipeline (`ingest → train →
> eval → export → bench`) runs on public data, with a single-view baseline of
> **0.92 test macro-F1**, and the multi-view model is trained and shown to be
> view-count robust. See [`docs/research-log.md`](docs/research-log.md) for live
> progress and [the roadmap](#roadmap) below.
> **Status: Phase 6 - local UI.** The full pipeline (`ingest → train → eval →
> export → bench`) runs on public data; Phase 5's Pareto sweep picked
> **MobileNetV3-Large + static INT8** as the current deploy choice (0.86 macro-F1,
> 3.6 MB, ~430 beans/s on a single CPU thread); and a local Streamlit UI now
> wraps the whole toolkit. See [`docs/research-log.md`](docs/research-log.md)
> for the full log.

## The idea

@@ -59,6 +60,93 @@ make export # export to ONNX (+ INT8) with a parity check
make bench # benchmark inference latency
```

## Local UI

A Streamlit app wraps the whole pipeline behind a bilingual ES/EN interface:
tray capture, training (with live charts), evaluation, prediction and settings.
A non-technical user can run almendra end-to-end without touching the CLI.

![almendra Home page in Spanish](docs/images/ui-home.png)

### Launch

```bash
uv sync --extra ui --extra train --extra export --extra capture
make ui
# equivalent: uv run almendra ui
```

The app opens at <http://localhost:8501>. Flags:

```bash
uv run almendra ui --port 8888 # use a different port
uv run almendra ui --headless # don't auto-open a browser (SSH / CI)
```

| Extra installed | Page it unlocks |
|---|---|
| `ui` | the app itself (Streamlit + Plotly) |
| `train` | Train + Evaluate + mis-classified gallery (PyTorch) |
| `export` | Predict (ONNX Runtime) |
| `capture` | Tray Capture (OpenCV) |

Skipping an extra is fine: the page that depends on it shows a clear error
instead of crashing. Install the extra later and reload.
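
One way a page could degrade gracefully when its extra is missing is to probe for the dependency before importing it. The sketch below is illustrative only: `PAGE_REQUIREMENTS` and `missing_extra_message` are hypothetical names, not the project's actual code, and the module names are simply the usual import names of the packages listed in the table above.

```python
import importlib.util
from typing import Optional

# Hypothetical mapping: page name -> (importable module, extra that provides it).
PAGE_REQUIREMENTS = {
    "train": ("torch", "train"),
    "predict": ("onnxruntime", "export"),
    "capture": ("cv2", "capture"),
}


def missing_extra_message(page: str) -> Optional[str]:
    """Return an install hint if the page's dependency is absent, else None."""
    module, extra = PAGE_REQUIREMENTS[page]
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    if importlib.util.find_spec(module) is None:
        return f"{module} is not installed. Run: uv sync --extra {extra}"
    return None
```

A page would call `missing_extra_message("predict")` first and render the returned hint as an error instead of importing the heavy dependency.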

### What's in the app

1. **🏠 Inicio / Home** - dataset stats, recent runs, a health panel, and an
inline wizard that walks first-time users through Ingest → Train → Eval.
2. **📷 Bandeja / Tray Capture** - drag-and-drop tray photos, see the original
next to the rectified+overlay preview, save per-bean crops to
`data/raw/proprietary_tray/sessions/<id>/`.
3. **🧠 Entrenar / Train** - pick a backbone and the key knobs (advanced
controls live behind an expander), launch training as a subprocess, and
watch `train_loss` + `val_macro_f1` update **in real time** as each epoch
completes.

![Train page](docs/images/ui-train.png)

4. **📊 Evaluar / Evaluate** - pick a checkpoint and split, run it, see
accuracy / macro-F1 / missed-defect-rate, per-class breakdown, confusion
matrix heatmap, and a gallery of mis-classified beans.
5. **🚀 Predecir / Predict** - upload a single-bean photo, get the predicted
class, confidence, Top-3, and an accept/reject verdict from the canonical
taxonomy. Uses the most recent ONNX for speed (prefers INT8).
6. **⚙️ Ajustes / Settings** - browse the canonical taxonomy, the YAML data
sources, and the current Hydra config.

### End-to-end test in 5 minutes

```bash
# 1. install everything the UI exercises
uv sync --extra ui --extra train --extra export --extra capture

# 2. (optional) ingest the public Robusta baseline so Train/Evaluate have data
export ROBOFLOW_API_KEY=... # see your Roboflow workspace
make data && make ingest

# 3. launch the UI
make ui
```

Then in the browser:

1. **Inicio** - confirm the health panel shows Python/PyTorch/Taxonomy green;
the manifest icon flips to ✅ once `data/processed/manifest.jsonl` exists.
2. **Entrenar** - backbone `mobilenet_v3_small`, **3 épocas** (for a smoke
test), **Iniciar entrenamiento**. The Plotly chart should start updating
within a couple of seconds of the first epoch landing.
3. **Evaluar** - pick the run you just trained, leave `split = test`,
**Ejecutar**. You get the headline metrics + confusion matrix + error
gallery.
4. **Predecir** - from a terminal, `uv run almendra export --checkpoint
outputs/ui-<timestamp>/best.pt`. Refresh the Predict page, pick the ONNX
from the dropdown, upload any single-bean image.

See [`docs/ui.md`](docs/ui.md) for the deeper troubleshooting guide
(stuck subprocesses, port conflicts, missing extras).

## Repository layout

| Path | Purpose |
@@ -87,10 +175,10 @@ answer, tracked in [`docs/research-log.md`](docs/research-log.md):
- **Phase 0** - Scaffolding ✓
- **Phase 1** - Data pipeline + single-view public baseline ✓
- **Phase 2** - Multi-view fusion model ✓
- **Phase 3** - Physical capture protocol + proprietary Arabica data *(current)*
- **Phase 3** - Physical capture protocol + proprietary Arabica data *(blocked on data)*
- **Phase 4** - Multi-spectral illumination (UV, transillumination)
- **Phase 5** - Speed: backbone sweep, INT8, hardware benchmark
- **Phase 6** - Deployment reference + sorting-machine spec
- **Phase 5** - Speed: backbone sweep, INT8, hardware benchmark ✓
- **Phase 6** - Local Streamlit UI for the whole toolkit ✓
- *Parallel research track* - NIR / hyperspectral internal-defect inspection

## Data & licensing
Binary file added docs/images/ui-home.png
Binary file added docs/images/ui-train.png
33 changes: 33 additions & 0 deletions docs/research-log.md
@@ -45,6 +45,39 @@ deployment tooling). Headline findings:

## Log

### 2026-05-21 - Phase 6: local Streamlit UI

- **New optional-deps group `ui`** (`streamlit`, `plotly`) and an `almendra ui`
CLI subcommand that execs `streamlit run` on `src/almendra/ui/app.py`.
`make ui` is the matching shortcut.
- **Six pages** behind a sidebar radio: Home (dataset stats + recent runs +
inline wizard), Tray Capture, Train, Evaluate, Predict, Settings.
- **Bilingual ES/EN from day one** - every visible string lives in a central
dict (`ui/components/i18n.py`) and pages read it through `t("key", lang)`. A
sidebar radio toggles the language; adding a third language is a translation
job, not a rewrite. Default: Spanish.
- **Live training charts** - `train.loop` writes one JSONL line per epoch to
`outputs/<run>/live_metrics.jsonl` (controlled by the env var
`ALMENDRA_LIVE_METRICS`, so the CLI use case is untouched). The Train page
launches training as a subprocess, polls the file every ~2 s, and re-renders
a two-line Plotly chart (train_loss + val_macro_f1).
- **Decoupled, file-based contract** - the UI is stateless across reruns;
everything it shows (runs, checkpoints, ONNX, metrics) is discovered from
disk under `outputs/`. Anything that writes the same JSONL schema works with
the UI.
- **Inline wizard** on Home with three "press to go" buttons that walk
Ingest → Train → Eval with sensible defaults. Advanced controls (gated
fusion, view-dropout, augmentation toggles) live behind an `Advanced`
expander on the Train page so they don't intimidate first-time users.
- **Tests** - `streamlit.testing.v1.AppTest` smoke tests render every page in
ES *and* EN (12 cases) without exceptions; the i18n dict is checked for
complete coverage; the live-metrics JSONL writer/reader has its own unit
test. All 45 tests in the suites that don't require torch/onnxruntime pass.
- **Scope split** - Phase 6.0 (this PR) ships the six pages, the CLI/Make
entrypoints and tests. Phase 6.1 will add: a dedicated **Labelling** page
with hotkeys + IAA reporting, an **Export & Bench** page, a model-package
zip exporter, and a "Demo mode" using the public Roboflow data.
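
The file-based contract for live metrics described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the code in `train.loop`: the function names are hypothetical, the `epoch` field is an assumed addition to the two metrics the log names, and "the env var is set to any non-empty value" is an assumed reading of how `ALMENDRA_LIVE_METRICS` gates the writer.

```python
import json
import os
from pathlib import Path

LIVE_METRICS_ENV = "ALMENDRA_LIVE_METRICS"  # env var named in the log entry


def append_epoch_metrics(run_dir: Path, epoch: int, train_loss: float,
                         val_macro_f1: float) -> None:
    """Append one JSON line per epoch, only when the env var is set."""
    if not os.environ.get(LIVE_METRICS_ENV):
        return  # plain CLI runs write nothing (assumed semantics)
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {"epoch": epoch, "train_loss": train_loss,
              "val_macro_f1": val_macro_f1}
    with (run_dir / "live_metrics.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_live_metrics(run_dir: Path) -> list:
    """Return every complete record; skip a line that is still being written."""
    path = run_dir / "live_metrics.jsonl"
    if not path.exists():
        return []
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # half-written line; a later poll will pick it up
    return rows
```

Because the reader tolerates a partial trailing line, a poller can call it every couple of seconds while the trainer is still appending, without any locking between the two processes.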

### 2026-05-20 - Phase 5: backbone sweep + static INT8 PTQ

- Static INT8 PTQ implemented (`quantize_int8_static` in `src/almendra/export/exporter.py`):
134 changes: 134 additions & 0 deletions docs/ui.md
@@ -0,0 +1,134 @@
# Local UI (Phase 6)

The `almendra ui` command launches a local Streamlit app that wraps the whole
toolkit (tray capture, training, evaluation, prediction and settings) so a
non-technical user can run the pipeline end-to-end without touching the CLI.

The UI is **bilingual ES/EN** with a sidebar toggle (Spanish default).
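
For contributors adding strings, the lookup pattern behind the language toggle can be pictured like this. It is a sketch under stated assumptions: the `STRINGS` entries and key names are made up for illustration, and the fallback behaviour (default language, then the raw key) is an assumption rather than what `ui/components/i18n.py` necessarily does. Only the `t("key", lang)` call shape comes from the project's research log.

```python
# Hypothetical string table; the real one lives in ui/components/i18n.py.
STRINGS = {
    "train.start": {"es": "Iniciar entrenamiento", "en": "Start training"},
    "eval.run": {"es": "Ejecutar", "en": "Run"},
}
DEFAULT_LANG = "es"  # Spanish is the default language


def t(key: str, lang: str = DEFAULT_LANG) -> str:
    """Look up a UI string, falling back to the default language."""
    entry = STRINGS.get(key)
    if entry is None:
        return key  # surface the missing key instead of raising
    return entry.get(lang, entry[DEFAULT_LANG])


def missing_translations(langs=("es", "en")) -> list:
    """List 'key:lang' pairs with no translation, for a coverage check."""
    return [f"{key}:{lang}"
            for key, entry in STRINGS.items()
            for lang in langs
            if lang not in entry]
```

With this shape, adding a language means adding one more code to each entry, and `missing_translations` gives a quick list of what is still untranslated.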

## 1 - Install

```bash
# Minimum extras for a full end-to-end run:
uv sync --extra ui --extra train --extra export --extra capture
```

| Extra | Why you need it |
|---|---|
| `ui` | Streamlit + Plotly (the app itself) |
| `train` | PyTorch + torchvision (Train, Evaluate, mis-classified gallery) |
| `export` | ONNX Runtime (Predict page) |
| `capture` | OpenCV (Tray Capture page) |

If you skip an extra, the page that depends on it shows a clear error instead
of crashing. You can come back and install it later.

## 2 - Launch

```bash
make ui
# equivalent: uv run almendra ui
```

This execs `streamlit run` against `src/almendra/ui/app.py`. By default the
app is served at <http://localhost:8501> and your browser opens automatically.

Flags:

```bash
uv run almendra ui --port 8888 # use a different port
uv run almendra ui --headless # don't auto-open a browser (SSH)
```

The first launch may take a few seconds while Streamlit warms up its caches.
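
For readers curious how the two flags might translate into a Streamlit invocation, here is a minimal sketch. `build_streamlit_command` is a hypothetical helper, not the actual CLI code; `--server.port` and `--server.headless` are standard Streamlit config options, but whether `almendra ui` maps its flags exactly this way is an assumption.

```python
import sys
from pathlib import Path

APP_PATH = Path("src/almendra/ui/app.py")  # entry script named in this guide


def build_streamlit_command(port: int = 8501, headless: bool = False) -> list:
    """Build an argv list equivalent to `streamlit run <app> ...`."""
    return [
        sys.executable, "-m", "streamlit", "run", str(APP_PATH),
        "--server.port", str(port),
        # headless=true stops Streamlit from opening a browser window.
        "--server.headless", "true" if headless else "false",
    ]
```

The resulting list can be handed to `subprocess.run` or `os.execv`; which of the two the real subcommand uses is not something this guide confirms.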

## 3 - End-to-end test flow

The pages are designed to be exercised in order. The **inline wizard on Home**
gives you a fast-path button for each step.

### 3a - Have public data already?

If you've already run `make data && make ingest` (Roboflow Robusta dataset),
the manifest at `data/processed/manifest.jsonl` will show up on Home and you
can skip straight to **Train**.

### 3b - Cold start (proprietary tray photos)

1. **🏠 Home** - confirm the health panel says Python/PyTorch/Taxonomy are all
green. Manifest will show ❌ if you haven't ingested anything yet, which is fine.
2. **📷 Tray Capture** - drag in *Side A* (required) and *Side B* (optional)
tray photos. Set:
- **Rows / Cols** - wells in your tray.
- **Flip** - `mirror_cols` if you flipped the tray horizontally, `mirror_rows`
if vertically. `identity` if no flip.
- Leave **Margin frac** and **Well frac** at defaults to start.
Hit **Procesar / Process photos**. You should see the original photo next
to a rectified+overlay view (green squares = occupied wells, red = empty).
If markers aren't detected the page tells you so; check that the corners are
sharp and in-frame.
Enter a session ID (defaults to a timestamp) and hit **Save crops**. Crops
land in `data/raw/proprietary_tray/sessions/<id>/`.
3. *(out of UI for now)* - convert the saved session into a manifest entry.
The proprietary tray ingester is a Phase 3 task; until then, public-data
`almendra ingest` is the only path that gives the Train page data to train on.
4. **🧠 Train** - pick a backbone (start with `mobilenet_v3_small`, the fastest),
set **Épocas / Epochs** to 3 for a smoke test, press **Iniciar / Start**.
The progress bar fills and the Plotly chart updates in real time as each
epoch completes. The **Best macro-F1** metric tracks the best checkpoint
saved. Press **Detener / Stop** if you want to kill the run early.
5. **📊 Evaluate** - pick the run you just trained from the dropdown, leave
`split = test`, press **Ejecutar / Run**. You get headline accuracy /
macro-F1 / missed-defect-rate cards, a per-class table, a confusion-matrix
heatmap, and a gallery of mis-classified beans.
6. **🚀 Predict** - works once a run has been **exported**. From a terminal:
```bash
uv run almendra export --checkpoint outputs/ui-<timestamp>/best.pt
```
Then refresh the Predict page, pick the ONNX file from the dropdown, upload
a single bean photo, and check the predicted class + Top-3 + accept/reject
verdict.
7. **⚙️ Settings** - read-only view of the canonical taxonomy, data sources
and current Hydra config. Useful for sanity-checking the project paths.

### 3c - Sanity-check checklist

Use this to make sure the UI is *actually* doing what it should:

- [ ] Language toggle in the sidebar instantly swaps every visible string.
- [ ] Home health panel shows ✅ for Taxonomy and the manifest icon flips
between ✅/❌ depending on whether `data/processed/manifest.jsonl` exists.
- [ ] On Train, the live chart **starts appearing within ~2 s of the first
epoch finishing**, which confirms the JSONL tail is working.
- [ ] Stopping training mid-run kills the subprocess (check `ps` or `pgrep -f
almendra.cli`).
- [ ] On Evaluate, mis-classified gallery shows real bean thumbnails (not just
captions) when the manifest has accessible image paths.
- [ ] On Predict, the page lists every ONNX under `outputs/*/model*.onnx` and
defaults to the most recently modified INT8 if present.
- [ ] On Settings, every YAML under `data/sources/` is browsable.
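
To check the Predict item by hand, the discovery rule it describes can be reproduced in a few lines. This is a sketch, not the page's code: the function names are hypothetical, and recognising an INT8 model by `int8` appearing in its filename is an assumed naming convention. The glob pattern is the one stated in the checklist.

```python
from pathlib import Path
from typing import Optional


def list_onnx_models(outputs_dir: Path) -> list:
    """All files matching outputs/*/model*.onnx, most recently modified first."""
    return sorted(outputs_dir.glob("*/model*.onnx"),
                  key=lambda p: p.stat().st_mtime, reverse=True)


def default_onnx_model(outputs_dir: Path) -> Optional[Path]:
    """Prefer the newest INT8 model; otherwise the newest model of any kind."""
    models = list_onnx_models(outputs_dir)
    # Assumed convention: INT8 exports carry "int8" in the filename.
    int8 = [p for p in models if "int8" in p.name.lower()]
    if int8:
        return int8[0]
    return models[0] if models else None
```

Running `default_onnx_model(Path("outputs"))` from the repository root should name the same file the dropdown preselects, if the assumptions above hold.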

## Troubleshooting

- **"OpenCV is not installed"** on the Tray Capture page → `uv sync --extra
capture`, then click the page again.
- **"onnxruntime is not installed"** on Predict → `uv sync --extra export`.
- **The Train chart never updates** → check `outputs/ui-<timestamp>/live_metrics.jsonl`
exists and grows; if the file isn't being written, the subprocess didn't
inherit the `ALMENDRA_LIVE_METRICS` env var (file a bug).
- **Port already in use** → `uv run almendra ui --port 8888`.
- **Stuck training subprocess after closing the tab** → `pkill -f
"almendra.cli train"`. The UI's Stop button uses SIGTERM on the process
group, but if you close the browser before pressing Stop the subprocess
keeps running. This is intentional: long runs should survive a tab close.
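
The Stop behaviour mentioned in the last item, SIGTERM sent to the whole process group, looks roughly like this on POSIX systems. The helper names are hypothetical, and the escalation to SIGKILL after a timeout is an addition made for this sketch, not something the guide says the UI does.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(cmd: list) -> subprocess.Popen:
    """Launch cmd as the leader of a new session and process group (POSIX)."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop_group(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """SIGTERM the whole group; fall back to SIGKILL if it does not exit."""
    if proc.poll() is None:
        pgid = os.getpgid(proc.pid)
        os.killpg(pgid, signal.SIGTERM)  # also reaches child worker processes
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            os.killpg(pgid, signal.SIGKILL)  # assumed escalation, see lead-in
            proc.wait()
    return proc.returncode
```

Signalling the group rather than the single PID matters for training runs, since data-loader workers are separate processes that would otherwise be left behind.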

## What's *not* in v1

These ship in **Phase 6.1**, not this PR:

- A dedicated **Labelling** page with keyboard hotkeys and inter-annotator
agreement reporting.
- An **Export & Bench** page (currently you drop to the CLI for both).
- A **model-package zip exporter** (ONNX + INT8 + model card + manifest
snapshot).
- A **demo mode** using the public Roboflow data baseline.
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -57,6 +57,11 @@ capture = [
# ArUco marker detection for the gridded-tray auto-segmentation pipeline.
"opencv-contrib-python-headless>=4.9",
]
ui = [
# Local Streamlit UI (Phase 6): runs the toolkit end-to-end for non-CLI users.
"streamlit>=1.40",
"plotly>=5.20",
]
dev = [
"pytest>=8.0",
"ruff>=0.5",