Phase 5: backbone sweep + static INT8 PTQ (RQ3, RQ4) by mrjunos · Pull Request #4 · mrjunos/almendra

mrjunos · 2026-05-20T16:11:50Z

Phase 5 — backbone sweep + static INT8 PTQ

Answers RQ3 (accuracy / latency / model-size Pareto frontier across
backbones) and RQ4 (INT8 PTQ accuracy cost), and produces a concrete
deployment recommendation.

What's included

- Static INT8 PTQ (exporter.quantize_int8_static): ONNX Runtime
  quantize_static with per-channel weights, QUInt8 activations (the right
  range for ReLU-family outputs) and a shuffled real-image calibration
  set drawn from the train split. Default mode in configs/export/onnx_int8.yaml.
- Backbone sweep runner (src/almendra/bench/sweep.py, almendra sweep /
  make sweep): train → eval → export (static INT8) → bench per backbone;
  writes a results CSV and a Pareto markdown table.
- CI bump: actions/checkout@v5 + setup-uv@v6 (Node-20 deprecation).
- Tests: tests/test_quantize.py (end-to-end static PTQ), tests/test_sweep.py
  (report writers). 39 tests pass.
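For readers unfamiliar with the ONNX Runtime API, the static PTQ settings listed above can be sketched roughly as follows. This is a minimal sketch, not the code in exporter.quantize_int8_static: the function names `pick_calibration_paths` and `quantize_static_sketch`, the `*.jpg` glob, the calibration-set size, the input tensor name `"input"`, the QDQ format and the QInt8 weight type are all assumptions made for illustration. Only per-channel weights, QUInt8 activations, MinMax calibration and the shuffled train-split calibration set come from the PR text.

```python
import random
from pathlib import Path


def pick_calibration_paths(train_dir, n=128, seed=0):
    """Return up to n image paths from the train split, shuffled reproducibly."""
    paths = sorted(Path(train_dir).rglob("*.jpg"))  # assumed file extension
    random.Random(seed).shuffle(paths)
    return paths[:n]


def quantize_static_sketch(fp32_path, int8_path, batches, input_name="input"):
    """Static INT8 PTQ; `batches` is an iterable of preprocessed float32 arrays."""
    # Imported lazily so the path helper above works without onnxruntime installed.
    from onnxruntime.quantization import (
        CalibrationDataReader,
        CalibrationMethod,
        QuantFormat,
        QuantType,
        quantize_static,
    )

    class ListReader(CalibrationDataReader):
        """Feeds calibration batches one at a time, then signals the end with None."""

        def __init__(self):
            self._it = iter(batches)

        def get_next(self):
            batch = next(self._it, None)
            return None if batch is None else {input_name: batch}

    quantize_static(
        fp32_path,
        int8_path,
        ListReader(),
        quant_format=QuantFormat.QDQ,            # assumption: format not stated in the PR
        per_channel=True,                        # per-channel weights (from the PR)
        activation_type=QuantType.QUInt8,        # QUInt8 activations (from the PR)
        weight_type=QuantType.QInt8,             # assumption: weight type not stated
        calibrate_method=CalibrationMethod.MinMax,
    )
```

The preprocessing that turns each path into a model-ready array is left out on purpose, since it must match whatever the training pipeline does.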

Results

Sweep — MobileNetV3-Small / MobileNetV3-Large / EfficientNet-B0
(20 epochs, single-view, ONNX Runtime CPU EP, batch 1, test split 62 beans):

backbone	FP32 mF1	INT8 mF1	mF1 loss	FP32 MB	INT8 MB	FP32 p50 ms	INT8 p50 ms	INT8 beans/s
mobilenet_v3_small	0.894	0.034	−0.860	4.06	1.37	2.12	1.22	812
mobilenet_v3_large	0.939	0.860	−0.079	12.45	3.63	5.50	2.34	427
efficientnet_b0	0.895	0.714	−0.182	16.78	5.13	10.03	4.63	216
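To make the Pareto reading of the table explicit, here is a small sketch that filters the INT8 rows down to the non-dominated ones (higher mF1, smaller size and lower p50 latency are all better). The numbers are copied from the table above; the helper names are illustrative and are not taken from src/almendra/bench/sweep.py.

```python
# (backbone, INT8 macro-F1, INT8 size in MB, INT8 p50 latency in ms)
ROWS = [
    ("mobilenet_v3_small", 0.034, 1.37, 1.22),
    ("mobilenet_v3_large", 0.860, 3.63, 2.34),
    ("efficientnet_b0", 0.714, 5.13, 4.63),
]


def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]
    better = a[1] > b[1] or a[2] < b[2] or a[3] < b[3]
    return no_worse and better


def pareto_front(rows):
    """Keep only the rows that no other row dominates."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


front = [r[0] for r in pareto_front(ROWS)]
print(front)
```

On these numbers MobileNetV3-Large dominates EfficientNet-B0 on all three axes. MobileNetV3-Small formally stays on the frontier only because it is the smallest and fastest model; any minimum-accuracy floor would remove it.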

Deployment pick

MobileNetV3-Large + static INT8 PTQ — 0.86 INT8 macro-F1, 3.63 MB, ~430
beans/s on a single CPU thread. That's already orders of magnitude above the
throughput a singulating sorter can feed; the model is comfortably not the
bottleneck (as designed).
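As a sanity check on the throughput figure: at batch 1 on a single thread, beans per second is roughly the reciprocal of per-inference latency. The sketch below applies that to the INT8 p50 values from the results table. How the reported beans/s column was actually measured is not stated in the PR, so the small gap for MobileNetV3-Small may simply reflect a different statistic (for example mean rather than median latency); that is an assumption.

```python
# INT8 p50 latency (ms) and reported INT8 beans/s, copied from the results table.
P50_MS = {"mobilenet_v3_small": 1.22, "mobilenet_v3_large": 2.34, "efficientnet_b0": 4.63}
REPORTED = {"mobilenet_v3_small": 812, "mobilenet_v3_large": 427, "efficientnet_b0": 216}


def beans_per_second(latency_ms):
    """Single-stream throughput implied by a per-inference latency in milliseconds."""
    return 1000.0 / latency_ms


for name, ms in P50_MS.items():
    estimate = beans_per_second(ms)
    print(f"{name}: {estimate:.0f} beans/s from p50, {REPORTED[name]} reported")
```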

Honest finding — MobileNetV3-Small INT8 collapse

A real discovery worth recording rather than papering over: MN3-Small loses
0.860 macro-F1 under static PTQ (0.894 → 0.034). Its hardswish activations
have a range that per-tensor MinMax calibration cannot pin down cleanly, and
the per-channel weight quantization that rescues Conv-heavy networks does not
transfer to the activation side. Fix: use mode: int8_dynamic (weights only) for
this backbone, or do quantization-aware training (Phase 5+). Documented in the
research log and the TODO list.
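One way to see why a hardswish output can be awkward for per-tensor unsigned 8-bit quantization is the toy calculation below. It is an illustration of the general mechanism, not a diagnosis of this model: the observed maxima are hypothetical, and the helper functions are a simplified stand-in for what ONNX Runtime's MinMax calibrator does. Hardswish has a shallow negative lobe with a minimum of -0.375 at x = -1.5, so as the observed maximum grows, fewer and fewer of the 256 levels are left to represent that lobe.

```python
def hardswish(x):
    """hardswish(x) = x * relu6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def uint8_minmax_params(lo, hi):
    """Asymmetric per-tensor uint8 scale and zero point for an observed [lo, hi]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def fake_quant(v, scale, zero_point):
    """Quantize to uint8 and dequantize again."""
    q = min(max(round(v / scale) + zero_point, 0), 255)
    return (q - zero_point) * scale


HSWISH_MIN = hardswish(-1.5)  # global minimum of hardswish

for observed_max in (6.0, 20.0, 60.0):  # hypothetical calibration maxima
    scale, zp = uint8_minmax_params(HSWISH_MIN, observed_max)
    err = abs(fake_quant(HSWISH_MIN, scale, zp) - HSWISH_MIN)
    print(f"max={observed_max:5.1f}  scale={scale:.4f}  "
          f"levels below zero={zp}  error at minimum={err:.4f}")
```

The zero point equals the number of integer levels available below zero, so it is a direct measure of how coarsely the negative lobe is represented.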

Caveats

- 20 epochs is shorter than the Phase 1 baseline (30 epochs); this was chosen
  to fit three backbones in one sweep. With longer training MN3-Large would
  likely exceed 0.94 mF1.
- Test set is 62 beans (Roboflow Robusta test split). Missed-defect rate is 3
  for all FP32 models; the metric saturates at this set size.
- timm-based variants (EfficientNet-Lite, GhostNet, MobileOne) are deferred;
  they would extend the Pareto frontier on the small-and-fast corner.

🤖 Generated with Claude Code

Static INT8 post-training quantization with per-channel weights, QUInt8 activations, and a shuffled real-image calibration set drawn from the train split. Default mode for export.

Backbone sweep runner (almendra sweep / make sweep): train -> eval -> export -> bench per backbone, writes a CSV + Pareto markdown report.

CI: actions/checkout@v5, setup-uv@v6.

Sweep across MobileNetV3-Small / MobileNetV3-Large / EfficientNet-B0 (20 epochs, ONNX Runtime CPU, batch 1):

| backbone           | FP32 mF1 | INT8 mF1 | INT8 MB | INT8 p50 ms |
|--------------------|----------|----------|---------|-------------|
| mobilenet_v3_small | 0.894    | 0.034    | 1.37    | 1.22        |
| mobilenet_v3_large | 0.939    | 0.860    | 3.63    | 2.34        |
| efficientnet_b0    | 0.895    | 0.714    | 5.13    | 4.63        |

Recommendation: MobileNetV3-Large + static INT8 (0.86 INT8 mF1, 3.63 MB, ~430 beans/s on a single CPU thread). MobileNetV3-Small's hardswish collapses under per-tensor MinMax PTQ; use dynamic INT8 or QAT for that backbone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mrjunos force-pushed the phase-5-speed-sweep branch from 204f9b6 to bc55ae8 Compare May 20, 2026 16:16

mrjunos force-pushed the phase-5-speed-sweep branch from bc55ae8 to e8fc931 Compare May 20, 2026 16:16

mrjunos merged commit 0cac5a8 into main May 20, 2026
1 check passed

mrjunos deleted the phase-5-speed-sweep branch May 20, 2026 20:05

mrjunos merged 1 commit into main from phase-5-speed-sweep
