Phase 5: backbone sweep + static INT8 PTQ (RQ3, RQ4) #4
Merged
204f9b6 to bc55ae8
Static INT8 post-training quantization with per-channel weights, QUInt8 activations, and a shuffled real-image calibration set drawn from the train split. Default mode for export.

Backbone sweep runner (almendra sweep / make sweep): train -> eval -> export -> bench per backbone; writes a CSV and a Pareto markdown report.

CI: actions/checkout@v5, setup-uv@v6.

Sweep across MobileNetV3-Small / MobileNetV3-Large / EfficientNet-B0 (20 epochs, ONNX Runtime CPU, batch 1):

| backbone           | FP32 macro-F1 | INT8 macro-F1 | INT8 size (MB) | INT8 p50 (ms) |
|--------------------|---------------|---------------|----------------|---------------|
| mobilenet_v3_small | 0.894         | 0.034         | 1.37           | 1.22          |
| mobilenet_v3_large | 0.939         | 0.860         | 3.63           | 2.34          |
| efficientnet_b0    | 0.895         | 0.714         | 5.13           | 4.63          |

Recommendation: MobileNetV3-Large + static INT8 (0.86 INT8 macro-F1, 3.63 MB, ~430 beans/s on a single CPU thread).

MobileNetV3-Small (hardswish activations) collapses under per-tensor MinMax PTQ; use dynamic INT8 or QAT for that backbone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bc55ae8 to e8fc931
Phase 5 — backbone sweep + static INT8 PTQ
Answers RQ3 (accuracy / latency / model-size Pareto frontier across
backbones) and RQ4 (INT8 PTQ accuracy cost), and produces a concrete
deployment recommendation today.
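For RQ3, "Pareto frontier" means the set of backbones that no other backbone beats on every axis at once. Here is a minimal sketch, assuming the axes are INT8 macro-F1 (higher is better), INT8 model size and INT8 p50 latency (both lower is better), using the sweep numbers reported in this PR. `SweepRow`, `dominates` and `pareto_front` are hypothetical names and need not match what `src/almendra/bench/sweep.py` actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepRow:
    backbone: str
    int8_mf1: float     # INT8 macro-F1, higher is better
    int8_mb: float      # INT8 model size in MB, lower is better
    int8_p50_ms: float  # INT8 median latency in ms, lower is better


def dominates(a: SweepRow, b: SweepRow) -> bool:
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = (
        a.int8_mf1 >= b.int8_mf1
        and a.int8_mb <= b.int8_mb
        and a.int8_p50_ms <= b.int8_p50_ms
    )
    strictly_better = (
        a.int8_mf1 > b.int8_mf1
        or a.int8_mb < b.int8_mb
        or a.int8_p50_ms < b.int8_p50_ms
    )
    return no_worse and strictly_better


def pareto_front(rows: list[SweepRow]) -> list[SweepRow]:
    """Keep every row that no other row dominates (input order preserved)."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


ROWS = [
    SweepRow("mobilenet_v3_small", 0.034, 1.37, 1.22),
    SweepRow("mobilenet_v3_large", 0.860, 3.63, 2.34),
    SweepRow("efficientnet_b0", 0.714, 5.13, 4.63),
]

if __name__ == "__main__":
    print([r.backbone for r in pareto_front(ROWS)])
    # ['mobilenet_v3_small', 'mobilenet_v3_large']
```

EfficientNet-B0 falls off the frontier because MobileNetV3-Large is more accurate, smaller and faster. MobileNetV3-Small stays on it only by being the smallest and fastest, so the frontier by itself does not pick the deployment model.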
What's included
- Static INT8 PTQ (`exporter.quantize_int8_static`): ONNX Runtime `quantize_static` with per-channel weights, QUInt8 activations (the right range for ReLU-family outputs) and a shuffled real-image calibration set drawn from the train split. Default mode in `configs/export/onnx_int8.yaml`.
- Backbone sweep runner (`src/almendra/bench/sweep.py`, `almendra sweep` / `make sweep`): train → eval → export (static INT8) → bench per backbone; writes a results CSV and a Pareto markdown table.
- CI: `actions/checkout@v5` + `setup-uv@v6` (Node-20 deprecation).
- Tests: `tests/test_quantize.py` (end-to-end static PTQ), `tests/test_sweep.py` (report writers). 39 tests pass.
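Here is a minimal sketch of static PTQ with ONNX Runtime, assuming a batch-1 image model with a single float32 input. `quantize_static`, `CalibrationDataReader`, `QuantType` and `CalibrationMethod` are real `onnxruntime.quantization` names. `sample_calibration_paths`, `ImageCalibrationReader`, the `preprocess` callable and the calibration size are hypothetical, and this is not the project's `exporter.quantize_int8_static`. The quantization format is left at ORT's default.

```python
import random
from typing import Any, Callable, Optional, Sequence

try:
    # Real ORT base class: only get_next() is abstract.
    from onnxruntime.quantization import CalibrationDataReader
except ImportError:  # keeps the pure-Python helpers importable without ORT
    CalibrationDataReader = object


def sample_calibration_paths(train_paths: Sequence[str], n: int, seed: int = 0) -> list:
    """Seeded shuffle of train-split images, truncated to n (never val/test)."""
    paths = list(train_paths)
    random.Random(seed).shuffle(paths)
    return paths[:n]


class ImageCalibrationReader(CalibrationDataReader):
    """Yields one preprocessed image per get_next() call (batch 1)."""

    def __init__(self, paths: Sequence[str], input_name: str,
                 preprocess: Callable[[str], Any]):
        self.paths = list(paths)
        self.input_name = input_name
        self.preprocess = preprocess  # hypothetical: path -> float32 NCHW array
        self._it = iter(self.paths)

    def get_next(self) -> Optional[dict]:
        path = next(self._it, None)
        if path is None:
            return None  # tells ORT the calibration data is exhausted
        return {self.input_name: self.preprocess(path)}

    def rewind(self) -> None:
        self._it = iter(self.paths)


def quantize_int8_static_sketch(fp32_path: str, int8_path: str,
                                reader: ImageCalibrationReader) -> None:
    # Imported lazily so the helpers above work without onnxruntime installed.
    from onnxruntime.quantization import CalibrationMethod, QuantType, quantize_static

    quantize_static(
        fp32_path,
        int8_path,
        reader,
        per_channel=True,                           # one weight scale per output channel
        weight_type=QuantType.QInt8,
        activation_type=QuantType.QUInt8,           # asymmetric uint8 activations
        calibrate_method=CalibrationMethod.MinMax,  # one activation range per tensor
    )
```

Typical use would be `quantize_int8_static_sketch("model_fp32.onnx", "model_int8.onnx", ImageCalibrationReader(sample_calibration_paths(train_paths, 100), "input", preprocess))`. The file names, the input name `"input"` and the sample size are placeholders. Drawing calibration images only from the train split keeps the test split untouched by the quantizer.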
Results
Sweep — MobileNetV3-Small / MobileNetV3-Large / EfficientNet-B0
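(20 epochs, single-view, ONNX Runtime CPU EP, batch 1, test split 62 beans):

| backbone           | FP32 macro-F1 | INT8 macro-F1 | INT8 size (MB) | INT8 p50 (ms) |
|--------------------|---------------|---------------|----------------|---------------|
| mobilenet_v3_small | 0.894         | 0.034         | 1.37           | 1.22          |
| mobilenet_v3_large | 0.939         | 0.860         | 3.63           | 2.34          |
| efficientnet_b0    | 0.895         | 0.714         | 5.13           | 4.63          |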
Deployment pick
MobileNetV3-Large + static INT8 PTQ — 0.86 INT8 macro-F1, 3.63 MB, ~430
beans/s on a single CPU thread. That's already orders of magnitude above the
throughput a singulating sorter can feed; the model is comfortably not the
bottleneck (as designed).
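The ~430 beans/s figure is the reciprocal of MobileNetV3-Large's INT8 p50 latency (1000 / 2.34 ms ≈ 427). Here is a minimal sketch of measuring a single-thread p50 and converting it to throughput. It assumes a 1×3×224×224 float32 input (the real input size is not stated in this PR) and a random tensor rather than real bean images. `p50_latency_ms`, `beans_per_second` and `make_ort_runner` are hypothetical helpers. `SessionOptions.intra_op_num_threads` and `InferenceSession` are real ONNX Runtime APIs.

```python
import statistics
import time
from typing import Callable, Sequence


def p50_latency_ms(run_once: Callable[[], object], warmup: int = 10, iters: int = 200) -> float:
    """Median wall-clock latency of run_once(), in milliseconds."""
    for _ in range(warmup):  # untimed calls so one-off setup costs are excluded
        run_once()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def beans_per_second(p50_ms: float) -> float:
    """Batch-1 throughput implied by median latency: one bean per call."""
    return 1000.0 / p50_ms


def make_ort_runner(model_path: str,
                    shape: Sequence[int] = (1, 3, 224, 224)) -> Callable[[], object]:
    """Single-threaded ONNX Runtime CPU session fed one fixed random input."""
    import numpy as np
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 1  # one thread inside each operator
    sess = ort.InferenceSession(model_path, sess_options=opts,
                                providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    x = np.random.default_rng(0).random(tuple(shape), dtype=np.float32)
    return lambda: sess.run(None, {input_name: x})
```

Typical use is `ms = p50_latency_ms(make_ort_runner("model_int8.onnx"))` followed by `beans_per_second(ms)`. This measures model time only. Image capture and preprocessing come on top of it.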
Honest finding — MobileNetV3-Small INT8 collapse
A real discovery worth recording (not papering over): MN3-Small drops 86
macro-F1 points under static PTQ (0.894 FP32 to 0.034 INT8). Its hardswish
activations have a range that per-tensor MinMax calibration cannot pin down
cleanly, and the per-channel weight fix that rescues Conv-heavy networks does
not transfer to the activation side. Fix: use `mode: int8_dynamic` (weights
only) for this backbone, or do quantization-aware training (Phase 5+).
Documented in the research log and the TODO list.
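Here is a minimal sketch of the dynamic-INT8 fallback with ONNX Runtime. Weights are quantized to INT8 offline, and activation scale and zero-point are computed from each input at run time. No calibration set or MinMax activation range is involved. `quantize_dynamic` and `QuantType` are real `onnxruntime.quantization` names. The override table, `export_mode_for` and the `int8_static` mode name are hypothetical and are not the project's config schema. This PR reports no dynamic-INT8 numbers for MN3-Small, so whether it recovers accuracy is untested here.

```python
# Hypothetical per-backbone override; the real project selects modes via YAML configs.
INT8_MODE_OVERRIDES = {"mobilenet_v3_small": "int8_dynamic"}


def export_mode_for(backbone: str, default: str = "int8_static") -> str:
    """Pick the INT8 export mode for a backbone, falling back to static PTQ."""
    return INT8_MODE_OVERRIDES.get(backbone, default)


def quantize_int8_dynamic_sketch(fp32_path: str, int8_path: str) -> None:
    # Lazy import so export_mode_for() is usable without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # INT8 weights are stored in the model; activation scale/zero-point are
    # computed at run time from each input, so no calibration reader is passed.
    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)
```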
Caveats
backbones in one sweep. With longer training MN3-Large would likely exceed
0.94 mF1.
for all FP32 models — the metric saturates at this set size.
would extend the Pareto frontier on the small-and-fast corner.
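Macro-F1, the headline metric throughout, is the unweighted mean over classes of per-class F1 = 2TP / (2TP + FP + FN). On a 62-bean test split, each misclassified bean moves a class's F1 noticeably. A model whose predictions pile onto a few classes scores low even when those predictions are right. Here is a pure-Python sketch, assuming the project's metric matches scikit-learn's `f1_score(..., average="macro")` over the union of true and predicted labels. `macro_f1` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred), key=repr)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was not p
            fn[t] += 1  # missed a true t
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

For example, `macro_f1(["a", "b", "c"], ["a", "a", "a"])` is 1/6. Class `a` gets F1 0.5 and the two never-predicted classes get 0, which drags the mean down.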
🤖 Generated with Claude Code