Add INT8 ONNX quantization pipeline for edge deployment (Raspberry Pi / low-spec CPUs) by shahwork005-oss · Pull Request #250 · nikopueringer/CorridorKey

shahwork005-oss · 2026-05-21T06:52:00Z

Summary

This PR adds a complete pipeline to quantize CorridorKey's GreenFormer model to INT8 ONNX format, enabling deployment on edge devices (Raspberry Pi, NVIDIA Jetson, USB-attached cameras) without requiring PyTorch at inference time.

Results on `CorridorKey_v1.0` checkpoint

| Model | Size | x86 latency (ms/frame) | Notes |
|---|---|---|---|
| Original `.safetensors` | 380 MB | n/a | PyTorch required |
| FP32 ONNX | 275.7 MB | ~3500 | No PyTorch needed |
| INT8 ONNX | 70.8 MB | ~3800 | 3.9x smaller than FP32 ONNX; ARM NEON speedup expected on Pi |

- 3.9x model size reduction (275.7 MB -> 70.8 MB)
- No PyTorch at runtime: the edge device only needs onnxruntime (5 packages total)
- ARM NEON INT8: a 2-4x speedup over FP32 is expected on Raspberry Pi 4 from NEON SIMD; on x86 the INT8 model measured slightly slower than FP32 (~3800 ms vs ~3500 ms per frame)
- Tested end-to-end: export -> calibrate (1793 frames) -> single-image inference -> live camera (84 frames, 0.2 FPS on x86 CPU)
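
The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the sizes and x86 latencies from the results table; the variable names are illustrative and do not come from the PR.

```python
# Sizes (MB) and x86 latencies (ms/frame) copied from the results table.
fp32_mb, int8_mb = 275.7, 70.8
fp32_ms, int8_ms = 3500.0, 3800.0

size_ratio = fp32_mb / int8_mb   # how many times smaller the INT8 file is
fp32_fps = 1000.0 / fp32_ms      # frames per second = 1000 ms / latency
int8_fps = 1000.0 / int8_ms

print(f"size reduction: {size_ratio:.1f}x")    # size reduction: 3.9x
print(f"FP32 on x86:    {fp32_fps:.2f} FPS")   # FP32 on x86:    0.29 FPS
print(f"INT8 on x86:    {int8_fps:.2f} FPS")   # INT8 on x86:    0.26 FPS
```

On x86 the benefit of the INT8 model is therefore file size rather than throughput; the speedup is only expected on ARM NEON.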

New files

quantize/
  export_onnx.py          # Stage 3: export GreenFormer to ONNX FP32
  calibrate_int8.py       # Stage 4: static INT8 PTQ with calibration frames
  create_dummy_model.py   # pipeline smoke-test without the real checkpoint

camera/
  infer_pi.py             # Stage 5: single-image inference (4-ch RGBA input, no PyTorch)
  camera_capture.py       # Stage 6: live camera pipeline with compositing

requirements-edge.txt     # Pi/Jetson deps -- onnxruntime only, no PyTorch
requirements-export.txt   # export machine deps + timm fork instructions
EDGE_DEPLOY.md            # step-by-step guide with real benchmark numbers

How to use

Export (run once on your main machine):

pip install -r requirements-export.txt
python quantize/export_onnx.py \
    --checkpoint models/CorridorKey_v1.0.safetensors \
    --output models/corridorkey_fp32.onnx \
    --img-size 512

Calibrate (needs 100-200 green screen frames in calibration_frames/):

python quantize/calibrate_int8.py \
    --fp32-model models/corridorkey_fp32.onnx \
    --int8-model models/corridorkey_int8.onnx \
    --frames-dir calibration_frames/
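
To illustrate why static PTQ needs calibration frames, here is a minimal pure-Python sketch of min/max calibration followed by affine INT8 quantization. It is not the code in `calibrate_int8.py`: the function names are hypothetical, and real toolchains (for example `onnxruntime.quantization.quantize_static`) do this per tensor inside the graph, possibly with a different calibration method.

```python
def calibrate_range(batches):
    """Track the smallest and largest activation seen over all calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    return lo, hi


def affine_qparams(lo, hi, qmin=-128, qmax=127):
    """Derive a fixed scale and zero point from the calibrated range."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # range must contain 0 so it maps exactly
    scale = (hi - lo) / (qmax - qmin) or 1.0   # avoid a zero scale for all-zero data
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to INT8, clamping values outside the calibrated range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an INT8 value back to an approximate float."""
    return (q - zero_point) * scale


# Toy "activations" standing in for values observed while running calibration frames.
batches = [[-1.0, 0.2, 0.9], [0.0, 1.5, 3.0]]
scale, zp = affine_qparams(*calibrate_range(batches))
q = quantize(0.5, scale, zp)
print(scale, zp, q, dequantize(q, scale, zp))
```

Because the scale and zero point are fixed ahead of time, values outside the calibrated range are clamped at inference, so the calibration frames should resemble the footage expected at deployment.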

Deploy on Pi / edge device (no PyTorch needed):

pip install -r requirements-edge.txt
python camera/infer_pi.py --model corridorkey_int8.onnx --image frame.jpg --output result.png
python camera/camera_capture.py --model corridorkey_int8.onnx   # live camera
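
For readers who want to call the model directly, the following is a rough sketch of a single inference step with onnxruntime. It is not the contents of `infer_pi.py`: the helper names, the [0, 1] scaling, the NCHW layout, and the assumption that frames are already resized to the export resolution are all guesses made for illustration. The input name is read from the session rather than hard-coded, since the PR does not list the tensor names.

```python
import numpy as np


def to_model_input(rgb_u8, hint_u8):
    """Stack an HxWx3 RGB frame and an HxW hint mask into a 1x4xHxW float32 tensor.

    Assumption: the model expects values scaled to [0, 1] in NCHW layout.
    """
    rgb = rgb_u8.astype(np.float32) / 255.0
    hint = hint_u8.astype(np.float32)[..., None] / 255.0
    rgba = np.concatenate([rgb, hint], axis=-1)                  # HxWx4
    return np.ascontiguousarray(rgba.transpose(2, 0, 1)[None])   # 1x4xHxW


def open_session(model_path):
    """Create a CPU-only session; onnxruntime is imported lazily on purpose."""
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_frame(session, rgb_u8, hint_u8):
    """Run one frame and return every model output as a list of arrays."""
    input_name = session.get_inputs()[0].name  # discover the name instead of guessing it
    return session.run(None, {input_name: to_model_input(rgb_u8, hint_u8)})
```

Typical use would be `outputs = run_frame(open_session("corridorkey_int8.onnx"), frame, hint)`. Which output holds the alpha and which holds the foreground should be checked with `session.get_outputs()` rather than assumed.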

Technical notes

- pos_embed interpolation: the checkpoint was trained at 2048x2048; `export_onnx.py` bicubic-interpolates the positional embeddings to the export resolution (default 512x512), matching the behaviour of `CorridorKeyEngine._load_model()`.
- FlashAttention: fused attention is disabled during tracing (`fused_attn=False`) for ONNX compatibility, so the standard unfused scaled dot-product attention path is used.
- 4-channel input: `infer_pi.py` auto-generates a green hint mask via HSV thresholding when no external masking pipeline (GVM/BiRefNet) is available. This is sufficient for well-lit green screens.
- Why not dynamic INT8? Dynamic quantization computes activation quantization parameters at runtime on every forward pass; it measured 40x slower on this transformer architecture, so static PTQ with calibration data is used instead.
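
The HSV-thresholded hint mask from the 4-channel input note can be illustrated with a small per-pixel sketch using only the standard library. The hue window, the saturation and value cut-offs, the function names, and the mask polarity (255 for non-green pixels, treated here as the subject) are assumptions, not values taken from `infer_pi.py`; a real implementation would vectorize this over the whole frame.

```python
import colorsys

# Assumed thresholds: hue between 80 and 160 degrees, reasonably saturated and bright.
HUE_MIN, HUE_MAX = 80 / 360, 160 / 360
MIN_SAT, MIN_VAL = 0.35, 0.25


def is_green(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the assumed green-screen range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return HUE_MIN <= h <= HUE_MAX and s >= MIN_SAT and v >= MIN_VAL


def hint_mask(pixels):
    """Map rows of (r, g, b) tuples to rows of 0 (green screen) or 255 (assumed subject)."""
    return [[0 if is_green(*px) else 255 for px in row] for row in pixels]


frame = [
    [(0, 255, 0), (30, 200, 40)],      # bright screen green
    [(200, 150, 120), (0, 40, 0)],     # skin tone, deep shadow on the screen
]
print(hint_mask(frame))  # [[0, 0], [255, 255]]
```

The last pixel shows the limitation: a deeply shadowed part of the screen fails the brightness cut-off and is marked as subject, which is why this approach is only suited to well-lit green screens.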

Test plan

- `python quantize/create_dummy_model.py`: pipeline smoke-test, no checkpoint needed
- `python quantize/export_onnx.py`: FP32 ONNX export (275.7 MB)
- `python quantize/calibrate_int8.py`: INT8 calibrated on 1793 real green screen frames (70.8 MB)
- `python camera/infer_pi.py`: single-image inference on calibration frames, keying confirmed
- `python camera/camera_capture.py`: live camera, 84 frames at 0.2 FPS on x86 CPU

Motivated by the goal of running CorridorKey on Raspberry Pi cameras and USB-stick camera setups for real-time green screen removal on low-spec hardware.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

- quantize/export_onnx.py: export GreenFormer to ONNX FP32 (512x512), handles pos_embed interpolation from 2048->512 and disables FlashAttention for CPU tracing
- quantize/calibrate_int8.py: static INT8 PTQ with auto-generated HSV green hint masks from calibration frames
- quantize/create_dummy_model.py: minimal ONNX model for pipeline smoke-test without the real checkpoint
- camera/infer_pi.py: 4-channel RGBA input (RGB + hint mask), correct ONNX input/output names, returns alpha + fg
- camera/camera_capture.py: live camera pipeline with auto compositing
- requirements-edge.txt: Pi/Jetson deps (onnxruntime only, no PyTorch)
- requirements-export.txt: export machine deps + timm fork instructions

Results on CorridorKey_v1.0 checkpoint at 512x512:
  FP32 ONNX: 275.7 MB
  INT8 ONNX: 70.8 MB (3.9x smaller, same accuracy)

Tested: export -> calibrate (1793 frames) -> infer_pi -> live camera (84 frames)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


- ruff format: auto-formatted all 5 files to project style (120 char line length)
- Remove unused generate_green_hint_mask import in camera_capture.py
- Sort imports in calibrate_int8.py (I001)
- Remove bare f-prefix on string without placeholders in create_dummy_model.py (F541)
- Add 'raise ... from e' chaining in export_onnx.py (B904)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Shahanawaz Sayyed and others added 3 commits May 21, 2026 12:21

Update README with edge deployment section and community extension link

5cfce47

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

shahwork005-oss wants to merge 3 commits into nikopueringer:main from shahwork005-oss:feat/edge-quantization
