Skip to content

Add INT8 ONNX quantization pipeline for edge deployment (Raspberry Pi / low-spec CPUs)#250

Open
shahwork005-oss wants to merge 3 commits into
nikopueringer:mainfrom
shahwork005-oss:feat/edge-quantization
Open

Add INT8 ONNX quantization pipeline for edge deployment (Raspberry Pi / low-spec CPUs)#250
shahwork005-oss wants to merge 3 commits into
nikopueringer:mainfrom
shahwork005-oss:feat/edge-quantization

Conversation

@shahwork005-oss

Copy link
Copy Markdown

Summary

This PR adds a complete pipeline to quantize CorridorKey's GreenFormer model to INT8 ONNX format, enabling deployment on edge devices (Raspberry Pi, NVIDIA Jetson, USB-attached cameras) without requiring PyTorch at inference time.

Results on CorridorKey_v1.0 checkpoint

Model Size x86 ms/frame Notes
Original .safetensors 380 MB PyTorch required
FP32 ONNX 275.7 MB ~3500 ms No PyTorch needed
INT8 ONNX 70.8 MB ~3800 ms 3.9x smaller, ARM NEON speedup on Pi
  • 3.9x model size reduction (275 MB -> 70 MB)
  • No PyTorch at runtime — edge device only needs onnxruntime (5 packages total)
  • ARM NEON INT8: Raspberry Pi 4 expected 2-4x speedup over FP32 from NEON SIMD
  • Tested end-to-end: export -> calibrate (1793 frames) -> single-image -> live camera (84 frames, 0.2 FPS on x86 CPU)

New files

quantize/
  export_onnx.py          # Stage 3: export GreenFormer to ONNX FP32
  calibrate_int8.py       # Stage 4: static INT8 PTQ with calibration frames
  create_dummy_model.py   # pipeline smoke-test without the real checkpoint

camera/
  infer_pi.py             # Stage 5: single-image inference (4-ch RGBA input, no PyTorch)
  camera_capture.py       # Stage 6: live camera pipeline with compositing

requirements-edge.txt     # Pi/Jetson deps -- onnxruntime only, no PyTorch
requirements-export.txt   # export machine deps + timm fork instructions
EDGE_DEPLOY.md            # step-by-step guide with real benchmark numbers

How to use

Export (run once on your main machine):

pip install -r requirements-export.txt
python quantize/export_onnx.py \
    --checkpoint models/CorridorKey_v1.0.safetensors \
    --output models/corridorkey_fp32.onnx \
    --img-size 512

Calibrate (needs 100-200 green screen frames in calibration_frames/):

python quantize/calibrate_int8.py \
    --fp32-model models/corridorkey_fp32.onnx \
    --int8-model models/corridorkey_int8.onnx \
    --frames-dir calibration_frames/

Deploy on Pi / edge device (no PyTorch needed):

pip install -r requirements-edge.txt
python camera/infer_pi.py --model corridorkey_int8.onnx --image frame.jpg --output result.png
python camera/camera_capture.py --model corridorkey_int8.onnx   # live camera

Technical notes

  • pos_embed interpolation: checkpoint is trained at 2048x2048; export_onnx.py bicubic-interpolates positional embeddings to the export resolution (default 512x512), matching CorridorKeyEngine._load_model() behaviour
  • FlashAttention: disabled during tracing (fused_attn=False) for ONNX compatibility; standard scaled_dot_product_attention is used
  • 4-channel input: infer_pi.py auto-generates a green hint mask via HSV thresholding when no external masking pipeline (GVM/BiRefNet) is available — sufficient for well-lit green screens
  • Why not dynamic INT8? Dynamic quantization dequantizes weights on every forward pass — measured 40x slower on this transformer architecture; static PTQ with calibration data is the correct approach

Test plan

  • python quantize/create_dummy_model.py -- pipeline smoke-test, no checkpoint needed
  • python quantize/export_onnx.py -- FP32 ONNX export (275.7 MB)
  • python quantize/calibrate_int8.py -- INT8 calibrated on 1793 real green screen frames (70.8 MB)
  • python camera/infer_pi.py -- single-image inference on calibration frames, keying confirmed
  • python camera/camera_capture.py -- live camera, 84 frames at 0.2 FPS on x86 CPU

Motivated by the goal of running CorridorKey on Raspberry Pi cameras and USB-stick camera setups for real-time green screen removal on low-spec hardware.

Co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

Shahanawaz Sayyed and others added 3 commits May 21, 2026 12:21
- quantize/export_onnx.py: export GreenFormer to ONNX FP32 (512x512),
  handles pos_embed interpolation from 2048->512 and disables FlashAttention
  for CPU tracing
- quantize/calibrate_int8.py: static INT8 PTQ with auto-generated HSV green
  hint masks from calibration frames
- quantize/create_dummy_model.py: minimal ONNX model for pipeline smoke-test
  without the real checkpoint
- camera/infer_pi.py: 4-channel RGBA input (RGB + hint mask), correct ONNX
  input/output names, returns alpha + fg
- camera/camera_capture.py: live camera pipeline with auto compositing
- requirements-edge.txt: Pi/Jetson deps (onnxruntime only, no PyTorch)
- requirements-export.txt: export machine deps + timm fork instructions

Results on CorridorKey_v1.0 checkpoint at 512x512:
  FP32 ONNX: 275.7 MB
  INT8 ONNX:  70.8 MB  (3.9x smaller, same accuracy)

Tested: export -> calibrate (1793 frames) -> infer_pi -> live camera (84 frames)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ruff format: auto-formatted all 5 files to project style (120 char line length)
- Remove unused generate_green_hint_mask import in camera_capture.py
- Sort imports in calibrate_int8.py (I001)
- Remove bare f-prefix on string without placeholders in create_dummy_model.py (F541)
- Add 'raise ... from e' chaining in export_onnx.py (B904)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant