feat(api): add make run-gpu for native CUDA serving by Chouffe · Pull Request #52 · pyronear/temporal-model

Chouffe · 2026-06-11T16:13:24Z

Summary

Adds two targets to api/Makefile for serving the API on a CUDA GPU natively (the published Docker image intentionally stays CPU-only — no second image variant):
- gpu-setup (one-time): re-syncs the venv, then swaps torch/torchvision for cu130 wheels pinned to the uv.lock releases (2.12.0/0.27.0) so the GPU venv can't silently drift from the tested versions.
- run-gpu: serves with uv run --no-sync, after asserting CUDA is actually available — it refuses to start (instead of silently serving on CPU) when a plain uv run/uv sync has reverted the venv to the locked CPU wheels. Restarts do no reinstall work and need no network.
Documents the flow in api/README.md: native runs must set TEMPORAL_API_MODEL_PATH (the /models/model.zip default is container-only), make test/lint/format revert the CUDA wheels, and run-gpu binds 0.0.0.0:8000 so set TEMPORAL_API_TOKEN on shared networks.
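For readers who want to see the shape of the run-gpu guard, here is a minimal sketch in Python. It is not the recipe from this PR: the function names `cuda_available` and `guard` and the exit code 1 are assumptions made for illustration, and only `torch.cuda.is_available()` and the hint message come from the description above. GNU make reports a failed recipe with its own exit status 2, so any nonzero return from a check like this would surface as a failed `make run-gpu`.

```python
import sys
from typing import Callable


def cuda_available() -> bool:
    """Return True only if torch imports and reports a usable CUDA device."""
    try:
        import torch  # lazy import: the guard must also work when torch is absent
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def guard(probe: Callable[[], bool] = cuda_available) -> int:
    """Return 0 when CUDA is usable; otherwise print a hint and return 1.

    The probe is injectable (a hypothetical design choice) so the refusal
    path can be exercised on a machine without a GPU.
    """
    if probe():
        return 0
    print("CUDA torch not available - run: make gpu-setup", file=sys.stderr)
    return 1


if __name__ == "__main__":
    # Refuse to continue instead of silently serving on CPU.
    sys.exit(guard())
```

Running the check before starting uvicorn means a venv that was reverted to the locked CPU wheels fails fast with an actionable message rather than serving at CPU speed.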

Test Plan

make gpu-setup + make run-gpu end-to-end on an RTX 4070 Ti SUPER (driver 580.159.03): server holds GPU memory (1172 MiB), POST /predict on a real 7-frame MinIO-backed sequence returns is_smoke=true, probability=0.87009 — identical to CPU serving.
Pin verified: gpu-setup installs exactly torch==2.12.0+cu130 / torchvision==0.27.0+cu130, with numpy staying at the locked 1.26.4 (the unpinned form had bumped it to 2.4.4).
Guard verified: make run-gpu on a CPU venv exits 2 with CUDA torch not available - run: make gpu-setup.
Restart path verified: run-gpu goes straight to uvicorn — no sync/reinstall churn.
pytest (122 passed, 1 skipped) and ruff check clean.
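The pin verification above could be repeated with a small script along these lines. This is a sketch under stated assumptions, not tooling from this PR: `PINS`, `LOCAL_TAG`, `matches_pin` and `drifted` are hypothetical names, and the versions are the ones quoted in the test plan. It relies on the installed wheel reporting its local version label (for example `2.12.0+cu130`) through `importlib.metadata`.

```python
from importlib import metadata

# Versions quoted in this PR; the dict itself is a hypothetical helper.
PINS = {"torch": "2.12.0", "torchvision": "0.27.0"}
LOCAL_TAG = "cu130"


def matches_pin(installed: str, pinned: str, local_tag: str = LOCAL_TAG) -> bool:
    """True only if installed is exactly '<pinned>+<local_tag>'."""
    public, _, local = installed.partition("+")
    return public == pinned and local == local_tag


def drifted(pins: dict = PINS) -> list:
    """Return names of packages that are missing or not the pinned CUDA build."""
    bad = []
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            bad.append(name)
            continue
        if not matches_pin(installed, pinned):
            bad.append(name)
    return bad


if __name__ == "__main__":
    # An empty list means both packages match the pinned CUDA wheels.
    print(drifted())
```

On a venv reverted to the locked CPU wheels, `drifted()` would list both packages, since a `+cpu` build or a missing local label does not match the `cu130` tag.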

The released Docker image stays CPU-only on purpose (torch pinned to the pytorch-cpu wheel index keeps it small). For GPU benchmarking/dev, run-gpu swaps the venv's torch/torchvision for cu130 wheels and serves uvicorn with --no-sync so the CUDA wheels survive startup. Verified end-to-end on an RTX 4070 Ti SUPER: /predict on a real 7-frame sequence returns the same prediction as CPU serving, with the server process on the GPU.

…llback

Review follow-ups on the GPU recipe:
- Pin the cu130 install to the uv.lock releases (torch==2.12.0, torchvision==0.27.0) so the GPU venv cannot silently drift to a newer torch than the one CI tests; the pin also keeps shared deps (numpy) at their locked versions.
- Split the one-time wheel swap (gpu-setup) from serving (run-gpu) so restarts skip the multi-GB venv churn and need no network.
- run-gpu now asserts CUDA is available before serving instead of silently falling back to CPU after a plain uv run/uv sync reverted the wheels.
- README: document that MODEL_PATH must be set natively (the /models/model.zip default is container-only), that make test/lint/format revert the wheels, and to set TEMPORAL_API_TOKEN on shared networks since run-gpu binds 0.0.0.0.

Remaining review follow-ups:
- gpu-setup's help text now points at the README instead of repeating the driver floor, so the requirement lives in one place.
- README: note that the compose api service publishes the same port 8000 as run-gpu; start only minio/createbuckets when compose is just providing S3.

Kept --reinstall-package (not the shorter --reinstall): tested the swap and --reinstall force-reinstalls the whole resolution, bumping numpy past its locked 1.26.4, the same drift the version pins exist to prevent.

MateoLostanlen

LGTM

Chouffe requested a review from MateoLostanlen June 11, 2026 16:15

Chouffe added 2 commits June 11, 2026 18:24

Chouffe mentioned this pull request Jun 11, 2026

api: TEMPORAL_API_HOST/PORT settings are dead config — never read by any entry point #54

Open

MateoLostanlen approved these changes Jun 11, 2026


Chouffe merged commit 327a951 into main Jun 12, 2026
5 checks passed

Chouffe merged 3 commits into main from arthur/api-gpu-run


2 participants
