Bug: fits_resource_constraints compares model memory against system RAM, not GPU VRAM #2015
Closed
Description
Problem
ModelInfo.fits_resource_constraints() in autobot-backend/utils/model_optimization/types.py compares the estimated model memory requirement against SystemResources.available_memory_gb (system RAM). However, LLM weights are loaded into GPU VRAM, not system RAM.
This means:
- A system with 64GB RAM but 8GB VRAM will allow loading a 30GB model that will OOM on the GPU
- A system with 8GB RAM but 24GB VRAM will reject models that would fit perfectly on the GPU
SystemResources has a gpu_vram_gb field (added in #1966) but it is never populated by callers, and fits_resource_constraints() doesn't use it.
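A minimal sketch of the current behavior. The field and method names below mirror the issue text, but everything else (dataclass shape, `estimated_memory_gb`) is an assumed reconstruction, not the actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SystemResources:
    available_memory_gb: float           # system RAM
    gpu_vram_gb: Optional[float] = None  # added in #1966, never populated by callers

@dataclass
class ModelInfo:
    estimated_memory_gb: float  # assumed field name for the memory estimate

    def fits_resource_constraints(self, resources: SystemResources) -> bool:
        # Current (buggy) behavior: always compares against system RAM,
        # even though the model is loaded into GPU VRAM.
        return self.estimated_memory_gb <= resources.available_memory_gb

# 64 GB RAM, 8 GB VRAM: a 30 GB model "fits" but will OOM on the GPU.
rig = SystemResources(available_memory_gb=64.0, gpu_vram_gb=8.0)
print(ModelInfo(estimated_memory_gb=30.0).fits_resource_constraints(rig))  # True
```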
Discovered During
Working on #1966 (model memory estimation).
Expected Behavior
- When GPU is available, compare estimated model memory against GPU VRAM
- When CPU-only inference, compare against system RAM
- SystemResources callers should populate gpu_vram_gb from GPU detection (Bug: GPU detection only recognizes NVIDIA RTX cards, fails on AMD/Intel/non-RTX NVIDIA #1959)
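The expected behavior above could look like the following sketch. Field names beyond available_memory_gb and gpu_vram_gb are assumptions; treating `gpu_vram_gb is None` as "CPU-only inference" is also an assumed convention:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SystemResources:
    available_memory_gb: float           # system RAM
    gpu_vram_gb: Optional[float] = None  # None => no GPU detected / CPU-only

@dataclass
class ModelInfo:
    estimated_memory_gb: float  # assumed field name for the memory estimate

    def fits_resource_constraints(self, resources: SystemResources) -> bool:
        # Proposed fix: compare against GPU VRAM when a GPU is available,
        # fall back to system RAM only for CPU-only inference.
        if resources.gpu_vram_gb is not None:
            return self.estimated_memory_gb <= resources.gpu_vram_gb
        return self.estimated_memory_gb <= resources.available_memory_gb

# 8 GB RAM, 24 GB VRAM: a 20 GB model now correctly fits on the GPU.
rig = SystemResources(available_memory_gb=8.0, gpu_vram_gb=24.0)
print(ModelInfo(estimated_memory_gb=20.0).fits_resource_constraints(rig))  # True
```

With this change, the first example from the Problem section (64 GB RAM, 8 GB VRAM, 30 GB model) is correctly rejected instead of OOMing on the GPU.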
Impact
High — Models may be recommended that OOM on GPU, or rejected despite fitting in VRAM.