Bug: fits_resource_constraints compares model memory against system RAM, not GPU VRAM #2015

@mrveiss

Description

Problem

ModelInfo.fits_resource_constraints() in autobot-backend/utils/model_optimization/types.py compares the estimated model memory requirement against SystemResources.available_memory_gb (system RAM). However, LLMs are loaded into GPU VRAM, not system RAM.

This means:

  • A system with 64GB RAM but 8GB VRAM will allow loading a 30GB model that will OOM on the GPU
  • A system with 8GB RAM but 24GB VRAM will reject models that would fit perfectly on the GPU

SystemResources has a gpu_vram_gb field (added in #1966) but it is never populated by callers, and fits_resource_constraints() doesn't use it.
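A minimal sketch of the intended check, using the field names from this issue (available_memory_gb, gpu_vram_gb). The estimated_memory_gb field name is hypothetical; the real attribute holding the #1966 memory estimate may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SystemResources:
    available_memory_gb: float          # system RAM
    gpu_vram_gb: Optional[float] = None # added in #1966; callers must populate it

@dataclass
class ModelInfo:
    name: str
    estimated_memory_gb: float  # hypothetical name for the #1966 memory estimate

    def fits_resource_constraints(self, resources: SystemResources) -> bool:
        # Compare against VRAM when a GPU is present; only fall back to
        # system RAM for CPU-only inference.
        if resources.gpu_vram_gb is not None:
            return self.estimated_memory_gb <= resources.gpu_vram_gb
        return self.estimated_memory_gb <= resources.available_memory_gb
```

With this sketch, both failure modes above are fixed: a 30GB model is rejected on a 64GB-RAM/8GB-VRAM box, and a 20GB model is accepted on an 8GB-RAM/24GB-VRAM box.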

Discovered During

Working on #1966 (model memory estimation).

Expected Behavior

fits_resource_constraints() should compare the model's estimated memory requirement against SystemResources.gpu_vram_gb when the model will be loaded onto a GPU, falling back to available_memory_gb only for CPU inference, and callers should populate gpu_vram_gb.

Impact

High: models may be recommended that OOM on the GPU, or rejected despite fitting in VRAM.
