PC-023 — GPU vendor lock-in (CUDA-only build) #65

@itigges22

Status (at mirror time): Open (tracking) • Severity: High

One-liner: The build hard-codes NVIDIA/CUDA, so AMD, Intel, and CPU-only users can't install. Goal: match llama.cpp's full backend matrix.


  • Status: Open (tracking — deferred to roadmap; not blocking Phase 1)
  • Severity: High (blocks every non-NVIDIA user from installing)
  • Location: inference/Dockerfile.v31, docker-compose.yml,
    scripts/install.sh

The current build hard-codes NVIDIA/CUDA in three places:

  1. Dockerfile.v31:1 — base image is
    nvidia/cuda:12.9.0-devel-rockylinux9.
  2. Dockerfile.v31:25 — cmake is invoked with -DGGML_CUDA=ON and
    no other backend.
  3. docker-compose.yml:28 — GPU resource reservation specifies
    driver: nvidia.

llama.cpp itself supports a wide backend matrix: CUDA, ROCm/HIP
(AMD), Vulkan (universal), SYCL (Intel), Metal (Apple), and a CPU-only
fallback. ATLAS exposes only one of these (CUDA), so AMD users
currently can't install the stack at all.
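
As a rough illustration of that backend matrix, the sketch below maps a backend name to the CMake option that enables it in llama.cpp. The function name `llama_cmake_flag` and the short backend names are hypothetical. The flag names follow recent llama.cpp build documentation and should be checked against whichever llama.cpp revision ATLAS pins, since some were renamed over time.

```shell
#!/bin/sh
# Hypothetical helper: map a backend name to the llama.cpp CMake option
# that enables it. Flag names are taken from recent llama.cpp build docs;
# verify them against the pinned llama.cpp revision before relying on them.
llama_cmake_flag() {
    case "$1" in
        cuda)   echo "-DGGML_CUDA=ON" ;;
        rocm)   echo "-DGGML_HIP=ON" ;;      # older trees used -DGGML_HIPBLAS=ON
        vulkan) echo "-DGGML_VULKAN=ON" ;;
        sycl)   echo "-DGGML_SYCL=ON" ;;
        metal)  echo "-DGGML_METAL=ON" ;;
        cpu)    echo "" ;;                    # CPU backend needs no extra flag
        *)      echo "unknown backend: $1" >&2; return 1 ;;
    esac
}
```

A Dockerfile variant could then pass `$(llama_cmake_flag vulkan)` to cmake where Dockerfile.v31 currently passes -DGGML_CUDA=ON.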

geometric-lens runs PyTorch on CPU
(geometric_lens/service.py:60), so only the llama-server side needs
the abstraction. The Confidence Router, V3 pipeline, Pattern Cache,
proxy, and sandbox are all vendor-agnostic.

Fix

Two reasonable directions; pragmatic combo recommended:

  • (a) Vulkan as the universal path. One Dockerfile, works on
    NVIDIA + AMD + Intel + CPU. Slower than native CUDA/ROCm (typically
    20–40% perf hit) but the operational simplicity is huge — no
    per-vendor matrix, no driver detection logic. Best fit for ATLAS's
    "easy to install" goal.
  • (b) Per-vendor Dockerfiles. Dockerfile.v31 (CUDA),
    Dockerfile.v31.rocm (AMD), Dockerfile.v31.vulkan (universal),
    Dockerfile.v31.cpu (no GPU). Compose picks one via
    ATLAS_GPU_VENDOR=nvidia|amd|vulkan|cpu. Best raw performance per
    vendor; more surface area to maintain.

The pragmatic combo: ship (a) first to unbreak non-NVIDIA users,
then add (b) for users who want native-vendor speed. Vulkan stays
as the universal fallback.
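
A minimal sketch of how option (b) could resolve ATLAS_GPU_VENDOR to a Dockerfile. The function name `atlas_dockerfile` is hypothetical. Placing the new variants next to inference/Dockerfile.v31 is an assumption, and so is defaulting to Vulkan when the variable is unset, which simply mirrors the "universal fallback" position above.

```shell
#!/bin/sh
# Hypothetical helper: pick a Dockerfile from ATLAS_GPU_VENDOR.
# Assumes the per-vendor variants live next to inference/Dockerfile.v31.
# Falls back to Vulkan when neither an argument nor the variable is set.
atlas_dockerfile() {
    vendor="${1:-${ATLAS_GPU_VENDOR:-vulkan}}"
    case "$vendor" in
        nvidia) echo "inference/Dockerfile.v31" ;;
        amd)    echo "inference/Dockerfile.v31.rocm" ;;
        vulkan) echo "inference/Dockerfile.v31.vulkan" ;;
        cpu)    echo "inference/Dockerfile.v31.cpu" ;;
        *)      echo "unsupported ATLAS_GPU_VENDOR: $vendor" >&2; return 1 ;;
    esac
}
```

An install script could call something like `docker build -f "$(atlas_dockerfile)" inference/`, where the build context is again an assumption. Rejecting unknown values loudly is deliberate: a typo in the variable should not silently select the wrong backend.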

Cross-references:

  • Phase 0 hardware-tier presets roadmap item — naturally
    encompasses per-vendor presets once the Dockerfile matrix exists.
  • Phase 0 atlas-bootstrap.sh installer — needs to detect GPU
    vendor and pick the right image variant; currently assumes NVIDIA.
  • Phase 0 prebuilt GHCR images — multi-arch + multi-vendor tag
    matrix once vendor abstraction lands (e.g.,
    ghcr.io/.../atlas-llama-server:v3.1-cuda,
    :v3.1-rocm, :v3.1-vulkan).
  • GitHub issues this resolves (close as covered by PC-023):
    • feat: ROCm support for AMD GPUs #26 (direct match for path (b),
      Dockerfile.v31.rocm)
    • feat: Intel oneAPI / SYCL backend support #27 (Intel GPUs are
      covered by the universal Vulkan path (a); a native SYCL image is
      a per-vendor (b) option)
    • feat: Apple Silicon (Metal) deployment guide #32. Note: Apple
      Silicon doesn't run our docker stack natively (no nvidia/cuda
      base, no Linux containers with GPU passthrough on Mac). The
      options are either a separate Mac-native install path (brew +
      raw llama.cpp Metal binary) or waiting until the vLLM/MLX
      backend lands via PC-024. Tracked here so the GPU-vendor work
      is aware of it.
    • DirectMLSupport #7 — DirectML (Windows). Vulkan path (a) covers this for
      AMD-on-Windows and Intel-on-Windows. NVIDIA-on-Windows users
      just use the CUDA path. Pure-DirectML support would be its own
      backend; defer until there's demand.
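
The vendor detection that atlas-bootstrap.sh would need, and the mapping onto the image tags listed above, could look roughly like the sketch below. Both function names are hypothetical. Detection reads `lspci -nn` output and matches the standard PCI vendor IDs (10de NVIDIA, 1002 AMD, 8086 Intel). Preferring NVIDIA over AMD over Intel on mixed systems is an assumption, as is sending Intel and GPU-less hosts to the Vulkan image, since the issue names no SYCL or CPU tag.

```shell
#!/bin/sh
# Hypothetical helper: classify the GPU vendor from `lspci -nn` text on stdin.
# Only display-class devices are considered; PCI vendor IDs decide the result.
detect_gpu_vendor() {
    gpus=$(grep -Ei 'vga compatible controller|3d controller|display controller' || true)
    case "$gpus" in
        *\[10de:*) echo nvidia ;;   # NVIDIA wins on hybrid laptops (assumption)
        *\[1002:*) echo amd ;;
        *\[8086:*) echo vulkan ;;   # Intel goes through the universal Vulkan path
        *)         echo cpu ;;      # no display-class GPU found
    esac
}

# Hypothetical helper: map a vendor to one of the tags named in this issue.
# No CPU-specific tag is listed there, so cpu reuses the Vulkan image here.
atlas_image_tag() {
    case "$1" in
        nvidia)     echo "v3.1-cuda" ;;
        amd)        echo "v3.1-rocm" ;;
        vulkan|cpu) echo "v3.1-vulkan" ;;
        *)          echo "unknown vendor: $1" >&2; return 1 ;;
    esac
}
```

Typical use would be `atlas_image_tag "$(lspci -nn | detect_gpu_vendor)"`, which requires pciutils on the host. An explicit ATLAS_GPU_VENDOR should probably override detection.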

Docs to update in the same change:

  • docs/SETUP.md — vendor selection step in the install flow,
    per-vendor prerequisite blocks (NVIDIA Container Toolkit vs ROCm
    vs Vulkan ICD vs CPU-only).
  • docs/ARCHITECTURE.md — the llama-server box becomes "llama-server
    (vendor-pluggable backend)" with a list of supported backends.
  • docs/CONFIGURATION.md — document ATLAS_GPU_VENDOR and any
    per-vendor env vars (e.g., AMD's HSA_OVERRIDE_GFX_VERSION).
  • docs/TROUBLESHOOTING.md — vendor-specific failure-mode entries
    (AMD ROCm version mismatches, Vulkan ICD missing, etc.).
  • README.md — system requirements section.


Mirrored from local ISSUES.md. ISSUES.md is gitignored; this issue is the canonical public record. PC-### numbering preserved for cross-references.

Labels:

  • deferred: Tracked but deferred to roadmap (Phase 1+)
  • reliability: PC-### reliability ticket from ISSUES.md
