PC-023 — GPU vendor lock-in (CUDA-only build)

**Status (at mirror time):** **Open** (tracking)  •  **Severity:** High

**One-liner:** Build hard-codes NVIDIA/CUDA, so AMD, Intel, and CPU-only users can't install. Match llama.cpp's full backend matrix.

---

- **Status:** Open (tracking — deferred to roadmap; not blocking Phase 1)
- **Severity:** High (blocks every non-NVIDIA user from installing)
- **Location:** `inference/Dockerfile.v31`, `docker-compose.yml`,
  `scripts/install.sh`

The current build hard-codes NVIDIA/CUDA in three places:

1. `Dockerfile.v31:1` — base image is
   `nvidia/cuda:12.9.0-devel-rockylinux9`.
2. `Dockerfile.v31:25` — cmake is invoked with `-DGGML_CUDA=ON` and
   no other backend.
3. `docker-compose.yml:28` — GPU resource reservation specifies
   `driver: nvidia`.

llama.cpp itself supports a wide backend matrix — CUDA, ROCm/HIP
(AMD), Vulkan (universal), SYCL (Intel), Metal (Apple), and a CPU-only
fallback. ATLAS only exposes one. AMD users currently can't install
the stack at all.
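
As a rough illustration of what exposing the rest of that matrix involves at build time, the sketch below maps each backend to the CMake switch llama.cpp uses for it. The flag names are taken from recent llama.cpp build documentation and should be checked against whichever revision ATLAS pins (older trees used `GGML_HIPBLAS` rather than `GGML_HIP`); the `cmake_args` helper and the backend keys are hypothetical, not existing ATLAS code.

```python
# Backend -> CMake switch, as named in recent llama.cpp build docs.
# Assumption: verify each flag against the llama.cpp revision ATLAS pins;
# older trees used GGML_HIPBLAS instead of GGML_HIP.
BACKEND_CMAKE_FLAGS = {
    "cuda": "-DGGML_CUDA=ON",      # what Dockerfile.v31 passes today
    "rocm": "-DGGML_HIP=ON",
    "vulkan": "-DGGML_VULKAN=ON",
    "sycl": "-DGGML_SYCL=ON",
    "metal": "-DGGML_METAL=ON",
    "cpu": None,                   # CPU backend is always built; no extra switch
}


def cmake_args(backend: str) -> list[str]:
    """Return cmake configure arguments for one backend (hypothetical helper)."""
    if backend not in BACKEND_CMAKE_FLAGS:
        raise ValueError(f"unsupported backend: {backend!r}")
    flag = BACKEND_CMAKE_FLAGS[backend]
    args = ["cmake", "-B", "build"]
    if flag is not None:
        args.append(flag)
    return args
```

For example, `cmake_args("vulkan")` yields the configure command a Vulkan image would run in place of the current `-DGGML_CUDA=ON` invocation.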

geometric-lens runs PyTorch on CPU
(`geometric_lens/service.py:60`), so only the llama-server side needs
the abstraction. The Confidence Router, V3 pipeline, Pattern Cache,
proxy, and sandbox are all vendor-agnostic.

### Fix

_Two reasonable directions; pragmatic combo recommended:_

- **(a) Vulkan as the universal path.** One Dockerfile that covers
  NVIDIA, AMD, and Intel GPUs, with llama.cpp's built-in CPU backend
  as the no-GPU fallback. Slower than native CUDA/ROCm (typically a
  20–40% perf hit), but operationally much simpler: no per-vendor
  matrix, no driver detection logic. Best fit for ATLAS's
  "easy to install" goal.
- **(b) Per-vendor Dockerfiles.** `Dockerfile.v31` (CUDA),
  `Dockerfile.v31.rocm` (AMD), `Dockerfile.v31.vulkan` (universal),
  `Dockerfile.v31.cpu` (no GPU). Compose picks one via
  `ATLAS_GPU_VENDOR=nvidia|amd|vulkan|cpu`. Best raw performance per
  vendor; more surface area to maintain.

The pragmatic combo: ship (a) first to unbreak non-NVIDIA users,
then add (b) for users who want native-vendor speed. Vulkan stays
as the universal fallback.
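
A minimal sketch of how the `ATLAS_GPU_VENDOR` selection could resolve to a Dockerfile, using the file names proposed in (b). Treating an unset variable as `vulkan` is an assumption that follows from "Vulkan stays as the universal fallback"; the `resolve_dockerfile` helper is hypothetical and does not exist in the repo today.

```python
import os

# File names are the ones proposed in option (b).
DOCKERFILES = {
    "nvidia": "Dockerfile.v31",
    "amd": "Dockerfile.v31.rocm",
    "vulkan": "Dockerfile.v31.vulkan",
    "cpu": "Dockerfile.v31.cpu",
}
DEFAULT_VENDOR = "vulkan"  # assumption: unset means the universal fallback


def resolve_dockerfile(env=None) -> str:
    """Map ATLAS_GPU_VENDOR to a Dockerfile name (hypothetical helper)."""
    env = os.environ if env is None else env
    # Normalise case and whitespace; empty or unset falls back to the default.
    vendor = env.get("ATLAS_GPU_VENDOR", "").strip().lower() or DEFAULT_VENDOR
    if vendor not in DOCKERFILES:
        allowed = "|".join(DOCKERFILES)
        raise ValueError(f"ATLAS_GPU_VENDOR={vendor!r}; expected {allowed}")
    return DOCKERFILES[vendor]
```

Failing loudly on an unknown value, rather than silently falling back, is a design choice here: a typo such as `ATLAS_GPU_VENDOR=nvida` would otherwise build the slower Vulkan image without the user noticing.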

Cross-references:

- Phase 0 **hardware-tier presets** roadmap item — naturally
  encompasses per-vendor presets once the Dockerfile matrix exists.
- Phase 0 **`atlas-bootstrap.sh` installer** — needs to detect GPU
  vendor and pick the right image variant; currently assumes NVIDIA.
- Phase 0 **prebuilt GHCR images** — multi-arch + multi-vendor tag
  matrix once vendor abstraction lands (e.g.,
  `ghcr.io/.../atlas-llama-server:v3.1-cuda`,
  `:v3.1-rocm`, `:v3.1-vulkan`).
- GitHub issues this resolves (close as covered by PC-023):
  - **#26** — ROCm support for AMD GPUs (direct match for path **(b)**
    `Dockerfile.v31.rocm`)
  - **#27** — Intel oneAPI / SYCL backend (Intel GPUs are covered by
    the Vulkan path **(a)**; a native SYCL image would be an
    additional per-vendor **(b)** option)
  - **#32** — Apple Silicon (Metal) deployment guide. Note: Apple
    Silicon cannot run our Docker stack with GPU acceleration (the
    nvidia/cuda base image does not apply, and Linux containers on
    macOS get no GPU passthrough). Options: a separate Mac-native
    install path (brew + raw llama.cpp Metal binary), or wait until
    the vLLM/MLX backend lands via PC-024. Tracked here so the
    GPU-vendor work stays aware of it.
  - **#7** — DirectML (Windows). Vulkan path (a) covers this for
    AMD-on-Windows and Intel-on-Windows. NVIDIA-on-Windows users
    just use the CUDA path. Pure-DirectML support would be its own
    backend; defer until there's demand.
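
The vendor detection that the `atlas-bootstrap.sh` item above calls for could follow logic along these lines. The installer itself is a shell script, so this Python version only sketches the decision order; the specific probes (`nvidia-smi` on `PATH`, the ROCm compute node `/dev/kfd`, a DRM directory at `/dev/dri`) are assumed heuristics for Linux hosts, not checks the current installer performs, and `detect_gpu_vendor` is a hypothetical name.

```python
import os
import shutil


def detect_gpu_vendor(which=shutil.which, exists=os.path.exists) -> str:
    """Guess an ATLAS_GPU_VENDOR value for this host (heuristic sketch).

    `which` and `exists` are injectable so the logic can be tested
    without real hardware.
    """
    # NVIDIA driver installs ship the nvidia-smi CLI.
    if which("nvidia-smi"):
        return "nvidia"
    # The amdgpu/ROCm kernel driver exposes /dev/kfd for compute.
    if exists("/dev/kfd"):
        return "amd"
    # Any other DRM device: assume a Vulkan-capable GPU (Intel, older AMD).
    if exists("/dev/dri"):
        return "vulkan"
    # Nothing found: fall back to the CPU-only image.
    return "cpu"
```

The result is meant as a default that the user can still override by setting `ATLAS_GPU_VENDOR` explicitly, since a DRM node does not guarantee a working Vulkan ICD.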

**Docs to update in the same change:**
- `docs/SETUP.md` — vendor selection step in the install flow,
  per-vendor prerequisite blocks (NVIDIA Container Toolkit vs ROCm
  vs Vulkan ICD vs CPU-only).
- `docs/ARCHITECTURE.md` — the llama-server box becomes "llama-server
  (vendor-pluggable backend)" with a list of supported backends.
- `docs/CONFIGURATION.md` — document `ATLAS_GPU_VENDOR` and any
  per-vendor env vars (e.g., AMD's `HSA_OVERRIDE_GFX_VERSION`).
- `docs/TROUBLESHOOTING.md` — vendor-specific failure-mode entries
  (AMD ROCm version mismatches, Vulkan ICD missing, etc.).
- `README.md` — system requirements section.

---
*Mirrored from local `ISSUES.md`. ISSUES.md is gitignored; this issue is the canonical public record. PC-### numbering preserved for cross-references.*

