llama.cpp itself supports a wide backend matrix — CUDA, ROCm/HIP
(AMD), Vulkan (universal), SYCL (Intel), Metal (Apple), and a CPU-only
fallback. ATLAS exposes only one of them (CUDA), so AMD users currently
can't install the stack at all.
geometric-lens runs PyTorch on CPU
(geometric_lens/service.py:60), so only the llama-server side needs
the abstraction. The Confidence Router, V3 pipeline, Pattern Cache,
proxy, and sandbox are all vendor-agnostic.
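As a rough illustration of how a single build script could cover that matrix, the sketch below maps a backend name to the CMake option that enables it in llama.cpp. The option names follow llama.cpp's build documentation as best understood here (older releases used `GGML_HIPBLAS` for ROCm), so they should be checked against the pinned llama.cpp revision. The backend keys and the function name are hypothetical.

```python
# Hypothetical helper: backend keys and function name are illustrative only.
# CMake option names should be verified against the pinned llama.cpp revision.
BACKEND_CMAKE_FLAGS = {
    "cuda": ["-DGGML_CUDA=ON"],
    "rocm": ["-DGGML_HIP=ON"],
    "vulkan": ["-DGGML_VULKAN=ON"],
    "sycl": ["-DGGML_SYCL=ON"],
    "metal": ["-DGGML_METAL=ON"],
    "cpu": [],  # a CPU-only build needs no backend option
}


def cmake_configure_args(backend: str, build_dir: str = "build") -> list[str]:
    """Return the argv for configuring llama.cpp with a single backend."""
    if backend not in BACKEND_CMAKE_FLAGS:
        known = ", ".join(sorted(BACKEND_CMAKE_FLAGS))
        raise ValueError(f"unknown backend {backend!r}; expected one of: {known}")
    return ["cmake", "-B", build_dir, *BACKEND_CMAKE_FLAGS[backend]]


print(" ".join(cmake_configure_args("vulkan")))
```

Keeping the flags in one table means adding a backend is a one-line change rather than a new build script.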
Fix
Two reasonable directions; pragmatic combo recommended:
(a) Vulkan as the universal path. One Dockerfile, works on
NVIDIA + AMD + Intel + CPU. Slower than native CUDA/ROCm (typically
20–40% perf hit) but the operational simplicity is huge — no
per-vendor matrix, no driver detection logic. Best fit for ATLAS's
"easy to install" goal.
(b) Per-vendor Dockerfiles. Dockerfile.v31 (CUDA),
Dockerfile.v31.rocm (AMD), Dockerfile.v31.vulkan (universal),
Dockerfile.v31.cpu (no GPU). Compose picks one via
ATLAS_GPU_VENDOR=nvidia|amd|vulkan|cpu. Best raw performance per
vendor; more surface area to maintain.
The pragmatic combo: ship (a) first to unbreak non-NVIDIA users,
then add (b) for users who want native-vendor speed. Vulkan stays
as the universal fallback.
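One way the ATLAS_GPU_VENDOR switch could resolve to a Dockerfile is sketched below. The vendor values and the four file names come from option (b). The function name, and the choice to fall back to the Vulkan image when the variable is unset, are assumptions made for illustration.

```python
import os

# Vendor values and Dockerfile names as proposed in option (b).
DOCKERFILES = {
    "nvidia": "Dockerfile.v31",
    "amd": "Dockerfile.v31.rocm",
    "vulkan": "Dockerfile.v31.vulkan",
    "cpu": "Dockerfile.v31.cpu",
}


def resolve_dockerfile(env=None) -> str:
    """Pick the Dockerfile for ATLAS_GPU_VENDOR (assumed default: vulkan)."""
    env = os.environ if env is None else env
    vendor = env.get("ATLAS_GPU_VENDOR", "vulkan").strip().lower()
    if vendor not in DOCKERFILES:
        raise ValueError(
            f"ATLAS_GPU_VENDOR={vendor!r} is not one of {sorted(DOCKERFILES)}"
        )
    return DOCKERFILES[vendor]


print(resolve_dockerfile({"ATLAS_GPU_VENDOR": "amd"}))
```

In practice Compose would more likely do this lookup through its own variable interpolation; Python is used here only to make the selection rule explicit and testable.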
Cross-references:
Phase 0 hardware-tier presets roadmap item — naturally
encompasses per-vendor presets once the Dockerfile matrix exists.
Phase 0 atlas-bootstrap.sh installer — needs to detect GPU
vendor and pick the right image variant; currently assumes NVIDIA.
Phase 0 prebuilt GHCR images — multi-arch + multi-vendor tag
matrix once vendor abstraction lands (e.g., ghcr.io/.../atlas-llama-server:v3.1-cuda, :v3.1-rocm, :v3.1-vulkan).
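The vendor detection the installer needs could look roughly like the sketch below, which classifies the text printed by `lspci -nn` using PCI vendor IDs (10de for NVIDIA, 1002 for AMD, 8086 for Intel). Everything else is an assumption of this sketch: the function names, the preference for a discrete GPU over an Intel iGPU, and mapping Intel to the Vulkan image because no SYCL image is proposed. The repository part of the image reference is elided in the issue, so it is passed in as a parameter.

```python
import re

# PCI vendor IDs for the three GPU vendors discussed in this issue.
PCI_VENDOR_IDS = {"10de": "nvidia", "1002": "amd", "8086": "intel"}
# PCI display classes: VGA (0300), 3D controller (0302), other display (0380).
DISPLAY_CLASS = re.compile(r"\[03(?:00|02|80)\]")
VENDOR_DEVICE = re.compile(r"\[([0-9a-f]{4}):[0-9a-f]{4}\]")
# Assumption: prefer a discrete NVIDIA/AMD GPU over an Intel iGPU.
PRIORITY = ["nvidia", "amd", "intel"]
# Tag suffixes from the examples above; no CPU tag is named in the issue.
TAG_SUFFIX = {"nvidia": "cuda", "amd": "rocm", "vulkan": "vulkan"}


def detect_gpu_vendor(lspci_nn_output: str) -> str:
    """Classify `lspci -nn` text as nvidia, amd, vulkan (Intel) or cpu."""
    found = set()
    for line in lspci_nn_output.lower().splitlines():
        if not DISPLAY_CLASS.search(line):
            continue  # not a display controller
        match = VENDOR_DEVICE.search(line)
        if match and match.group(1) in PCI_VENDOR_IDS:
            found.add(PCI_VENDOR_IDS[match.group(1)])
    for vendor in PRIORITY:
        if vendor in found:
            # No SYCL image is proposed, so Intel maps to the Vulkan image.
            return "vulkan" if vendor == "intel" else vendor
    return "cpu"


def image_ref(repo: str, vendor: str) -> str:
    """Build an image reference such as <repo>:v3.1-cuda."""
    if vendor not in TAG_SUFFIX:
        raise ValueError(f"no image tag proposed for vendor {vendor!r}")
    return f"{repo}:v3.1-{TAG_SUFFIX[vendor]}"


sample = """\
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:46a6]
01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:25a2] (rev a1)
"""
print(detect_gpu_vendor(sample))
```

An installer could feed this function the captured output of `lspci -nn` and fall back to asking the user when the result is `cpu` on a machine that is expected to have a GPU.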
GitHub issues this resolves (close as covered by PC-023):
feat: Apple Silicon (Metal) deployment guide #32. Note: Apple
Silicon doesn't run our docker stack natively (no nvidia/cuda
base, no Linux containers with GPU passthrough on Mac). Either
a separate Mac-native install path (brew + raw llama.cpp Metal
binary) or wait until vLLM/MLX backend lands via PC-024. Track
here so the GPU-vendor work is aware of it.
DirectMLSupport #7 — DirectML (Windows). Vulkan path (a) covers this for
AMD-on-Windows and Intel-on-Windows. NVIDIA-on-Windows users
just use the CUDA path. Pure-DirectML support would be its own
backend; defer until there's demand.
Docs to update in the same change:
docs/SETUP.md — vendor selection step in the install flow,
per-vendor prerequisite blocks (NVIDIA Container Toolkit vs ROCm
vs Vulkan ICD vs CPU-only).
docs/ARCHITECTURE.md — the llama-server box becomes "llama-server
(vendor-pluggable backend)" with a list of supported backends.
docs/CONFIGURATION.md — document ATLAS_GPU_VENDOR and any
per-vendor env vars (e.g., AMD's HSA_OVERRIDE_GFX_VERSION).
docs/TROUBLESHOOTING.md — vendor-specific failure-mode entries
(AMD ROCm version mismatches, Vulkan ICD missing, etc.).
README.md — system requirements section.
Status (at mirror time): Open (tracking) • Severity: High
One-liner: Build hard-codes NVIDIA/CUDA — AMD/Intel/CPU-only users can't install. Match llama.cpp's full backend matrix.
inference/Dockerfile.v31, docker-compose.yml, scripts/install.sh
The current build hard-codes NVIDIA/CUDA in three places:
Dockerfile.v31:1 — base image is nvidia/cuda:12.9.0-devel-rockylinux9.
Dockerfile.v31:25 — cmake is invoked with -DGGML_CUDA=ON and no other backend.
docker-compose.yml:28 — GPU resource reservation specifies driver: nvidia.
Mirrored from local ISSUES.md. ISSUES.md is gitignored; this issue is the canonical public record. PC-### numbering preserved for cross-references.