Report backend GPUs and bundle GPU benchmarks #509
pub(crate) fn run_gpus(json_output: bool) -> Result<()> {
    let mut hw = hardware::survey();
    hardware::augment_gpu_facts_with_vulkan_devices(&mut hw.gpus);
@ndizazzo I think augmenting the GPU facts like this was wrong?
Not sure, seems like an okay idea to me. Why do you figure it was wrong?
This PR prevents me from starting mesh-llm on my dual-GPU system with the error:
./target/release/mesh-llm serve
configured gpu_id 'pci:00000000:01:00.0' could not be resolved because this host has no pinnable GPUs; available pinnable GPU IDs: none: startup model 'unsloth/Qwen3.6-27B-GGUF:UD-Q4_K_XL' failed pinned GPU preflight
Despite the nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:01:00.0  On |                  N/A |
|  0%   29C    P8              8W /  500W |    3877MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        On  |   00000000:06:00.0 Off |                  N/A |
|  0%   25C    P8             10W /  300W |    8774MiB /  10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
But it works on a DEBUG BUILD (the release build detects nothing):
./target/release/mesh-llm gpu
⚠️ No GPUs detected on this node.
./target/debug/mesh-llm gpu
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 41954 MiB):
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes, VRAM: 32077 MiB
Device 1: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes, VRAM: 9877 MiB
🖥️ GPU 0
Name: NVIDIA GeForce RTX 5090
Stable ID: pci:00000000:01:00.0
Backend device: CUDA0
VRAM: 34.2 GB
Bandwidth: 1661.1 GB/s
Unified memory: no
PCI BDF: 00000000:01:00.0
Vendor UUID: GPU-80ded6bd-1a89-2628-3d94-902187dbab1d
🖥️ GPU 1
Name: NVIDIA GeForce RTX 3080
Stable ID: pci:00000000:06:00.0
Backend device: CUDA1
VRAM: 10.7 GB
Bandwidth: 720.2 GB/s
Unified memory: no
PCI BDF: 00000000:06:00.0
Vendor UUID: GPU-6b7fe24c-5f15-4ac5-88d6-c8934135a4ea
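The stable IDs above embed the PCI address with an 8-digit domain, the way nvidia-smi prints it, while Linux sysfs normally prints a 4-digit domain for the same device. A minimal sketch of bringing both spellings to one canonical form before comparing a configured gpu_id against detected devices follows. The helper names `normalize_pci_bdf` and `stable_id_from_bdf` are hypothetical and are not taken from this PR.

```rust
/// Hypothetical helper (not from the PR): normalize a PCI address to
/// `DDDDDDDD:BB:DD.F` with an 8-hex-digit domain and lowercase hex digits.
/// Accepts the 4-digit sysfs form, the 8-digit nvidia-smi form, the
/// domain-less `BB:DD.F` form (domain assumed to be 0), and an optional
/// leading `pci:` prefix.
pub fn normalize_pci_bdf(raw: &str) -> Option<String> {
    let raw = raw.trim();
    let raw = raw.strip_prefix("pci:").unwrap_or(raw);
    let parts: Vec<&str> = raw.split(':').collect();
    let (domain, bus, dev_fn) = match parts.as_slice() {
        [bus, dev_fn] => ("0", *bus, *dev_fn),
        [domain, bus, dev_fn] => (*domain, *bus, *dev_fn),
        _ => return None,
    };
    let (device, function) = dev_fn.split_once('.')?;
    let domain = u32::from_str_radix(domain, 16).ok()?;
    let bus = u8::from_str_radix(bus, 16).ok()?;
    let device = u8::from_str_radix(device, 16).ok()?;
    let function = u8::from_str_radix(function, 16).ok()?;
    // PCI allows 32 devices per bus and 8 functions per device.
    if device > 0x1f || function > 7 {
        return None;
    }
    Some(format!("{domain:08x}:{bus:02x}:{device:02x}.{function:x}"))
}

/// Hypothetical helper: build a `pci:`-prefixed stable ID from any accepted spelling.
pub fn stable_id_from_bdf(raw: &str) -> Option<String> {
    normalize_pci_bdf(raw).map(|bdf| format!("pci:{bdf}"))
}
```

With this, `0000:01:00.0` and `00000000:01:00.0` both map to the same stable ID, so a pin written in either spelling can be compared as a plain string.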
Debug C++ output still shows up during device detection on Apple:
Release build:
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.024 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0 (Apple M4 Pro)
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 40200.90 MB
🖥️ GPU 0
Name: Apple M4 Pro
Stable ID: metal:0
Backend device: MTL0
VRAM: 51.5 GB
Bandwidth: 199.5 GB/s
Unified memory: yes
Debug C++ output still shows up during device detection on Linux:
Release build:
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 41954 MiB):
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes, VRAM: 32077 MiB
Device 1: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes, VRAM: 9877 MiB
...
use std::ffi::CStr;

#[derive(Clone, Debug, Default)]
struct NvidiaDeviceInfo {
Possible to obtain the CUDA identifier here? CUDA0, or CUDA1?
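One possible approach, sketched under assumptions: the nvidia-smi index is not a safe stand-in for the CUDA ordinal, because CUDA enumerates devices fastest-first by default unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, while nvidia-smi lists them in PCI bus order. Joining on the PCI bus ID avoids guessing. The types `BackendDevice` and `NvidiaDevice`, their fields, and `attach_backend_labels` are hypothetical stand-ins. The real `NvidiaDeviceInfo` fields are not shown in this diff, and the sketch assumes the backend enumeration can report a PCI bus ID per device.

```rust
use std::collections::HashMap;

/// Hypothetical view of one device as the compute backend enumerates it.
#[derive(Clone, Debug, PartialEq)]
pub struct BackendDevice {
    pub label: String,      // e.g. "CUDA0"
    pub pci_bus_id: String, // e.g. "0000:01:00.0"
}

/// Hypothetical stand-in for the NVIDIA-side device record.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct NvidiaDevice {
    pub name: String,
    pub pci_bus_id: String,
    pub backend_device: Option<String>,
}

/// Comparison key that ignores letter case and the width of the PCI domain,
/// so "0000:01:00.0" and "00000000:01:00.0" compare equal.
fn bdf_key(raw: &str) -> String {
    let lower = raw.trim().to_ascii_lowercase();
    let parts: Vec<&str> = lower.split(':').collect();
    match parts.as_slice() {
        [domain, bus, dev_fn] => {
            format!("{}:{}:{}", domain.trim_start_matches('0'), bus, dev_fn)
        }
        [bus, dev_fn] => format!(":{}:{}", bus, dev_fn),
        _ => lower.clone(),
    }
}

/// Fill in `backend_device` for every GPU whose PCI bus ID matches a backend
/// device; GPUs without a match are left as `None`.
pub fn attach_backend_labels(gpus: &mut [NvidiaDevice], backend: &[BackendDevice]) {
    let by_bdf: HashMap<String, &str> = backend
        .iter()
        .map(|d| (bdf_key(&d.pci_bus_id), d.label.as_str()))
        .collect();
    for gpu in gpus.iter_mut() {
        gpu.backend_device = by_bdf
            .get(&bdf_key(&gpu.pci_bus_id))
            .map(|label| label.to_string());
    }
}
```

Matching on the bus ID keeps the label correct even when the backend orders a faster card in a later slot ahead of a slower card in an earlier one.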
Added some debugging tools to this while I was tracing model loading errors; they seem worth keeping:
Merging after #513
@ndizazzo thanks for picking this up!!
* origin/main:
  Trim skippy decode overhead (#537)
  chore(docs): update docs to match planned updates
  Split skippy-server frontend module (#536)
  Add Skippy WAN Docker lab (#528)
  Instrument Skippy binary transport timing (#533)
  Report backend GPUs and bundle GPU benchmarks (#509)
  Fuse warm chat prefix restore with first decode (#527)
  Improve skippy prompt layer-package tokenizer handling (#530)
  Use SQLite for metrics server storage (#529)
Summary
- `skippy/devices.h` ABI plus shared `skippy/common.h` status/error types, then exposes backend device data through `skippy-ffi` and `skippy-runtime`
- `nvidia-smi`; Linux skippy-enabled survey can still discover NVIDIA GPUs through SDK libraries when skippy reports no GPU devices
- `mesh-llm-gpu-bench` crate and compiles benchmark backends into `mesh-llm` instead of discovering `membench-fingerprint*` helper executables
- `00000000:00:00.0`, resolving UUID aliases, and accepting the single available pinnable GPU for legacy single-GPU pins
- `hardware/mod.rs`, `hardware/parsers.rs`, `hardware/tests.rs`, `hardware/skippy_devices.rs`, and `hardware/enrichers.rs`
Architecture
- `mesh-llm-system` keeps survey, cache, fingerprint, and pinned-GPU policy.
- `mesh-llm-gpu-bench` owns native benchmark backend selection and execution.
Protocol
Testing
- `cargo fmt --all -- --check`
- `cargo check -p mesh-llm-gpu-bench`
- `cargo check -p mesh-llm-system`
- `cargo check -p mesh-llm`
- `cargo test -p mesh-llm-system benchmark --lib`
- `cargo test -p mesh-llm-system hardware --lib`
- `just build-dev`
- `mesh-llm gpus --json`: byte-for-byte identical
- `mesh-llm gpus --json` after log suppression: empty stderr and valid GPU JSON
- `target/debug/mesh-llm gpus benchmark` refreshed one GPU fingerprint without any `membench-fingerprint*` helper binary present
- `white.local` CUDA before/after: hardware fields match; backend label intentionally changes from the old incorrect `Vulkan0` overlay to `CUDA0`
- `white.local` CUDA benchmark crate compile: `cargo check -p mesh-llm-gpu-bench --features cuda`
- `white.local` CUDA runtime build: `PATH=/usr/lib/llvm-21/bin:$PATH just build-runtime cuda`
- `white.local` compiled-in CUDA benchmark: `target/debug/mesh-llm gpus benchmark` refreshed one GPU fingerprint at 908.7 GB/s with no `membench-fingerprint*` helper binary present
- `white.local` CUDA `mesh-llm gpus --json`: reports `backend_device: CUDA0`, stable PCI ID, vendor UUID, and no helper binaries in `target`
- `white.local` Vulkan before/after: byte-for-byte identical
- `white.local` CPU-linked release build: `mesh-llm gpus --json` reports NVIDIA GPU via CUDA/NVML SDK discovery and emits empty stderr; scratch/build work used `$HOME/tmp/mesh-llm-pr509-review`
Notes
- `389ff61d77b5c71cec0cf92fe4e5d01ace80b797`.
- `white.local`; HIP and Intel still need runtime validation on machines with those SDK toolchains available.
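
The pinned-GPU handling mentioned in the Summary (resolving UUID aliases and accepting the single available pinnable GPU for legacy single-GPU pins) could look roughly like the sketch below. It is an illustration only: `PinnableGpu`, `PinError`, and `resolve_pinned_gpu` are hypothetical names, and the rule used here for a "legacy" pin (any unmatched pin on a host with exactly one pinnable GPU) is an assumption, not the policy implemented in `mesh-llm-system`.

```rust
/// Hypothetical record for a GPU that a model can be pinned to.
#[derive(Clone, Debug, PartialEq)]
pub struct PinnableGpu {
    pub stable_id: String,           // e.g. "pci:00000000:01:00.0"
    pub vendor_uuid: Option<String>, // e.g. "GPU-80ded6bd-..."
}

/// Hypothetical error type for a pin that cannot be resolved.
#[derive(Debug, PartialEq)]
pub enum PinError {
    NoPinnableGpus,
    NotFound { available: Vec<String> },
}

/// Resolve a configured gpu_id against the detected pinnable GPUs.
pub fn resolve_pinned_gpu<'a>(
    configured: &str,
    gpus: &'a [PinnableGpu],
) -> Result<&'a PinnableGpu, PinError> {
    if gpus.is_empty() {
        return Err(PinError::NoPinnableGpus);
    }
    let wanted = configured.trim();

    // 1. Exact stable ID match (case-insensitive).
    if let Some(gpu) = gpus.iter().find(|g| g.stable_id.eq_ignore_ascii_case(wanted)) {
        return Ok(gpu);
    }

    // 2. Vendor UUID used as an alias for the stable ID.
    if let Some(gpu) = gpus.iter().find(|g| {
        g.vendor_uuid
            .as_deref()
            .is_some_and(|uuid| uuid.eq_ignore_ascii_case(wanted))
    }) {
        return Ok(gpu);
    }

    // 3. Assumed legacy rule: with exactly one pinnable GPU, accept it.
    if gpus.len() == 1 {
        return Ok(&gpus[0]);
    }

    Err(PinError::NotFound {
        available: gpus.iter().map(|g| g.stable_id.clone()).collect(),
    })
}
```

Returning the list of available stable IDs in the error keeps the failure message actionable, in the spirit of the "available pinnable GPU IDs" text quoted in the review thread.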