Summary
./ds4_test --metal-ssd-streaming-cache-pressure fails with a selected-token mismatch in the short_code_completion vector case.
Environment
- Machine: Apple M2 Max
- RAM: 32 GiB
- macOS: 26.5.1
- Backend: Metal SSD streaming
- Model: DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf
Command
make ds4_test
DS4_TEST_MODEL=gguf/DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf \
./ds4_test --metal-ssd-streaming-cache-pressure
Result
metal-ssd-streaming-cache-pressure:
ds4-test: Metal SSD streaming cache-pressure repro (16GiB cache, layer-batched decode, short_code_completion)
ds4: Metal device Apple M2 Max, 32.00 GiB RAM
ds4: metal SSD streaming cache budget 16.00 GiB / 6.75 MiB per expert = 2427 experts
ds4: Metal SSD streaming mode enabled; full model residency and warmup are skipped
ds4-test: vector short_code_completion
ds4-test: vector short_code_completion step 1 selected token mismatch
tests/ds4_test.c:873: assertion failed: false
metal-ssd-streaming-cache-pressure: ERR
ds4 tests: 1 failure(s)
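The arithmetic in the cache-budget line can be reproduced with a small C sketch. It assumes the expert count is the cache budget divided by the per-expert size, rounded down. The helper name and constants are hypothetical and not taken from ds4. 16 GiB is 16384 MiB, and 16384 / 6.75 ≈ 2427.3, which rounds down to 2427.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-size constants for illustration. */
static const uint64_t MIB = 1024ull * 1024ull;
static const uint64_t GIB = 1024ull * 1024ull * 1024ull;

/* Hypothetical helper: how many routed experts fit in the cache,
   assuming budget = floor(cache_bytes / expert_bytes). */
static uint64_t experts_in_budget(uint64_t cache_bytes, uint64_t expert_bytes) {
    assert(expert_bytes > 0);
    return cache_bytes / expert_bytes; /* integer division rounds down */
}
```

For example, `experts_in_budget(16 * GIB, 27 * MIB / 4)` passes 6.75 MiB as exactly 7077888 bytes and yields 2427, matching the logged value.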
Expected
The cache-pressure vector test should pass, or the fixture/test should be updated if this model/backend pairing
is no longer expected to match the official vector.
Notes
This test forces the Metal SSD-streaming layer-batched decode path with a 16 GiB routed expert cache and checks
official-vector parity for short_code_completion. The failure means local greedy selection diverges from the
expected vector token at step 1.
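The parity check described above amounts to comparing a greedy (argmax) choice against the reference token at each decode step. A minimal C sketch, assuming each step exposes a full logits array and the vector stores one expected token id per step. `greedy_select` and `first_mismatch_step` are hypothetical names, not functions from tests/ds4_test.c, and the sketch uses 0-based step indices; the log does not say which convention ds4_test uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: greedy selection returns the index of the highest logit.
   Ties resolve to the lowest index because the comparison is strict. */
static int greedy_select(const float *logits, size_t n_vocab) {
    assert(logits != NULL && n_vocab > 0);
    size_t best = 0;
    for (size_t i = 1; i < n_vocab; i++) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return (int)best;
}

/* Hypothetical: compare the greedy choice at every step with the expected
   token id. Returns the 0-based index of the first mismatching step,
   or -1 if every step matches. */
static int first_mismatch_step(const float *const *step_logits,
                               const int *expected,
                               size_t n_steps, size_t n_vocab) {
    for (size_t s = 0; s < n_steps; s++) {
        if (greedy_select(step_logits[s], n_vocab) != expected[s]) {
            return (int)s;
        }
    }
    return -1;
}
```

Because greedy selection keeps only the top logit, a near-tie between the two highest candidates can flip under small numeric differences between decode paths; that is one possible explanation for a single-step mismatch, not a confirmed cause here.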