Skip to content

Metal SSD-streaming cache-pressure vector test fails at short_code_completion #439

Description

@DanteCpp

Summary

./ds4_test --metal-ssd-streaming-cache-pressure fails with a selected-token mismatch in the short_code_completion vector case.

Environment

  • Machine: Apple M2 Max
  • RAM: 32 GiB
  • macOS: 26.5.1
  • Backend: Metal SSD streaming
  • Model: DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf

Command

make ds4_test

DS4_TEST_MODEL=gguf/DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf \
  ./ds4_test --metal-ssd-streaming-cache-pressure

Result

metal-ssd-streaming-cache-pressure:
ds4-test: Metal SSD streaming cache-pressure repro (16GiB cache, layer-batched decode, short_code_completion)
ds4: Metal device Apple M2 Max, 32.00 GiB RAM
ds4: metal SSD streaming cache budget 16.00 GiB / 6.75 MiB per expert = 2427 experts
ds4: Metal SSD streaming mode enabled; full model residency and warmup are skipped
ds4-test: vector short_code_completion
ds4-test: vector short_code_completion step 1 selected token mismatch
tests/ds4_test.c:873: assertion failed: false
metal-ssd-streaming-cache-pressure: ERR
ds4 tests: 1 failure(s)

Expected

The cache-pressure vector test should pass, or the fixture/test should be updated if this model/backend pairing
is no longer expected to match it.

Notes

This test forces the Metal SSD-streaming layer-batched decode path with a 16 GiB routed expert cache and checks
official-vector parity for short_code_completion. The failure means local greedy selection diverges from the
expected vector token at step 1.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions