Background
As a good balance of speed and granularity, I am running a disk KV configuration with 4,096-token checkpoints (--kv-cache-continued-interval-tokens 4096, --kv-cache-boundary-align-tokens 4096) backing OpenClaw. DS4 writes a new disk checkpoint every 4,096 tokens during prefill:
0–4,096 tokens → one .kv file (~76 MiB)
0–8,192 tokens → one .kv file (~130 MiB)
0–12,288 tokens → one .kv file (~184 MiB)
Issue
I was running with a 256 GB disk KV cache limit, which I hit today. When the disk budget runs out, DS4 must evict files to make room. The eviction score formula is:
score = (effective_hits + 1.0) * tokens / file_size
When there are no hits, the score reduces to tokens / file_size, the token density: tokens saved per byte of disk used. Because all KV files for the same model have roughly the same per-token size, the fixed header overhead makes density rise with file size (about 54 tokens/MiB for the 76 MiB file versus about 67 tokens/MiB for the 184 MiB file). With no hits, small files therefore always score lower than large files and are evicted first.
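To make the density argument concrete, here is a minimal sketch that plugs the three checkpoint sizes from the Background section into the formula above. The function name `eviction_score` is hypothetical, and the code assumes (as the text describes) that the lowest score is evicted first; it is not DS4's actual implementation.

```python
MIB = 1024 * 1024

def eviction_score(tokens, file_size_bytes, effective_hits=0.0):
    """score = (effective_hits + 1.0) * tokens / file_size, as quoted above."""
    return (effective_hits + 1.0) * tokens / file_size_bytes

# (tokens, approximate file size in MiB), taken from the Background section
checkpoints = [(4096, 76), (8192, 130), (12288, 184)]

# Fit size = overhead + per_token * tokens through the first and last points
per_token_mib = (184 - 76) / (12288 - 4096)
overhead_mib = 76 - 4096 * per_token_mib  # fixed cost implied by these sizes

scores = [eviction_score(t, s * MIB) for t, s in checkpoints]
for (t, s), score in zip(checkpoints, scores):
    print(f"{t:>6} tokens, {s:>3} MiB -> {score * MIB:.1f} tokens/MiB")
print(f"implied fixed overhead: ~{overhead_mib:.0f} MiB per file")

# Assumption: the file with the lowest score is evicted first.
first_evicted = min(checkpoints, key=lambda c: eviction_score(c[0], c[1] * MIB))
```

With zero hits the 4,096-token file has the lowest density, so it is the first eviction candidate even though it is the cheapest file to keep.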
This creates a self-defeating cycle once the disk KV store hits its limit:
1. A new turn arrives. DS4 starts prefilling a ~35,000-token prompt.
2. At each 4,096-token boundary it writes a small checkpoint file.
3. The disk is full, so the small checkpoint (76 MiB, 4,096 tokens) is immediately evicted to make room for the next, larger one.
4. By the end of prefill, only large files survive (e.g. the 481 MiB full-context snapshot).
5. Next turn: the new prompt differs slightly from the saved one. The client strips metadata from old messages as they become history, so the rendered text diverges around token ~34,000 out of ~35,000.
6. DS4 checks disk. The only surviving file is the 481 MiB snapshot, but its stored text extends past the divergence point, so the SHA1 doesn't match.
7. The smaller files (e.g. 0–32,768 tokens) would have matched, since their text falls entirely within the stable prefix, but they were evicted in step 3.
8. Full re-prefill from scratch: ~100–200 seconds instead of ~8 seconds.
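The effect of losing the intermediate checkpoints can be sketched with a toy model. This is an illustration only: it uses characters as a stand-in for tokens, hashes the prompt prefix with SHA1, and the helper names are hypothetical. How DS4 actually keys and looks up its files is not shown here.

```python
import hashlib

def prefix_sha1(text, n):
    """SHA1 of the first n characters (characters stand in for tokens)."""
    return hashlib.sha1(text[:n].encode("utf-8")).hexdigest()

def longest_usable_checkpoint(saved_text, new_text, boundaries):
    """Largest surviving checkpoint whose saved prefix hash matches the new prompt, else 0."""
    best = 0
    for b in boundaries:
        if b > len(saved_text) or b > len(new_text):
            continue  # checkpoint extends past one of the prompts
        if prefix_sha1(saved_text, b) == prefix_sha1(new_text, b):
            best = max(best, b)
    return best

stable = "s" * 34000                # prefix shared by both turns
saved_text = stable + "A" * 1000    # previous turn, 35,000 "tokens"
new_text = stable + "B" * 1000      # new turn, diverges at ~34,000

every_4096 = list(range(4096, 35001, 4096))  # 4096 ... 32768
all_kept = every_4096 + [35000]              # nothing evicted
only_large = [35000]                         # small checkpoints evicted

reused_all_kept = longest_usable_checkpoint(saved_text, new_text, all_kept)
reused_only_large = longest_usable_checkpoint(saved_text, new_text, only_large)
print(reused_all_kept, reused_only_large)
```

In this model the 32,768 checkpoint is the last one that lies entirely inside the stable prefix, so keeping it is the difference between reusing most of the prompt and reusing nothing.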
Temporary Workaround
I nuked the KV store and started fresh with a 400 GB budget to avoid these eviction issues. This will stop working again once the KV store hits its limit, but at least now I know what to monitor.
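For the monitoring part, a rough sketch of a usage check follows. It assumes the checkpoints are `.kv` files somewhere under a single store directory and that the budget is counted in decimal GB; the function names and the 90% threshold are hypothetical, not DS4 options.

```python
import tempfile
from pathlib import Path

GB = 1000 ** 3  # assumption: the budget is expressed in decimal gigabytes

def kv_store_usage(store_dir, budget_bytes, pattern="*.kv"):
    """Count the files matching pattern under store_dir and sum their sizes."""
    files = [p for p in Path(store_dir).rglob(pattern) if p.is_file()]
    used = sum(p.stat().st_size for p in files)
    return {"files": len(files), "used_bytes": used, "fraction": used / budget_bytes}

def near_limit(usage, threshold=0.9):
    """True once the store has consumed at least threshold of its budget."""
    return usage["fraction"] >= threshold

# Small demonstration against a throwaway directory with a 1,000-byte budget
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.kv").write_bytes(b"\0" * 600)
    (Path(tmp) / "b.kv").write_bytes(b"\0" * 350)
    (Path(tmp) / "notes.txt").write_bytes(b"\0" * 999)  # not a .kv file, ignored
    demo = kv_store_usage(tmp, budget_bytes=1000)

print(demo, near_limit(demo))
```

Against a real store this would be called as `kv_store_usage(store_dir, 400 * GB)` with the actual store path, for example from a cron job that alerts before evictions start.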
Pre-Post Results
| State | Turn-start time | Disk KV loaded |
|---|---|---|
| Fresh disk / well under budget | ~8–12 s | 32,768 tokens from disk ✓ |
| Disk at capacity (256 GB, 441 stale files) | 100–208 s | 0 ✗ |
| After nuke + budget increase to 400 GB | ~8–12 s | 32,768 tokens from disk ✓ |
Long Term Fix needed
We need to revisit the KV eviction score formula and re-prioritize what gets evicted, so that small prefix checkpoints are not always the first to go. I will also experiment with some options.
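As a starting point for that experimentation, here is a small harness that compares eviction order under the current formula and under one candidate. The candidate `per_byte_score` is purely hypothetical and is not a proposed fix; it only shows how a different weighting flips the order. File sizes are the approximate ones quoted above, and the lowest score is assumed to be evicted first.

```python
MIB = 1024 * 1024

def current_score(tokens, size_bytes, hits=0.0):
    """The formula quoted in the Issue section."""
    return (hits + 1.0) * tokens / size_bytes

def per_byte_score(tokens, size_bytes, hits=0.0):
    """Hypothetical candidate: ignores token count, so cheaper files score higher."""
    return (hits + 1.0) / size_bytes

def eviction_order(files, score_fn):
    """Files sorted lowest score first, i.e. in the order they would be evicted."""
    return sorted(files, key=lambda f: score_fn(f[0], f[1]))

# (tokens, size in bytes); approximate figures from this report
files = [
    (4096, 76 * MIB),
    (8192, 130 * MIB),
    (12288, 184 * MIB),
    (35000, 481 * MIB),
]

current_order = [t for t, _ in eviction_order(files, current_score)]
candidate_order = [t for t, _ in eviction_order(files, per_byte_score)]
print("current:  ", current_order)
print("candidate:", candidate_order)
```

Any real replacement would also need to account for hits and for how often a given prefix length is actually reusable, which this toy comparison leaves out.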