perf(download): cache asset verification by stat signature #25
prepare_binaries and ensure_file SHA-256 every asset on every call; for the ArcBox daemon that meant re-hashing ~230MB of runtime binaries (~400ms) on each boot. Record (size, mtime, sha256) in a .verified.json next to the assets after a successful verification or download, and trust it while the stat signature and expected digest are unchanged. Missing/corrupt cache degrades to a full re-hash. Also stream sha256_file in 1MiB chunks instead of reading whole files into memory. Bump to 0.5.2.
Important
The cache trades away boot-time tamper detection for the security-critical kernel and rootfs. Please confirm this matches the intended threat model before merging — the change is otherwise clean and well-tested.
Reviewed changes — a stat-signature verification cache so steady-state boots skip re-hashing already-verified assets, plus a streaming hash and a patch release bump.
- Add `VerifyCache` (`src/verify_cache.rs`): records `(sha256, size, mtime)` per asset in `.verified.json`; `is_verified` trusts the recorded digest only while both the expected sha256 and the stat signature match, otherwise the caller falls back to a full re-hash. Load tolerates a missing/corrupt file, save is atomic and dirty-gated.
- Thread the cache through `ensure_file` (`src/asset_manager.rs`): `prepare()` loads once, passes `&mut cache` for kernel and rootfs, and `save()`s at the end; the prior exists-and-rehash path now also records.
- Thread the cache through `prepare_binaries` (`src/download.rs`): same pattern with its own cache in `dest_dir`.
- Stream `sha256_file` in 1 MiB chunks: replaces `fs::read` of the whole file to cap peak memory on hundred-MB assets.
- Release plumbing: `filetime`/`tempfile` dev-deps, `filetime` 0.2.27→0.2.29 in `Cargo.lock` (drops `libredox`/`redox_syscall`/`plain` transitives), crate 0.5.1→0.5.2.
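For readers without the diff open, the check summarized in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in `src/verify_cache.rs`: the `Entry` struct, the field names, the `record` and `is_dirty` methods, and the in-memory `HashMap` are all assumptions, and JSON load/save is left out.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical cache entry: the digest plus the stat signature it was recorded under.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub sha256: String,
    pub size: u64,
    pub mtime: SystemTime,
}

/// Hypothetical in-memory cache keyed by asset name (persistence omitted).
#[derive(Debug, Default)]
pub struct VerifyCache {
    entries: HashMap<String, Entry>,
    dirty: bool,
}

impl VerifyCache {
    /// True only if the recorded digest equals `expected_sha256` and the file's
    /// current size and mtime equal the recorded ones. File contents are never read.
    pub fn is_verified(&self, name: &str, path: &Path, expected_sha256: &str) -> bool {
        let Some(entry) = self.entries.get(name) else { return false };
        let Ok(meta) = fs::metadata(path) else { return false };
        let Ok(mtime) = meta.modified() else { return false };
        entry.sha256 == expected_sha256 && entry.size == meta.len() && entry.mtime == mtime
    }

    /// Record a digest that the caller has just confirmed with a full hash.
    /// Marks the cache dirty only when the entry actually changes.
    pub fn record(&mut self, name: &str, path: &Path, sha256: &str) -> io::Result<()> {
        let meta = fs::metadata(path)?;
        let entry = Entry {
            sha256: sha256.to_string(),
            size: meta.len(),
            mtime: meta.modified()?,
        };
        if self.entries.get(name) != Some(&entry) {
            self.entries.insert(name.to_string(), entry);
            self.dirty = true;
        }
        Ok(())
    }

    /// Whether anything changed since load, i.e. whether a save is needed.
    pub fn is_dirty(&self) -> bool {
        self.dirty
    }
}
```

Any failure to stat the file is treated as a cache miss, which matches the described fallback to a full re-hash.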
⚠️ Cache removes boot-time tamper detection for kernel and rootfs
Without the cache, every boot re-hashes each asset and compares it to the manifest digest, so on-disk corruption or tampering of the kernel/rootfs is caught before the asset is used. With the cache, a file modified in place while keeping its size and mtime is trusted on the strength of a stale .verified.json entry, with no re-hash. Preserving mtime is trivial (touch -d/filetime), and the attacker need not touch the cache file at all — the old entry already matches the manifest digest.
For a pure download-integrity goal (catch corrupt/partial fetches at fetch time) this is fine, and the perf win is real. The open question is whether re-verifying security-critical boot assets on every boot was a deliberate defense-in-depth property you intend to keep.
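To make the concern concrete, here is a small sketch of an in-place modification that leaves the stat signature unchanged, using only the standard library. The helper names `stat_signature` and `tamper_preserving_signature` are hypothetical and do not exist in this crate; `File::set_modified` is the std counterpart of `touch -d` and requires Rust 1.75 or later.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::time::SystemTime;

/// The (size, mtime) pair that a stat-based cache compares.
pub fn stat_signature(path: &Path) -> io::Result<(u64, SystemTime)> {
    let meta = fs::metadata(path)?;
    Ok((meta.len(), meta.modified()?))
}

/// Overwrite the file with same-length content, then restore the original mtime.
/// Afterwards `stat_signature` returns the same pair as before the write.
pub fn tamper_preserving_signature(path: &Path, replacement: &[u8]) -> io::Result<()> {
    let (size, mtime) = stat_signature(path)?;
    if replacement.len() as u64 != size {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "replacement must have the same length as the original",
        ));
    }
    // Open without truncation and overwrite from offset 0.
    let mut file = OpenOptions::new().write(true).open(path)?;
    file.write_all(replacement)?;
    // Put the old modification time back.
    file.set_modified(mtime)?;
    Ok(())
}
```

A cache that compares only size, mtime, and the recorded digest cannot distinguish the file before and after this call; only a re-hash of the contents can.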
Technical details
# Cache removes boot-time tamper detection for kernel and rootfs
## Affected sites
- `src/verify_cache.rs:65` — `is_verified` returns true on (expected sha256 + size + mtime) match without reading file contents.
- `src/asset_manager.rs` `ensure_file` — kernel/rootfs now short-circuit on `is_verified`.
- `src/download.rs` `prepare_binaries` — runtime binaries short-circuit on `is_verified`.
## Required outcome
- An explicit decision on whether boot-time re-verification of kernel/rootfs is in-scope for the threat model, documented in the PR or module docs.
## Suggested approach (optional)
- If boot-time tamper detection matters: keep the cache for the bulk runtime binaries but force a full re-hash for `kernel`/`rootfs` (or gate caching of those two behind a flag), since they are the highest-value targets and only two files.
- If it does not matter (assets live in a trusted, integrity-protected store): no code change needed — just record the accepted tradeoff so it isn't silently reintroduced as a "bug" later.
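If the first option is chosen, the gating could look something like the sketch below. The names `Verification`, `CRITICAL_ASSETS`, `choose_verification`, and the `allow_critical_cache` flag are illustrative assumptions, not existing identifiers in this crate.

```rust
/// Hypothetical outcome of the policy decision for one asset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verification {
    /// Skip hashing and trust the cache entry.
    TrustCache,
    /// Read the file and compare its SHA-256 to the manifest digest.
    FullRehash,
}

/// Asset names treated as security-critical in this sketch (an assumption).
const CRITICAL_ASSETS: [&str; 2] = ["kernel", "rootfs"];

/// Decide whether a cache hit may be trusted for `asset`.
/// Critical assets take the fast path only when `allow_critical_cache` is set.
pub fn choose_verification(
    asset: &str,
    cache_hit: bool,
    allow_critical_cache: bool,
) -> Verification {
    let critical = CRITICAL_ASSETS.contains(&asset);
    if cache_hit && (!critical || allow_critical_cache) {
        Verification::TrustCache
    } else {
        Verification::FullRehash
    }
}
```

With the flag off, the bulk runtime binaries still benefit from the cache while `kernel` and `rootfs` are hashed on every boot.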
## Open questions for the human
- What is the trust boundary on the cache directory? If anything that can write the asset can also write `.verified.json`, re-hashing still mattered (the attacker cannot forge a manifest-matching digest), so the regression is real regardless of cache-file integrity.
ℹ️ Concurrent prepare runs can drop cache entries
Two processes preparing the same directory each load, mutate, and atomically rename their own copy of .verified.json, so the last writer wins and the other's newly recorded entries are lost. The atomic rename prevents a corrupt file, and a lost entry only costs a re-hash on the next run, so this is mergeable as-is — noting it so the last-writer-wins behavior is a known property rather than a surprise.
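For reference, the write-then-rename pattern behind this behavior can be sketched with the standard library alone. This is an assumption about the general shape, not the PR's actual save code; `save_atomically` and the temp-file naming scheme are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to a temp file in the destination directory, then rename it
/// over `dest`. Readers see either the old file or the new one, never a partial
/// write, but the last process to rename wins.
pub fn save_atomically(dest: &Path, contents: &[u8]) -> io::Result<()> {
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    let dir = dest.parent().unwrap_or_else(|| Path::new("."));
    // Include the pid so two processes do not write to the same temp file.
    let tmp = dir.join(format!(
        ".{}.{}.tmp",
        file_name.to_string_lossy(),
        std::process::id()
    ));
    let mut file = File::create(&tmp)?;
    file.write_all(contents)?;
    // Flush data to disk before the rename makes it visible under the final name.
    file.sync_all()?;
    fs::rename(&tmp, dest)
}
```

Because each process renames its own complete copy, the file is always well-formed, but entries recorded only by the earlier writer are overwritten.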
Claude Opus | 𝕏

`prepare_binaries` and `ensure_file` SHA-256 every asset on every call. For the ArcBox daemon that meant re-hashing ~230MB of runtime binaries (~410ms measured) on every single boot, plus kernel/rootfs on the `prepare()` path.
Change
After a successful verification or download, record `(size, mtime, sha256)` per file in a `.verified.json` next to the assets. Subsequent calls trust the recorded digest while the stat signature and the expected digest both match: one stat per asset instead of a full hash. Any mismatch (file touched, manifest updated, cache missing/corrupt) falls back to the full re-hash; the cache is written atomically and is purely an optimization.
Also streams `sha256_file` in 1MiB chunks instead of slurping whole files into memory (assets run to hundreds of MB).
Measured in arcbox (isolated daemon, warm boot): daemon-start → VMM-start dropped 434ms → 4ms once the cache is populated.
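The chunked read used for `sha256_file` follows a common pattern, sketched below. To keep the sketch free of external crates it feeds a stand-in 64-bit FNV-1a digest instead of SHA-256; the real function presumably drives a SHA-256 hasher the same way. `Fnv1a`, `digest_streaming`, and `CHUNK_SIZE` are illustrative names, not the crate's.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Read buffer size: 1 MiB, as in the PR description.
pub const CHUNK_SIZE: usize = 1024 * 1024;

/// Stand-in digest (64-bit FNV-1a). NOT SHA-256 and not cryptographic; it only
/// plays the role of an incremental hasher in this sketch.
pub struct Fnv1a(u64);

impl Fnv1a {
    pub fn new() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }

    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    pub fn finish(&self) -> u64 {
        self.0
    }
}

/// Digest a file while holding at most `CHUNK_SIZE` bytes of it in memory.
pub fn digest_streaming(path: &Path) -> io::Result<u64> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; CHUNK_SIZE];
    let mut hasher = Fnv1a::new();
    loop {
        match file.read(&mut buf) {
            Ok(0) => break, // end of file
            Ok(n) => hasher.update(&buf[..n]),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(hasher.finish())
}
```

Peak memory stays at one buffer regardless of asset size, whereas reading the whole file first allocates the full file length.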
Release
Bumped to 0.5.2. After merge this needs a crates.io publish, then arcbox bumps `arcbox-boot = "0.5.2"` (arcboxlabs/arcbox#301 references this).