cuVS Index Deserialization: Integer Overflow, Type Confusion, and Allocation Bombs Across All Index Types
Description
Seven execution-verified bugs across the cuVS index serialization/deserialization code:
(1) Heap buffer overflow in buffered_ofstream::write when size exceeds the buffer capacity.
(2) A missing break in a switch causes fallthrough from kSerializeStridedDataset to kSerializeVPQDataset, leading to type confusion on an invalid cudaDataType_t.
(3) Integer overflow in CAGRA deserialize: the n_rows * graph_degree product overflows, so a small allocation is followed by a large read. The same pattern affects all index deserializers (brute_force, IVF-Flat, IVF-PQ, CAGRA).
(4) Enum values are deserialized without validation; invalid DistanceType/codebook_gen/list_layout values are UB in C++.
(5) OOM via n_lists=0xFFFFFFFF in IVF-PQ/IVF-Flat deserialize.
(6) rows * dim * sizeof(T) overflow plus size_t-to-int64_t narrowing in brute_force deserialize.
All confirmed by dynamic analysis with PoC files.
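Items (4) and (5) both come down to trusting header fields before checking them. The following is a minimal sketch of the kind of validation that could run right after a header field is read, not the cuVS implementation. The enum demo_metric, its values, and the helpers parse_metric and count_within_limit are hypothetical names introduced for illustration. The real DistanceType, codebook_gen, and list_layout enumerators would need their own tables, and the upper bound on n_lists is assumed to be a policy value supplied by the caller.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a serialized enum such as a distance metric.
// The enumerators and their values are assumptions, not the cuVS definitions.
enum class demo_metric : std::int32_t { L2 = 0, InnerProduct = 1, Cosine = 2 };

// Map a raw integer read from the file to an enumerator, rejecting anything
// that is not explicitly listed. The cast to the enum type happens only
// after the value is known to be valid.
inline std::optional<demo_metric> parse_metric(std::int32_t raw)
{
  switch (raw) {
    case 0: return demo_metric::L2;
    case 1: return demo_metric::InnerProduct;
    case 2: return demo_metric::Cosine;
    default: return std::nullopt;  // unknown value: refuse to deserialize
  }
}

// Reject list counts that are zero or exceed a caller-chosen limit, so that a
// header claiming 0xFFFFFFFF lists fails before any allocation is attempted.
inline bool count_within_limit(std::uint32_t n_lists, std::uint32_t max_lists)
{
  return n_lists > 0 && n_lists <= max_lists;
}
```

A deserializer built this way would return an error as soon as parse_metric yields std::nullopt or count_within_limit returns false, instead of continuing with attacker-controlled values.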
Exploit Scenario
A vector search service loads user-uploaded index files. An attacker crafts an index with n_rows=0xFFFFFFFFFFFFFFFF and dim=2. The multiplication overflows to a small value, allocating a tiny buffer. The subsequent stream read writes the full (large) payload into the undersized buffer, achieving heap corruption and potentially arbitrary code execution.
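The wraparound behind this scenario, and one way to guard against it, can be sketched as follows. This is an illustration under stated assumptions, not code from cuVS: checked_mul, checked_payload_bytes, and unchecked_payload_bytes are hypothetical helpers, and the example values in the usage note are chosen so the wrap is easy to verify by hand; they are not taken from the PoC files.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Multiply two unsigned 64-bit values; return std::nullopt if the exact
// product does not fit in 64 bits.
inline std::optional<std::uint64_t> checked_mul(std::uint64_t a, std::uint64_t b)
{
  if (a != 0 && b > std::numeric_limits<std::uint64_t>::max() / a) { return std::nullopt; }
  return a * b;
}

// Size computation as a naive deserializer might write it: unsigned
// arithmetic silently wraps modulo 2^64.
inline std::uint64_t unchecked_payload_bytes(std::uint64_t n_rows,
                                             std::uint64_t dim,
                                             std::uint64_t elem_size)
{
  return n_rows * dim * elem_size;
}

// Same computation with every intermediate product checked.
inline std::optional<std::uint64_t> checked_payload_bytes(std::uint64_t n_rows,
                                                          std::uint64_t dim,
                                                          std::uint64_t elem_size)
{
  assert(elem_size != 0);  // element size comes from sizeof(T), never from the file
  auto elems = checked_mul(n_rows, dim);
  if (!elems) { return std::nullopt; }
  return checked_mul(*elems, elem_size);
}
```

With n_rows = 2^62 + 1, dim = 2, and 4-byte elements, the exact product is 2^65 + 8, so unchecked_payload_bytes returns 8 while checked_payload_bytes returns std::nullopt. A loader that sizes its buffer from the unchecked result but reads according to the header fields ends up in the undersized-buffer situation described above.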