BinaryMemoTable.Exists causes heap allocation per call due to closure escape #736
Description
Describe the enhancement requested
Found this during research around apache/iceberg-go#818 (comment)
The `is_in` kernel for binary types allocates once per input element because the `[]byte` value escapes to the heap through a closure chain. This makes `is_in` on binary arrays ~2x slower than a manual `map[string]struct{}` lookup.
The call chain in the `is_in` hot path:

1. `visitBinary` slices `rawBytes[offsets[pos]:offsets[pos+1]]` and passes it to the `valid(slice)` callback
2. The callback in `isInKernelExec` calls `state.Lookup.Exists(v)`
3. `BinaryMemoTable.Exists` calls `b.lookup(hash, val)`
4. `lookup` calls `b.tbl.Lookup(h, func(i int32) bool { return bytes.Equal(val, ...) })`
The closure at step 4 captures `val []byte`. Since it is passed to `HashTable.Lookup` (which takes `cmp func(T) bool`), Go's escape analysis conservatively moves `val` to the heap. The escape propagates up the chain: the sub-slice created at step 1 escapes as well.
Result: 1 heap allocation per input element, regardless of whether the value exists in the set.
Evidence
Profiling `is_in` on a 1M-element Binary array with a 10-element value set (flat `alloc_objects`):

```
255203025  94.92%  BinaryMemoTable.Exists
 10726303   3.99%  BinaryMemoTable.InsertOrGet
```

~1M allocs from `Exists` alone per run, matching the input size exactly.
Validated Fix
Using the `noescape` `unsafe.Pointer` trick (same pattern as the Go runtime's `runtime/stubs.go`):

```go
// noescape hides p from escape analysis: the uintptr round-trip
// breaks the compiler's pointer tracking without changing the value.
func noescape(p unsafe.Pointer) unsafe.Pointer {
	x := uintptr(p)
	return unsafe.Pointer(x ^ 0 ^ 0)
}

// noescapeBytes returns a slice aliasing val whose data pointer is
// hidden from escape analysis, so capturing it in a closure no longer
// forces a heap allocation.
func noescapeBytes(val []byte) []byte {
	p := noescape(unsafe.Pointer(unsafe.SliceData(val)))
	return unsafe.Slice((*byte)(p), len(val))
}
```

Combined with an `isInBinaryDirect` that iterates the binary array directly (bypassing the `visitBinary`/`VisitBitBlocksShort` closure chain) and calls an `ExistsDirect` that inlines the hash probe loop:
- Before: 1,000,231 allocs/op at 1M rows
- After: 225 allocs/op at 1M rows (a ~4,445x reduction)
- Speed improved ~1.5x (51ms -> 35ms for 1M binary keys against a 10-element value set)
Suggested Fix Options
1. `noescape` trick (validated, minimal change) — add `ExistsDirect` to `BinaryMemoTable` + `isInBinaryDirect` to the kernel
2. Restructure `HashTable.Lookup` to avoid the `cmp func(T) bool` closure — e.g., add `LookupEqual(hash uint64, val T, eq func(a, b T) bool)` where the comparator is a package-level function (won't capture locals)
3. Change the `BinaryMemoTable.Exists` signature to accept `(buf []byte, offset, length int)` to avoid sub-slicing
Reproduction
Any `is_in` call with Binary-type input will show ~1 alloc per element in benchmarks.
Component(s)
Benchmarking