
BinaryMemoTable.Exists causes heap allocation per call due to closure escape #736

@laskoviymishka

Description


Describe the enhancement requested

Found this during research around apache/iceberg-go#818 (comment)
The is_in kernel for binary types allocates once per input element because the []byte value escapes to the heap through a closure chain. This makes is_in on binary arrays ~2x slower than a manual map[string]struct{} lookup.

The call chain in the is_in hot path:

  1. visitBinary slices rawBytes[offsets[pos]:offsets[pos+1]] and passes it to valid(slice) callback
  2. The callback in isInKernelExec calls state.Lookup.Exists(v)
  3. BinaryMemoTable.Exists calls b.lookup(hash, val)
  4. lookup calls b.tbl.Lookup(h, func(i int32) bool { return bytes.Equal(val, ...) })

The closure at step 4 captures val []byte. Since it's passed to HashTable.Lookup (which takes cmp func(T) bool), Go's escape analysis conservatively moves val to the heap. This propagates up — the slice created at step 1 also escapes.

Result: 1 heap allocation per input element, regardless of whether the value exists in the set.

Evidence

Profiling is_in on a 1M-element Binary array with a 10-element value set:

255203025 94.92%  BinaryMemoTable.Exists    (flat alloc_objects)
 10726303  3.99%  BinaryMemoTable.InsertOrGet

~1M allocs from Exists alone, matching the input size exactly.

Validated Fix

Using the noescape unsafe.Pointer trick (same pattern as Go runtime's runtime/stubs.go):

// noescape hides p from the compiler's escape analysis; the xor with 0 is
// an identity operation that breaks the tracked dependency on p.
func noescape(p unsafe.Pointer) unsafe.Pointer {
    x := uintptr(p)
    return unsafe.Pointer(x ^ 0)
}

// noescapeBytes rebuilds val's slice header around the hidden pointer, so a
// closure capturing the result no longer forces val onto the heap.
// Requires Go 1.20+ for unsafe.SliceData.
func noescapeBytes(val []byte) []byte {
    p := noescape(unsafe.Pointer(unsafe.SliceData(val)))
    return unsafe.Slice((*byte)(p), len(val))
}

Combined with isInBinaryDirect that iterates the binary array directly (bypassing visitBinary/VisitBitBlocksShort closure chain) and calls ExistsDirect which inlines the hash probe loop:

Before: 1,000,231 allocs/op at 1M rows
After: 225 allocs/op at 1M rows (4,445x reduction)

Speed improved ~1.5x (51ms -> 35ms for 1M binary keys against a 10-element value set).

Suggested Fix Options

  1. noescape trick (validated, minimal change) — add ExistsDirect to BinaryMemoTable + isInBinaryDirect to the kernel
  2. Restructure HashTable.Lookup to avoid the cmp func(T) bool closure — e.g., add LookupEqual(hash uint64, val T, eq func(a, b T) bool) where the comparator is a package-level function (won't capture locals)
  3. Change BinaryMemoTable.Exists signature to accept (buf []byte, offset, length int) to avoid sub-slicing

Reproduction

Any is_in call with Binary type input will show ~1 alloc per element in benchmarks.

Component(s)

Benchmarking
