Compile time dispatch for Goldilocks SIMD optimizations#96

Closed
z-tech wants to merge 52 commits into main from z-tech/simd_goldilocks_experimental

@z-tech z-tech commented Apr 2, 2026

AVX-512 with IFMA

Multilinear Sumcheck

Goldilocks

| Input size | SIMD (AVX-512) | Generic | Speedup |
|---|---|---|---|
| 2^16 | 106 µs | 624 µs | 5.9x |
| 2^18 | 460 µs | 1.82 ms | 4.0x |
| 2^20 | 1.65 ms | 6.83 ms | 4.1x |
| 2^24 | 39.7 ms | 139.8 ms | 3.5x |

Goldilocks ext 2

| Input size | SIMD (AVX-512) | Generic | Speedup |
|---|---|---|---|
| 2^16 | 456 µs | 1.23 ms | 2.7x |
| 2^18 | 2.89 ms | 4.27 ms | 1.48x |
| 2^20 | 11.5 ms | 15.9 ms | 1.38x |
| 2^24 | 234 ms | 343 ms | 1.47x |

Goldilocks ext 3

| Input size | SIMD (AVX-512) | Generic | Speedup |
|---|---|---|---|
| 2^16 | 828 µs | 1.97 ms | 2.4x |
| 2^18 | 4.48 ms | 7.24 ms | 1.6x |
| 2^20 | 19.8 ms | 28.3 ms | 1.43x |
| 2^24 | 369 ms | 616 ms | 1.67x |

Inner product sumcheck

Goldilocks

| Input size | SIMD (AVX-512) | Generic | Speedup |
|---|---|---|---|
| 2^16 | 300 µs | 1.32 ms | 4.4x |
| 2^18 | 1.26 ms | 5.98 ms | 4.8x |
| 2^20 | 4.84 ms | 23.8 ms | 4.9x |
| 2^24 | 104 ms | 484 ms | 4.7x |

Goldilocks ext 2

| Input size | SIMD (AVX-512) | Generic | Speedup |
|---|---|---|---|
| 2^16 | 2.35 ms | 4.29 ms | 1.82x |
| 2^18 | 9.99 ms | 17.6 ms | 1.76x |
| 2^20 | 44.3 ms | 72.7 ms | 1.64x |
| 2^24 | 724 ms | 1297 ms | 1.79x |

Goldilocks ext 3

| Input size | SIMD (AVX-512) | Generic | Speedup |
|---|---|---|---|
| 2^16 | 4.24 ms | 8.65 ms | 2.04x |
| 2^18 | 18.7 ms | 36.1 ms | 1.92x |
| 2^20 | 84.2 ms | 146 ms | 1.73x |
| 2^24 | 1.33 s | 2.48 s | 1.87x |
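
For readers unfamiliar with the approach named in the PR title, here is a minimal sketch of compile-time dispatch between a vector path and a generic path. It is not the code in this branch: `mul_mod`, `mul_slice`, and `backend_name` are hypothetical names, and the vector branch is a placeholder that walks 8 lanes per step but still calls the scalar multiply instead of real AVX-512 IFMA intrinsics. It only shows how `cfg(target_feature = ...)` selects one of two bodies at build time, so no runtime check is paid.

```rust
/// Goldilocks prime: p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Scalar modular multiply through a 128-bit intermediate.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Generic path: compiled when the AVX-512 IFMA target feature is off.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx512ifma")))]
fn mul_slice(a: &[u64], b: &[u64], out: &mut [u64]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = mul_mod(x, y);
    }
}

/// Vector path: compiled only when the build enables the feature, e.g.
/// RUSTFLAGS="-C target-feature=+avx512ifma".
/// Placeholder body (assumption): processes 8 lanes per step the way a
/// 512-bit kernel would, but uses the scalar multiply per lane.
#[cfg(all(target_arch = "x86_64", target_feature = "avx512ifma"))]
fn mul_slice(a: &[u64], b: &[u64], out: &mut [u64]) {
    for ((o, x), y) in out.chunks_mut(8).zip(a.chunks(8)).zip(b.chunks(8)) {
        for ((oi, &xi), &yi) in o.iter_mut().zip(x).zip(y) {
            *oi = mul_mod(xi, yi);
        }
    }
}

/// Reports which body the compiler kept.
fn backend_name() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx512ifma")) {
        "avx512ifma"
    } else {
        "generic"
    }
}
```

Both bodies expose the same signature, so callers are unaffected by which one is compiled in. The trade-off is that a binary built with the feature enabled will not run on CPUs that lack it.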

@z-tech z-tech force-pushed the z-tech/simd_goldilocks_experimental branch 2 times, most recently from 276530a to 2cf1b9b April 7, 2026 13:53
@z-tech z-tech changed the title Z tech/simd goldilocks experimental Vectorized Path Autodispatch for Goldilocks Apr 8, 2026
@z-tech z-tech force-pushed the z-tech/simd_goldilocks_experimental branch from a312538 to ec8dcd6 April 9, 2026 06:50
@z-tech z-tech mentioned this pull request Apr 9, 2026
@z-tech z-tech changed the title Vectorized Path Autodispatch for Goldilocks Compile time dispatch for Goldilocks SIMD optimizations Apr 10, 2026
EC2 Default User and others added 28 commits April 10, 2026 15:15
Adds rayon-backed parallel SoA reduce kernels for ext2/ext3 sumcheck and
inner-product sumcheck, dispatched above a 2^17 pair threshold. Includes
parity tests for the parallel path and extended benchmarks at 2^22.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
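
A rough sketch of the threshold dispatch described in the commit above, under stated assumptions. The commit uses rayon; this version substitutes `std::thread::scope` so it has no external dependencies. The kernel shape (per-half sums over an ext2 table stored as two coordinate columns, paired as `(i, i + n/2)`), the worker count, and all function names are hypothetical. Only the 2^17 pair threshold comes from the commit message.

```rust
use std::thread;

/// Goldilocks prime: p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;
/// Pair-count threshold quoted in the commit message.
const PAR_THRESHOLD: usize = 1 << 17;

fn add_mod(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Sums one coordinate column on the calling thread.
fn sum_serial(xs: &[u64]) -> u64 {
    xs.iter().fold(0, |acc, &x| add_mod(acc, x))
}

/// Sums one coordinate column on up to `workers` scoped threads
/// (stand-in for the rayon reduce used in the PR).
fn sum_parallel(xs: &[u64], workers: usize) -> u64 {
    let chunk = ((xs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = xs
            .chunks(chunk)
            .map(|c| s.spawn(move || sum_serial(c)))
            .collect();
        handles
            .into_iter()
            .fold(0, |acc, h| add_mod(acc, h.join().unwrap()))
    })
}

/// Round sums (g(0), g(1)) for an ext2 table in SoA layout: `c0` and `c1`
/// hold the two coordinates in separate columns.
fn round_sums(c0: &[u64], c1: &[u64]) -> [(u64, u64); 2] {
    assert_eq!(c0.len(), c1.len());
    let half = c0.len() / 2; // number of pairs
    let sum = |xs: &[u64]| {
        if half >= PAR_THRESHOLD {
            sum_parallel(xs, 4)
        } else {
            sum_serial(xs)
        }
    };
    [
        (sum(&c0[..half]), sum(&c1[..half])),
        (sum(&c0[half..]), sum(&c1[half..])),
    ]
}
```

Keeping each coordinate in its own column lets every worker stream through contiguous `u64` values. Below the threshold, thread start-up cost would likely outweigh the work, which is presumably why small inputs stay serial.
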
- Adds fused_fold_and_compute_polynomial: single-pass kernel that folds
  a/b and computes (c0, c2) together in one sweep. Parity-tested vs the
  faithful port for pow2 and non-pow2 inputs.
- Exposes whir_sumcheck_fused / _with_hook / _partial_with_hook siblings.
- Microbench now compares effsc SIMD vs whir-faithful vs whir-fused at
  2^20..=2^24. Fused is ~1.3-2x over the faithful port and ~2.5-4x
  over effsc's existing SIMD inner_product_sumcheck on Goldilocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
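
To illustrate the single-pass idea in the commit above, here is a sketch of a fused fold-and-compute step checked against a two-pass reference. It is not the PR's `fused_fold_and_compute_polynomial`. The names, the `(i, i + n/2)` pairing convention, and the restriction to lengths divisible by 4 are assumptions made for brevity. The sketch takes `(c0, c2)` to be the constant and quadratic coefficients of the round polynomial `sum_i a_i(X) * b_i(X)`, where `a_i(X) = a_lo + X * (a_hi - a_lo)`.

```rust
/// Goldilocks prime: p = 2^64 - 2^32 + 1. Inputs are assumed reduced mod P.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn sub(a: u64, b: u64) -> u64 { ((a as u128 + P as u128 - b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Folds one pair at challenge r: lo + r * (hi - lo).
fn fold(lo: u64, hi: u64, r: u64) -> u64 { add(lo, mul(r, sub(hi, lo))) }

/// Reference pass 1: fold a whole vector, pairing i with i + n/2.
fn fold_vec(v: &[u64], r: u64) -> Vec<u64> {
    let h = v.len() / 2;
    (0..h).map(|i| fold(v[i], v[i + h], r)).collect()
}

/// Reference pass 2: c0 = sum lo*lo, c2 = sum (hi - lo)*(hi - lo) across a and b.
fn coeffs(a: &[u64], b: &[u64]) -> (u64, u64) {
    let h = a.len() / 2;
    let (mut c0, mut c2) = (0u64, 0u64);
    for i in 0..h {
        c0 = add(c0, mul(a[i], b[i]));
        c2 = add(c2, mul(sub(a[i + h], a[i]), sub(b[i + h], b[i])));
    }
    (c0, c2)
}

/// Fused version: one sweep folds a and b at r and accumulates the next
/// round's (c0, c2) from the freshly folded values.
fn fused_sketch(a: &[u64], b: &[u64], r: u64) -> (Vec<u64>, Vec<u64>, (u64, u64)) {
    let n = a.len();
    assert!(n == b.len() && n >= 4 && n % 4 == 0);
    let (h, q) = (n / 2, n / 4);
    let (mut fa, mut fb) = (vec![0u64; h], vec![0u64; h]);
    let (mut c0, mut c2) = (0u64, 0u64);
    for j in 0..q {
        // Folded entries j and j + q form one pair in the next round.
        let (a_lo, a_hi) = (fold(a[j], a[j + h], r), fold(a[j + q], a[j + q + h], r));
        let (b_lo, b_hi) = (fold(b[j], b[j + h], r), fold(b[j + q], b[j + q + h], r));
        fa[j] = a_lo; fa[j + q] = a_hi;
        fb[j] = b_lo; fb[j + q] = b_hi;
        c0 = add(c0, mul(a_lo, b_lo));
        c2 = add(c2, mul(sub(a_hi, a_lo), sub(b_hi, b_lo)));
    }
    (fa, fb, (c0, c2))
}
```

The fused loop reads each input element once and uses the folded values while they are still in registers, whereas the two-pass reference writes the folded vectors and then reads them back.
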
Adds a second section to whir_port_micro comparing the three variants
on F64Ext3 (24-byte elements). Two takeaways from the first run:
- Fused keeps a ~20-60% lead over the faithful port in ext3 — the
  single-pass kernel still saves memory traffic even when element-size
  inflates the compute share.
- effsc SIMD craters in ext3 (5-7x slower than fused at 2^18-2^22);
  the SIMD path is Goldilocks-specific and falls back to scalar under
  cubic-ext arithmetic.

Caveat: canonical WHIR cross-field (BF=F64 evals, EF=F64Ext3 chals)
is not benched here — the port is monomorphic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fused's lead over the faithful port shrinks in large ext3 (0.88-0.93x
at 2^23-2^24) — element size inflates the compute share, so memory
fusion's bandwidth win matters less. For ext3 the dominant cost is
cubic-extension multiplication, not traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
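
To make the "24-byte elements" and "cubic-extension multiplication" remarks concrete, here is a schoolbook sketch of a multiply in a cubic extension of Goldilocks. The reduction rule `x^3 = W` with `W = 2` is a hypothetical placeholder, not the modulus used by `F64Ext3`, and the code does not check that `x^3 - W` is irreducible. It is meant only to show the cost shape: nine base-field products plus reductions per extension multiply, against two 24-byte loads.

```rust
/// Goldilocks prime: p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;
/// Hypothetical reduction constant: x^3 = W. Not taken from the PR.
const W: u64 = 2;

/// Three base-field coordinates: 3 * 8 = 24 bytes per element.
type Ext3 = [u64; 3];

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Schoolbook product of (a0 + a1 x + a2 x^2) and (b0 + b1 x + b2 x^2),
/// reduced with x^3 = W (so x^4 = W x).
fn ext3_mul(a: Ext3, b: Ext3) -> Ext3 {
    let [a0, a1, a2] = a;
    let [b0, b1, b2] = b;
    // Nine base-field products, then two multiplications by W.
    let c0 = add(mul(a0, b0), mul(W, add(mul(a1, b2), mul(a2, b1))));
    let c1 = add(add(mul(a0, b1), mul(a1, b0)), mul(W, mul(a2, b2)));
    let c2 = add(add(mul(a0, b2), mul(a1, b1)), mul(a2, b0));
    [c0, c1, c2]
}
```

With this much arithmetic per element, saving one pass over memory buys proportionally less than it does in the base field, which is consistent with the shrinking fused-vs-faithful gap reported above.
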
Drop the effsc SIMD column from the comparison — the signal we care
about now is fused-vs-faithful, not the effsc baseline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

z-tech commented Apr 19, 2026

This is addressed in #99.

@z-tech z-tech closed this Apr 19, 2026