Compile-time dispatch for Goldilocks SIMD optimizations #96
Closed
z-tech wants to merge 52 commits into
Adds rayon-backed parallel SoA reduce kernels for ext2/ext3 sumcheck and inner-product sumcheck, dispatched above a 2^17 pair threshold. Includes parity tests for the parallel path and extended benchmarks at 2^22.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
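As a rough illustration of the threshold dispatch described in this commit, here is a minimal sketch. It is not the PR's code: it swaps rayon for `std::thread::scope` so it stays dependency-free, reduces plain Goldilocks base-field values rather than ext2/ext3 elements, and the names `reduce`, `reduce_seq`, `reduce_par`, and `PAR_THRESHOLD` are hypothetical. Only the 2^17 pair threshold is taken from the commit message.

```rust
use std::thread;

/// Goldilocks prime, p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;
/// Pair count at or above which the parallel path is used (2^17, per the commit message).
const PAR_THRESHOLD: usize = 1 << 17;

/// Field addition for inputs already reduced mod P.
fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Sequential SoA reduce: returns (sum of lo[i], sum of hi[i]) mod P.
fn reduce_seq(lo: &[u64], hi: &[u64]) -> (u64, u64) {
    lo.iter()
        .zip(hi)
        .fold((0u64, 0u64), |(s0, s1), (&l, &h)| (add(s0, l), add(s1, h)))
}

/// Parallel reduce: split both arrays into matching chunks, reduce each chunk
/// on its own scoped thread, then combine the partial sums.
fn reduce_par(lo: &[u64], hi: &[u64], workers: usize) -> (u64, u64) {
    let workers = workers.max(1);
    let chunk = ((lo.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Collect first so every worker is spawned before any join blocks.
        let handles: Vec<_> = lo
            .chunks(chunk)
            .zip(hi.chunks(chunk))
            .map(|(l, h)| s.spawn(move || reduce_seq(l, h)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .fold((0u64, 0u64), |(s0, s1), (p0, p1)| (add(s0, p0), add(s1, p1)))
    })
}

/// Dispatch: small inputs stay sequential, large inputs go parallel.
fn reduce(lo: &[u64], hi: &[u64]) -> (u64, u64) {
    assert_eq!(lo.len(), hi.len());
    if lo.len() >= PAR_THRESHOLD {
        let workers = thread::available_parallelism().map_or(1, |n| n.get());
        reduce_par(lo, hi, workers)
    } else {
        reduce_seq(lo, hi)
    }
}
```

Because field addition is associative and commutative, the two paths must agree exactly, which is what a parity test for the parallel path would check.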
- Adds fused_fold_and_compute_polynomial: a single-pass kernel that folds a/b and computes (c0, c2) together in one sweep. Parity-tested vs the faithful port for pow2 and non-pow2 inputs.
- Exposes whir_sumcheck_fused / _with_hook / _partial_with_hook siblings.
- Microbench now compares effsc SIMD vs whir-faithful vs whir-fused at 2^20..=2^24. Fused is ~1.3-2x over the faithful port and ~2.5-4x over effsc's existing SIMD inner_product_sumcheck on Goldilocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
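To make the "fold and compute in one sweep" idea concrete, here is a minimal sketch over the Goldilocks base field. It is an illustration under assumptions, not the PR's kernel: the pairing layout (index `i` paired with `i + len/2`), the helper names, and the restriction to lengths divisible by 4 are all hypothetical choices. It assumes the round polynomial is written as `c0 + c1*X + c2*X^2`, with `c0 = sum a_lo*b_lo` and `c2 = sum (a_hi - a_lo)*(b_hi - b_lo)`, and `c1` recoverable from the claimed sum.

```rust
/// Goldilocks prime, p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

// Field helpers for inputs already reduced mod P.
fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn sub(a: u64, b: u64) -> u64 { ((a as u128 + P as u128 - b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Fold one (lo, hi) pair at challenge r: lo + r * (hi - lo).
fn fold_one(lo: u64, hi: u64, r: u64) -> u64 {
    add(lo, mul(r, sub(hi, lo)))
}

/// Two-pass reference, step 1: fold a vector in half.
fn fold(v: &[u64], r: u64) -> Vec<u64> {
    let half = v.len() / 2;
    (0..half).map(|i| fold_one(v[i], v[i + half], r)).collect()
}

/// Two-pass reference, step 2: compute (c0, c2) of the round polynomial.
fn coeffs(a: &[u64], b: &[u64]) -> (u64, u64) {
    let half = a.len() / 2;
    let (mut c0, mut c2) = (0u64, 0u64);
    for i in 0..half {
        c0 = add(c0, mul(a[i], b[i]));
        c2 = add(c2, mul(sub(a[i + half], a[i]), sub(b[i + half], b[i])));
    }
    (c0, c2)
}

/// Fused sketch: fold a and b at r and accumulate the next round's (c0, c2)
/// in the same loop, so each input element is read once.
fn fused_fold_and_coeffs_sketch(a: &[u64], b: &[u64], r: u64) -> (Vec<u64>, Vec<u64>, (u64, u64)) {
    assert_eq!(a.len(), b.len());
    assert!(a.len() >= 4 && a.len() % 4 == 0, "sketch handles lengths divisible by 4 only");
    let half = a.len() / 2;
    let quarter = half / 2;
    let mut fa = vec![0u64; half];
    let mut fb = vec![0u64; half];
    let (mut c0, mut c2) = (0u64, 0u64);
    for j in 0..quarter {
        // Folded values that form the next round's (lo, hi) pair at index j.
        let a_lo = fold_one(a[j], a[j + half], r);
        let a_hi = fold_one(a[j + quarter], a[j + quarter + half], r);
        let b_lo = fold_one(b[j], b[j + half], r);
        let b_hi = fold_one(b[j + quarter], b[j + quarter + half], r);
        fa[j] = a_lo;
        fa[j + quarter] = a_hi;
        fb[j] = b_lo;
        fb[j + quarter] = b_hi;
        c0 = add(c0, mul(a_lo, b_lo));
        c2 = add(c2, mul(sub(a_hi, a_lo), sub(b_hi, b_lo)));
    }
    (fa, fb, (c0, c2))
}
```

A parity check in the spirit of the commit compares the fused output against `fold` followed by `coeffs`; the two must match exactly, since the fused loop only reorders the same field operations.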
Adds a second section to whir_port_micro comparing the three variants on F64Ext3 (24-byte elements). Two takeaways from the first run:
- Fused keeps a ~20-60% lead over the faithful port in ext3: the single-pass kernel still saves memory traffic even when element size inflates the compute share.
- effsc SIMD craters in ext3 (5-7x slower than fused at 2^18-2^22); the SIMD path is Goldilocks-specific and falls back to scalar under cubic-ext arithmetic.

Caveat: canonical WHIR cross-field (BF=F64 evals, EF=F64Ext3 chals) is not benched here because the port is monomorphic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fused's lead over the faithful port shrinks in large ext3 (0.88-0.93x at 2^23-2^24): element size inflates the compute share, so memory fusion's bandwidth win matters less. For ext3 the dominant cost is cubic-extension multiplication, not traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop the effsc SIMD column from the comparison; the signal we care about now is fused-vs-faithful, not the effsc baseline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This is addressed in #99.
AVX-512 with IFMA
Multilinear sumcheck: Goldilocks, Goldilocks ext 2, Goldilocks ext 3
Inner product sumcheck: Goldilocks, Goldilocks ext 2, Goldilocks ext 3