Closed
52 commits
618a3c5
simd goldilocks
z-tech Mar 26, 2026
092f2bd
benches
z-tech Mar 26, 2026
dce7ebf
auto dispatch
z-tech Mar 26, 2026
960907d
tweaks
z-tech Mar 26, 2026
799f23b
opts
z-tech Mar 27, 2026
5608be1
cleanup
z-tech Apr 6, 2026
23b59ce
cleanup
z-tech Apr 6, 2026
01fa7f5
cleanup
z-tech Apr 6, 2026
a86421d
loop unrolling, strategy selection based on input size
z-tech Apr 6, 2026
2cf1b9b
fmt and clippy
z-tech Apr 6, 2026
61c61a7
avx
Apr 7, 2026
7df2722
clippy
Apr 7, 2026
432e786
chkpt
Apr 7, 2026
54d0f24
2 coefficient round messages for inner product
z-tech Apr 8, 2026
791bc29
fmt
z-tech Apr 8, 2026
913b26d
inner product dispatch
z-tech Apr 8, 2026
ec8dcd6
refactor coefficient sumcheck
z-tech Apr 9, 2026
3f307b4
opt when bf == ef
z-tech Apr 9, 2026
9469ef6
coeff opts
z-tech Apr 9, 2026
bd1a176
opt protogalaxy fold
z-tech Apr 9, 2026
a113ad1
poly ops
z-tech Apr 9, 2026
ae6cf33
fix bug
z-tech Apr 9, 2026
bcdd3b5
clippy and changelog
z-tech Apr 9, 2026
e8ca8b3
chkpt
z-tech Apr 10, 2026
880122c
chkpt
Apr 10, 2026
6b30473
chkpt w/ deep research
z-tech Apr 11, 2026
c555c78
inner product ext field support + benchmarks + deep research optimiza…
z-tech Apr 11, 2026
fda0fd7
checkpoint
z-tech Apr 12, 2026
7233504
parallel rayon SoA reduce + IP ext3 dispatch + bench extensions
Apr 12, 2026
e9bf170
dispatch avx
z-tech Apr 12, 2026
3e614f9
chkpt
z-tech Apr 12, 2026
3ebcef5
hook
z-tech Apr 13, 2026
21b2729
more support
z-tech Apr 13, 2026
1bdf532
chkpt
z-tech Apr 13, 2026
139589a
more integration
z-tech Apr 13, 2026
432744c
chkpt
z-tech Apr 13, 2026
9d44943
chkpt
z-tech Apr 14, 2026
b525d92
chkpt
z-tech Apr 14, 2026
711f22b
chkpt
z-tech Apr 14, 2026
e1478a2
chkpt: fused fold+compute in whir port; 3-way microbench
Apr 16, 2026
fb7446e
chkpt: microbench covers Goldilocks³ (F64Ext3) alongside F64
Apr 16, 2026
426e7de
chkpt: extend ext3 bench to 2^24
Apr 16, 2026
4577d91
chkpt: slim microbench to whir-port vs whir-fused only
Apr 16, 2026
a5f1aea
chng to msb
Apr 16, 2026
4e1298d
clippy
z-tech Apr 16, 2026
c1e73c3
clippy
z-tech Apr 16, 2026
d85aa24
cleanup
z-tech Apr 17, 2026
04343dd
gitignore
z-tech Apr 17, 2026
5fde362
msb fold
z-tech Apr 17, 2026
4df09fc
fix ci
z-tech Apr 17, 2026
e4acb38
cleanup
z-tech Apr 17, 2026
82e9911
update patch
z-tech Apr 17, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@
**/lag-poly-benches/target/
.vscode
.DS_Store
.claude/
12 changes: 10 additions & 2 deletions CHANGELOG.md
@@ -5,8 +5,16 @@ All notable changes to this project will be documented in this file.
## [Unreleased]

### Added
- **Base/Extension field support**: `multilinear_sumcheck` and `inner_product_sumcheck` now take two type parameters `<BF, EF>` — base field for evaluations, extension field for challenges. Set `EF = BF` when no extension is needed.
- `pairwise::cross_field_reduce` — parallel helper for folding `BF` evaluations with an `EF` challenge.
- **SIMD auto-dispatch** for Goldilocks (NEON + AVX-512 IFMA) across all three sumcheck variants.
- **`poly_ops` module** — zero-allocation polynomial arithmetic on coefficient slices.
- **`RoundPolyEvaluator` trait** for `coefficient_sumcheck` — user implements per-pair math, library handles iteration, parallelism, and reductions.
- **Base/Extension field support** (`<BF, EF>`) for `multilinear_sumcheck` and `inner_product_sumcheck`.

### Changed
- **Inner product sumcheck**: 2 prover messages per round instead of 3 (verifier derives the third).
- **Coefficient sumcheck**: sends d coefficients per round instead of d+1.
- **`protogalaxy::fold`**: rewritten with flat buffers (93× faster at scale).
- **`coefficient_sumcheck`** takes `&impl RoundPolyEvaluator<F>` instead of a closure.

## [0.0.2] - 2026-02-11

13 changes: 12 additions & 1 deletion Cargo.toml
@@ -15,7 +15,7 @@ ark-std ="0.5.0"
memmap2 = "0.9.5"
nohash-hasher = "0.2.0"
rayon = { version = "1.10", optional = true }
spongefish = { git = "https://github.com/arkworks-rs/spongefish", branch = "main", features = ["ark-ff"] }
spongefish = { git = "https://github.com/z-tech/spongefish.git", branch = "smallfp-support", features = ["ark-ff"] }

[dev-dependencies]
criterion = "0.8"
@@ -33,3 +33,14 @@ parallel = [
name = "provers"
path = "benches/provers.rs"
harness = false

[[bench]]
name = "simd_vs_generic"
path = "benches/simd_vs_generic.rs"
harness = false

[patch.crates-io]
ark-ff = { git = "https://github.com/arkworks-rs/algebra.git", branch = "master" }
ark-poly = { git = "https://github.com/arkworks-rs/algebra.git", branch = "master" }
ark-serialize = { git = "https://github.com/arkworks-rs/algebra.git", branch = "master" }
spongefish = { git = "https://github.com/z-tech/spongefish.git", branch = "smallfp-support" }
87 changes: 65 additions & 22 deletions README.md
@@ -55,31 +55,45 @@ let sumcheck_transcript: ProductSumcheck<EF> = inner_product_sumcheck::<BF, EF>(
claim = \sum_{x \in \{0,1\}^n} p(x), \quad \deg_{x_i}(p) \leq d
```

Unlike the multilinear and inner product variants where `p` is multilinear (degree 1 in each variable, yielding degree-1 round polynomials), `coefficient_sumcheck` handles polynomials with arbitrary per-variable degree `d`, producing degree-`d` round polynomials. The user supplies a closure `compute_round_poly` that computes each round polynomial; the library handles transcript interaction and table reductions (both pairwise and tablewise) automatically.
Unlike the multilinear and inner product variants where `p` is multilinear (degree 1 in each variable, yielding degree-1 round polynomials), `coefficient_sumcheck` handles polynomials with arbitrary per-variable degree `d`, producing degree-`d` round polynomials. The user implements `RoundPolyEvaluator` to define how a single pair of even/odd rows contributes to the round polynomial; the library handles iteration, parallelism, transcript interaction, and table reductions automatically.

```rust
use efficient_sumcheck::coefficient_sumcheck::{coefficient_sumcheck, CoefficientSumcheck};
use efficient_sumcheck::coefficient_sumcheck::{
coefficient_sumcheck, CoefficientSumcheck, RoundPolyEvaluator,
};
use efficient_sumcheck::transcript::SanityTranscript;
use ark_poly::univariate::DensePolynomial;

struct MyEvaluator;
impl RoundPolyEvaluator<F> for MyEvaluator {
fn degree(&self) -> usize { 1 }

fn accumulate_pair(
&self,
coeffs: &mut [F], // pre-zeroed buffer of length degree + 1
tw: &[(&[F], &[F])], // (even_row, odd_row) per tablewise table
pw: &[(F, F)], // (even, odd) per pairwise table
) {
let (even, odd) = pw[0];
coeffs[0] += even; // add to constant coefficient
coeffs[1] += odd - even; // add to linear coefficient
}
}

let mut tablewise: Vec<Vec<Vec<F>>> = /* multi-column tables */;
let mut pairwise: Vec<Vec<F>> = /* flat evaluation vectors */;
let mut transcript = SanityTranscript::new(&mut rng);

let result: CoefficientSumcheck<F> = coefficient_sumcheck(
|tablewise, pairwise| {
// Compute h(X) as a DensePolynomial<F> from current table state.
// Return coefficients in ascending order: [c0, c1, ..., cd].
DensePolynomial::from_coefficients_vec(vec![/* ... */])
},
&MyEvaluator,
&mut tablewise,
&mut pairwise,
n_rounds,
&mut transcript,
);
```

The closure receives immutable references to the current tables; after each round the library automatically reduces all pairwise and tablewise entries by folding with the verifier challenge.
The evaluator receives one pair of rows at a time; the library iterates over all pairs (in parallel when the `parallel` feature is enabled), sums the per-pair polynomials, and reduces all pairwise and tablewise entries by folding with the verifier challenge after each round.
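
The per-round flow described above (accumulate a contribution per pair, sum them into the round polynomial, then fold with the challenge) can be sketched on its own. This is a minimal illustration, not the crate's implementation: it assumes a toy prime field over `u64` in place of an arkworks field, a single pairwise table, adjacent even/odd pairing, and already-reduced inputs. The names `round_poly` and `fold` are hypothetical.

```rust
/// Toy prime modulus (the Mersenne prime 2^61 - 1), standing in for a real field type.
/// All inputs are assumed to be already reduced, i.e. smaller than P.
const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Hypothetical helper: sums the per-pair linear polynomials
/// even + (odd - even) * X into the round polynomial [c0, c1].
fn round_poly(evals: &[u64]) -> [u64; 2] {
    let mut coeffs = [0u64; 2];
    for pair in evals.chunks_exact(2) {
        let (even, odd) = (pair[0], pair[1]);
        coeffs[0] = add(coeffs[0], even); // constant coefficient
        coeffs[1] = add(coeffs[1], sub(odd, even)); // linear coefficient
    }
    coeffs
}

/// Hypothetical helper: folds each pair with the challenge r,
/// halving the table: even + r * (odd - even).
fn fold(evals: &[u64], r: u64) -> Vec<u64> {
    evals
        .chunks_exact(2)
        .map(|p| add(p[0], mul(r, sub(p[1], p[0]))))
        .collect()
}
```

With this pairing, `h(0) + h(1)` equals the sum of the current table, and the sum of the folded table equals `h(r)`, which is the invariant the verifier checks from round to round.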

## Examples

Expand All @@ -103,37 +117,66 @@ Here, `batched_constraint_poly` merges dense evaluation vectors (out-of-domain s

### 2) WARP - Twin Constraint Batching

[WARP](https://github.com/compsec-epfl/warp) also uses `coefficient_sumcheck` with `folding::protogalaxy::fold` to batch a codeword check and an R1CS constraint check into a single sumcheck. The codewords, witness vectors, and folding coefficients are stored as tablewise tables and the equality polynomial evaluations as a pairwise vector:
[WARP](https://github.com/compsec-epfl/warp) also uses `coefficient_sumcheck` with `folding::protogalaxy::fold` to batch a codeword check and an R1CS constraint check into a single sumcheck. The user implements `RoundPolyEvaluator` to define the per-pair math; the library handles iteration, parallelism, and reductions:

```rust
use efficient_sumcheck::coefficient_sumcheck::coefficient_sumcheck;
use efficient_sumcheck::coefficient_sumcheck::{coefficient_sumcheck, RoundPolyEvaluator};
use efficient_sumcheck::folding::protogalaxy;

struct TwinConstraintEvaluator { r1cs: ..., omega: F, degree: usize }

impl RoundPolyEvaluator<F> for TwinConstraintEvaluator {
fn degree(&self) -> usize { self.degree }
fn accumulate_pair(&self, coeffs: &mut [F], tw: &[(&[F], &[F])], pw: &[(F, F)]) {
let f = protogalaxy::fold(/* alpha pairs */, /* codeword polys */);
let p = protogalaxy::fold(/* beta pairs */, /* constraint polys */);
let t = [pw[0].0, pw[0].1 - pw[0].0]; // linear tau polynomial
// h(X) = (f(X) + ω·p(X)) · t(X) — accumulated directly into coeffs
// ... using poly_ops::add_scaled and poly_ops::mul_add_into
}
}

let mut tablewise = [codewords, z_vecs, alpha_vecs, beta_vecs];
let mut pairwise = [tau_eq_evals];

let sc = coefficient_sumcheck(
|tw, pw| {
let (u, z, a, b) = (&tw[0], &tw[1], &tw[2], &tw[3]);
let tau = &pw[0];

let f = protogalaxy::fold(/* ... */, /* codeword polys */);
let p = protogalaxy::fold(/* ... */, /* constraint polys */);
let t = linear_poly(tau[0], tau[1]);

// h(X) = (f(X) + ω·p(X)) · t(X)
(f + p * omega).naive_mul(&t)
},
&TwinConstraintEvaluator { r1cs, omega, degree },
&mut tablewise,
&mut pairwise,
log_l,
&mut prover_state,
);
let gamma = sc.verifier_messages;
```

After each round `coefficient_sumcheck` reduces all four tablewise tables and the pairwise equality evaluations by folding with the verifier's challenge.

## SIMD Acceleration

All three sumcheck variants auto-dispatch to SIMD-accelerated backends for Goldilocks (p = 2^64 − 2^32 + 1):

- **aarch64 (NEON)**: 2-wide vectorized add/sub, scalar multiply fallback
- **x86_64 (AVX-512 IFMA)**: 8-wide vectorized add/sub/mul via 52-bit fused multiply-accumulate

The dispatch is transparent — no code changes needed. LLVM constant-folds the field detection at compile time, so the non-SIMD path has zero overhead.
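
For reference, the scalar arithmetic that such backends vectorize can be sketched as follows. This is an illustrative reference over plain `u64` values, not the crate's SIMD code: the function names are hypothetical, inputs are assumed to be canonical (less than p), and the lane-wise helper only mimics the shape of a vector add without using any intrinsics.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;
/// 2^64 mod p = 2^32 - 1, used to correct a wrapped addition.
const EPSILON: u64 = 0xFFFF_FFFF;

/// Modular addition for canonical inputs (a, b < P).
fn gl_add(a: u64, b: u64) -> u64 {
    let (s, carry) = a.overflowing_add(b);
    if carry {
        // The true sum is s + 2^64; subtracting p leaves s + EPSILON, which cannot overflow here.
        s + EPSILON
    } else if s >= P {
        s - P
    } else {
        s
    }
}

/// Modular subtraction for canonical inputs (a, b < P).
fn gl_sub(a: u64, b: u64) -> u64 {
    if a >= b {
        a - b
    } else {
        a + (P - b)
    }
}

/// Modular multiplication via a 128-bit product (simple, not optimized).
fn gl_mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Lane-wise addition over N lanes: the scalar analogue of a vector add.
fn gl_add_lanes<const N: usize>(a: [u64; N], b: [u64; N]) -> [u64; N] {
    std::array::from_fn(|i| gl_add(a[i], b[i]))
}
```

A vectorized backend performs the same carry-and-correct step on several lanes at once; the 128-bit remainder in `gl_mul` is the part a fast implementation would replace with a specialized reduction.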

## Zero-Allocation Polynomial Arithmetic (`poly_ops`)

The `poly_ops` module provides slice-based polynomial arithmetic with no heap allocation:

```rust
use efficient_sumcheck::poly_ops;

let a = [F::from(1u64), F::from(2u64)]; // 1 + 2x
let b = [F::from(3u64), F::from(4u64)]; // 3 + 4x
let mut out = [F::ZERO; 3];

poly_ops::mul_into(&mut out, &a, &b); // out = a * b
poly_ops::add_scaled(&mut out, s, &c); // out += s * c
let val = poly_ops::eval_at(&out, challenge); // Horner evaluation
```

These are designed for hot loops where `DensePolynomial` allocation overhead dominates — protogalaxy folding, R1CS constraint evaluation, etc. The `protogalaxy::fold` function uses them internally, achieving up to 93× speedup over the naive `DensePolynomial` approach.
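
To make the slice-based style concrete, here is a minimal standalone sketch of the three operations named above. It is not the crate's `poly_ops` code: the signatures are assumptions that mirror the usage shown in the example, and a toy prime field over `u64` with already-reduced inputs stands in for a generic field type.

```rust
/// Toy prime modulus (2^61 - 1), standing in for a real field type.
/// All inputs are assumed to be already reduced, i.e. smaller than P.
const P: u64 = (1u64 << 61) - 1;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// out = a * b (schoolbook product), written into a caller-provided buffer.
fn mul_into(out: &mut [u64], a: &[u64], b: &[u64]) {
    assert!(out.len() + 1 >= a.len() + b.len(), "output buffer too short");
    out.fill(0);
    for (i, &ai) in a.iter().enumerate() {
        for (j, &bj) in b.iter().enumerate() {
            out[i + j] = add(out[i + j], mul(ai, bj));
        }
    }
}

/// out += s * c, coefficient by coefficient (extra coefficients of c are ignored).
fn add_scaled(out: &mut [u64], s: u64, c: &[u64]) {
    for (o, &ci) in out.iter_mut().zip(c) {
        *o = add(*o, mul(s, ci));
    }
}

/// Horner evaluation of the polynomial with ascending coefficients at x.
fn eval_at(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}
```

Because every function writes into or reads from caller-owned slices, a hot loop can reuse one stack buffer across iterations instead of allocating a new polynomial each time.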

## Advanced Usage

Supporting the high-level interfaces are raw implementations of sumcheck [[LFKN92](#references)] using three proving algorithms: