Releases · ashvardanian/less_slow.cpp

20 Jan 09:58

v0.3.0

431d5f8

v0.3: Gather 🔄 Scatter

This release introduces benchmarks for the rarely used SIMD gather and scatter instructions, which can accelerate lookups by ~30% on current x86 and Arm machines. Three variants are covered:

Serial
AVX-512 for x86
SVE for Arm
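
For readers unfamiliar with the two operations, here is a minimal sketch of what gather and scatter do. It is not the repository's benchmark code: the function names, the `index_t` alias, and the use of `std::vector` are assumptions made for illustration. The AVX-512 path is compiled only when the compiler defines `__AVX512F__` and relies on the `_mm512_i32gather_ps` intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

using index_t = std::uint32_t; // hypothetical alias, not taken from the repository

// Gather: result[i] = table[indices[i]]
void gather_serial(std::vector<float> const &table, std::vector<index_t> const &indices,
                   std::vector<float> &result) {
    assert(result.size() == indices.size());
    for (std::size_t i = 0; i != indices.size(); ++i) {
        assert(indices[i] < table.size());
        result[i] = table[indices[i]];
    }
}

// Scatter: table[indices[i]] = values[i]
// With duplicate indices, the last write wins in this serial version.
void scatter_serial(std::vector<float> &table, std::vector<index_t> const &indices,
                    std::vector<float> const &values) {
    assert(values.size() == indices.size());
    for (std::size_t i = 0; i != indices.size(); ++i) {
        assert(indices[i] < table.size());
        table[indices[i]] = values[i];
    }
}

#if defined(__AVX512F__)
// Same gather, 16 floats per step. The intrinsic treats indices as signed
// 32-bit integers, so this sketch assumes every index is below 2^31.
void gather_avx512(std::vector<float> const &table, std::vector<index_t> const &indices,
                   std::vector<float> &result) {
    assert(result.size() == indices.size());
    std::size_t i = 0;
    for (; i + 16 <= indices.size(); i += 16) {
        __m512i idx = _mm512_loadu_si512(indices.data() + i);
        __m512 vals = _mm512_i32gather_ps(idx, table.data(), 4); // scale = sizeof(float)
        _mm512_storeu_ps(result.data() + i, vals);
    }
    for (; i != indices.size(); ++i) result[i] = table[indices[i]]; // serial tail
}
#endif
```

Whether the vectorized form wins depends on the CPU and on how the indices are distributed, which is why it is worth benchmarking rather than assuming.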

Minor

Add: SVE gather/scatter (107b359)
Add: Serial & AVX-512 scatter/gather (089cfa0)

Patch

Improve: Timing SVE (daa55f5)
Improve: Stabilize gather timings (3fca991)

20 Jan 09:57

ashvardanian

v0.2.0

2d97782

v0.2: Pushing FLOPS in Assembly 🏋️‍♂️

Release: v0.2.0 [skip ci]

Minor

Add: Latency Hiding & Port Interleaving (086f8d7)
Add: AMX kernels (0cb024d)
Add: Inline Assembly kernels (89095a6)
Add: BLAS & Eigen TOPs benchmarks (28ca39b)
Add: AVX2 & low-precision AVX-512 TOPS (0a48108)
Add: i8, f16, and bf16 kernels (3f54200)
Add: Arm NEON FMAs (d0e521e)
Add: vfmadd231ps kernels (7ca3161)
Add: Assembly micro-kernels (2e71e76)

Patch

Docs: Zen4 matmul-benchmarks (2476310)
Docs: H100 Tensor Cores vs Intel (fa86663)
Fix: Illegal instruction for AMX (a7243dd)
Fix: Duplicate .global symbols (c732234)
Docs: Recommended Eigen macros (7be2d58)
Fix: Missing tops_u8_neon (d97bbfc)
Fix: Missing tops_f64_neon (4afa7e3)
Improve: Shorter TOPS names (be0c94b)

17 Jan 20:08

ashvardanian

v0.1.1

a4f7cda

Release v0.1.1

Release: v0.1.1 [skip ci]

Patch

Docs: Renaming small_string benchmark (1ad50d1)
Improve: Validate sorting result (77808a6)
Improve: Log bytes/sec for trigonometry (ebc67d6)
Docs: Placeholders (84e7710)
Improve: BENCHMARK_CAPTURE w/out template (d8fb261)
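
The "Validate sorting result" entry above can be illustrated with a short sketch. This is an assumption about what such a check might look like, not the code from commit 77808a6: the helper name `sort_and_validate` and the exception type are hypothetical. The idea is that a benchmark should confirm the sorter produced an ordered range, so that a broken or optimized-away sort does not report a misleadingly fast time.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical helper: runs `sorter` over the values, then verifies the order.
// `sorter` is any callable that accepts a pair of iterators.
template <typename sorter_t>
void sort_and_validate(std::vector<int> &values, sorter_t &&sorter) {
    sorter(values.begin(), values.end());
    if (!std::is_sorted(values.begin(), values.end()))
        throw std::runtime_error("sorting produced an unordered result");
}
```

A call such as `sort_and_validate(v, [](auto first, auto last) { std::sort(first, last); });` returns normally, while a sorter that leaves the range unordered raises the exception. In a real benchmark the check would usually sit outside the timed region.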
