perf: coarse-grained parallel encoding with fused SIMD (closes #31) #36

Open
MavenRain wants to merge 1 commit into itzmeanjan:main from MavenRain:perf/coarse-grained-parallel-encode
@MavenRain
Replace the fine-grained rayon par_iter approach (one task per piece, one allocation per piece, and a two-pass multiply followed by add) with coarse-grained chunking across threads. Each thread accumulates its range of pieces using the fused multiply-and-add SIMD operation, in a single memory pass and with no per-piece allocation. This reduces allocation from O(piece_count) to O(num_threads) and halves memory bandwidth per piece.
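The chunking scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: it uses std::thread::scope instead of rayon so that it has no dependencies, a scalar loop stands in for the fused SIMD kernel, and the names gf256_mul, mul_add_into and encode_coarse, as well as the GF(2^8) reduction polynomial, are hypothetical.

```rust
use std::thread;

/// Multiply two GF(2^8) elements bit by bit.
/// Assumption: reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            acc ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1D;
        }
        b >>= 1;
    }
    acc
}

/// Scalar stand-in for the fused kernel: acc[i] ^= coeff * piece[i].
/// One pass over `piece`, no temporary buffer for the product.
fn mul_add_into(acc: &mut [u8], piece: &[u8], coeff: u8) {
    for (a, &p) in acc.iter_mut().zip(piece) {
        *a ^= gf256_mul(coeff, p);
    }
}

/// Coarse-grained encode: each thread owns one accumulator and walks a
/// contiguous range of pieces, so allocations scale with the thread count.
fn encode_coarse(pieces: &[Vec<u8>], coeffs: &[u8], num_threads: usize) -> Vec<u8> {
    assert_eq!(pieces.len(), coeffs.len());
    if pieces.is_empty() {
        return Vec::new();
    }
    let piece_len = pieces[0].len();
    let threads = num_threads.clamp(1, pieces.len());
    let chunk = (pieces.len() + threads - 1) / threads;

    let partials: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = pieces
            .chunks(chunk)
            .zip(coeffs.chunks(chunk))
            .map(|(ps, cs)| {
                s.spawn(move || {
                    // The only allocation made by this thread.
                    let mut acc = vec![0u8; piece_len];
                    for (p, &c) in ps.iter().zip(cs) {
                        mul_add_into(&mut acc, p, c);
                    }
                    acc
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Addition in GF(2^8) is XOR, so the per-thread partials combine in any order.
    let mut out = vec![0u8; piece_len];
    for part in &partials {
        for (o, &x) in out.iter_mut().zip(part) {
            *o ^= x;
        }
    }
    out
}
```

Calling encode_coarse(&pieces, &coeffs, n) should give the same coded piece for any n >= 1, since XOR is associative and commutative. Only the number of accumulators allocated changes with n.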
