
perf: transpose recoder coefficient matrix for cache locality #35

Open
MavenRain wants to merge 1 commit into itzmeanjan:main from MavenRain:perf/transpose-recoder-matrix

@MavenRain

(Closes #15)

Store the coding coefficient matrix in column-major order so that each column is contiguous in memory. The inner dot-product loop in `recode_with_buf` now scans each column sequentially via `zip` instead of striding across rows. Consecutive coefficients therefore share cache lines, which improves L1/L2 cache utilization.
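A minimal sketch of the layout change, under stated assumptions: the matrix is taken to be `n x k` (one row per received piece, one column per source piece), the recoded coding vector is `new[j] = XOR over i of r[i] * M[i][j]`, and GF(2^8) multiplication uses the AES polynomial 0x11B. The names `gf256_mul`, `transpose`, `recode_row_major` and `recode_col_major` are hypothetical illustrations, not the crate's API, and the crate's actual field polynomial may differ.

```rust
/// Multiply two GF(2^8) elements, reducing modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
/// Assumption: AES polynomial; the real crate may use a different one.
fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            acc ^= a; // field addition is XOR
        }
        let overflow = a & 0x80 != 0;
        a <<= 1; // multiply a by x
        if overflow {
            a ^= 0x1B; // reduce: x^8 = x^4 + x^3 + x + 1
        }
        b >>= 1;
    }
    acc
}

/// Convert an n x k row-major matrix (row i = coding vector of received piece i)
/// into column-major order, so column j occupies col_major[j*n .. (j+1)*n].
fn transpose(row_major: &[u8], n: usize, k: usize) -> Vec<u8> {
    assert_eq!(row_major.len(), n * k);
    let mut col_major = vec![0u8; n * k];
    for i in 0..n {
        for j in 0..k {
            col_major[j * n + i] = row_major[i * k + j];
        }
    }
    col_major
}

/// Row-major layout: for each output coefficient j, the inner loop over i
/// jumps k bytes per step (one element from each row).
fn recode_row_major(row_major: &[u8], n: usize, k: usize, r: &[u8]) -> Vec<u8> {
    (0..k)
        .map(|j| (0..n).fold(0u8, |acc, i| acc ^ gf256_mul(r[i], row_major[i * k + j])))
        .collect()
}

/// Column-major layout: each column is one contiguous slice, so the inner
/// dot product walks memory sequentially via zip.
fn recode_col_major(col_major: &[u8], n: usize, r: &[u8]) -> Vec<u8> {
    assert!(n > 0 && r.len() == n);
    col_major
        .chunks_exact(n)
        .map(|col| col.iter().zip(r).fold(0u8, |acc, (&c, &ri)| acc ^ gf256_mul(ri, c)))
        .collect()
}
```

Both recode functions yield the same coding vector; only the memory access pattern of the inner loop changes. In this sketch the transpose is paid once when the matrix is built, so every later recode benefits from the sequential scan.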



Successfully merging this pull request may close this issue: Attempt to optimize RLNC Recoder Matrix Multiplication
