fix: dot/schedule_bench A_stride_m for transpose_a kernels by kasper0406 · Pull Request #10004 · google/XNNPACK

kasper0406 · 2026-04-18T21:08:40Z

Summary

schedule_bench was passing a_stride_m (the {i, tile_k} intra-row stride) as the 6th positional argument to kernels that set dot_flag::transpose_a. For those kernels, the 6th argument is instead consumed as the stride along the k1 dimension of the packed tensor (the advance per inner-k step).
subgraph/dot.cc::call_kernel already does the correct swap (transposed_a ? a_k_strides[0] : a_stride_m). This change mirrors it in schedule_bench so that the bench exercises the same work the production path executes.
The built-in correctness check (A = B = 1, assert c == k everywhere) can't catch this class of bug because the wrong-stride loads still return 1s. Benchmark GFLOPS numbers reported against this bench were inflated by artificial cache hits on overlapping reads.
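
To illustrate why an all-ones check is blind to this kind of stride bug, here is a minimal standalone sketch. It does not use the real kernel signature: toy_transposed_dot, all_equal, and the simplified layout (k rows of m contiguous elements) are hypothetical names and assumptions made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy kernel (NOT the YNNPACK kernel signature).
// A is assumed to be stored transposed: k rows of m contiguous elements.
// a_stride_k is the advance, in elements, per inner-k step.
// Computes c[i] = sum over kk of a[kk * a_stride_k + i] * b[kk].
std::vector<float> toy_transposed_dot(std::size_t m, std::size_t k,
                                      const std::vector<float>& a,
                                      std::size_t a_stride_k,
                                      const std::vector<float>& b) {
  std::vector<float> c(m, 0.0f);
  for (std::size_t kk = 0; kk < k; ++kk) {
    for (std::size_t i = 0; i < m; ++i) {
      const std::size_t idx = kk * a_stride_k + i;
      assert(idx < a.size());  // a too-small stride stays in bounds
      c[i] += a[idx] * b[kk];
    }
  }
  return c;
}

// Returns true when every element of c equals value.
bool all_equal(const std::vector<float>& c, float value) {
  for (float x : c) {
    if (x != value) return false;
  }
  return true;
}
```

With m = 4 and k = 3, the correct stride in this toy layout is 4. Passing a stride of 1 makes successive k steps read overlapping elements, yet with A = B = 1 every output is still 3, so the check passes. Filling A with distinct values (for example a[j] = j) makes the two strides produce different outputs, which is one way such a mismatch could be exposed.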

Test plan

bazel test //ynnpack/kernels/dot/... passes (including schedule_bench_test, schedule_test, consistent_arithmetic_test, test)

google-cla · 2026-04-18T21:08:57Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Correctly supply the A stride for transpose_a kernels

dsharlet

Thanks for the fix!

kasper0406 mentioned this pull request Apr 19, 2026

sme dot: size kc for L2 stripe reuse, not L1 per-call stripe #10005

Open

kasper0406 marked this pull request as ready for review April 19, 2026 13:16

kasper0406 marked this pull request as draft April 19, 2026 13:16

dot/schedule_bench: fix A_stride_m for transpose_a kernels

d49a1b5

Correctly supply the A stride for transpose_a kernels

kasper0406 force-pushed the kn/dot-bench-fix branch from ad4dea5 to d49a1b5 Compare April 19, 2026 13:21

kasper0406 marked this pull request as ready for review April 19, 2026 13:31

dsharlet approved these changes Apr 19, 2026

kasper0406 wants to merge 1 commit into google:master from kasper0406:kn/dot-bench-fix
