
Conversation

@LJ-underdog (Contributor) commented Jan 16, 2026

Proposed changes

This PR adds a new tile size configuration for async pipeline operations in the FMHA (Fused Multi-Head Attention) forward pass. It introduces a 64x128 tile for the 128x128 head-dimension case and adjusts the sequence tuning logic to accommodate the new tile size.
Changes:

  • Modified the sequence tuning logic to treat tile size 64 as a special case alongside the maximum tile size
  • Added filtering logic to exclude 64-size tiles for non-async pipelines with 128x128 head dimensions
  • Introduced a new 64x128x32 tile size configuration, gated by a compute-unit constraint, for 128x128 head dimensions (see the illustrative sketch below)
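
To make the three changes above concrete, here is a minimal, hypothetical sketch (Python, not the project's actual codegen or dispatch code) of how a tile list, the 128x128 filtering rule, and the tile-64 special case in the sequence tuning could fit together. All names (`TileCfg`, `get_fwd_tile_cfgs`, `tune_seqlen_tiles`) are invented for illustration, and the pre-existing tile entries are placeholders; only the 64x128x32 entry corresponds to the configuration this PR adds.

```python
# Illustrative sketch only; names and the concrete tile list are hypothetical
# and do not correspond to the actual files changed by this PR.
from dataclasses import dataclass

@dataclass(frozen=True)
class TileCfg:
    m0: int          # rows of Q per workgroup
    n0: int          # columns of K per iteration
    k0: int          # unroll depth along the head dimension
    hdim_qk: int     # head dimension of Q/K
    hdim_v: int      # head dimension of V
    is_async: bool   # async (software-pipelined) variant
    cu_bound: bool   # only used when the launch grid fits on the CUs

def get_fwd_tile_cfgs(hdim_qk: int, hdim_v: int) -> list[TileCfg]:
    cfgs = [
        TileCfg(128, 64, 32, 128, 128, is_async=True,  cu_bound=False),  # placeholder
        TileCfg(64,  64, 32, 128, 128, is_async=False, cu_bound=False),  # placeholder
        # New in this PR: a 64x128x32 tile for 128x128 head dims, limited to
        # the async pipeline and gated by a compute-unit constraint.
        TileCfg(64, 128, 32, 128, 128, is_async=True,  cu_bound=True),
    ]
    if hdim_qk == 128 and hdim_v == 128:
        # Filtering described above: drop 64-row tiles for non-async
        # pipelines in the 128x128 head-dimension case.
        cfgs = [c for c in cfgs if c.is_async or c.m0 != 64]
    return [c for c in cfgs if (c.hdim_qk, c.hdim_v) == (hdim_qk, hdim_v)]

def tune_seqlen_tiles(cfgs: list[TileCfg]) -> set[int]:
    """Sequence tuning: keep the maximum M tile and, as a special case, 64,
    so the new 64x128 async tile remains a dispatch candidate."""
    sizes = {c.m0 for c in cfgs}
    keep = {max(sizes)}
    if 64 in sizes:
        keep.add(64)
    return keep
```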

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.

@LJ-underdog (Contributor, Author) commented Jan 16, 2026

Performance improves by ~30% on gfx950 when block_num <= cu_num:

Before (128x64 tile):
./bin/tile_example_fmha_fwd -h=8 -d=128 -s=512 -kname=1 -v=1
[fp16|batch|bhsd] b:2, h:8/8, s:512/512, d:128/128, scale_s:0.0883883, bias:n, p_drop:0, lse:0, qscale:n, mask:n, v:r, fmha_fwd_d128_fp16_batch_b128x64x32x128x16x128_r4x1x1_r4x1x1_w32x32x16_w32x32x16_qr_async_trload_vr_npad_nlogits_nbias_nmask_nlse_ndropout_nskip_nqscale_trload_nsink, 0.020 ms, 109.11 TFlops, 426.21 GB/s, valid:y

After (new 64x128 tile):
./bin/tile_example_fmha_fwd -h=8 -d=128 -s=512 -kname=1 -v=1
[fp16|batch|bhsd] b:2, h:8/8, s:512/512, d:128/128, scale_s:0.0883883, bias:n, p_drop:0, lse:0, qscale:n, mask:n, v:r, fmha_fwd_d128_fp16_batch_b64x128x32x128x32x128_r4x1x1_r4x1x1_w16x16x32_w16x16x16_qr_async_vr_psddv_nlogits_nbias_nmask_nlse_ndropout_nskip_nqscale_ntrload_nsink, 0.015 ms, 143.74 TFlops, 561.49 GB/s, valid:y
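
For context on the block_num <= cu_num condition: with a small batch/head/sequence shape like the one benchmarked above, the 128-row tile produces too few workgroups to fill the GPU, and halving the M tile doubles the grid. The sketch below works that arithmetic through under stated assumptions; the grid formula (one workgroup per batch, head, and M-tile of seqlen_q) and the gfx950 CU count are assumptions, not values taken from the PR.

```python
# Back-of-the-envelope view of the "block_num <= cu_num" condition.
# Assumptions (not from the PR): one workgroup per (batch, head, M-tile of
# seqlen_q), and 256 CUs for gfx950.
import math

def block_num(batch: int, nhead: int, seqlen_q: int, tile_m: int) -> int:
    return batch * nhead * math.ceil(seqlen_q / tile_m)

cu_num = 256            # assumed CU count for gfx950
b, h, s = 2, 8, 512     # the benchmark shape used above

for tile_m in (128, 64):
    blocks = block_num(b, h, s, tile_m)
    print(f"tile_m={tile_m:3d}: {blocks:4d} workgroups vs {cu_num} CUs")
# tile_m=128 launches only 64 workgroups, so most CUs sit idle; the 64-row
# tile doubles the grid to 128 workgroups, which is the regime the new
# cu_num-gated configuration targets.
```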

@LJ-underdog requested a review from Copilot on January 16, 2026 03:34
Copilot AI left a comment

