[TRITON] Add attention sink support to Triton MHA kernels #1576
base: main
Conversation
Force-pushed from f677369 to 76bfe95.

Rebased on top of the updated base branch.

@brunomazzottiamd I added some comments, specifically about the threshold for the tests.

Thank you so much for your time reviewing this PR. I'll answer you properly.
Force-pushed from b082964 to 6eccfc1.
@cagrikymk, please let me know if I have answered your questions properly. Feel free to suggest other changes or improvements. Thanks to your review I could decrease the test thresholds. FYI: the agreed deadline for merging this PR is December 15th.

@brunomazzottiamd LGTM!
Force-pushed from 6eccfc1 to e898512.

Rebased on top of the updated base branch.
Motivation
In the gpt-oss attention implementation, each attention head has a learned bias in the denominator of the softmax. This is similar to an attention sink, so we can enable gpt-oss by adding attention sink support to our AITER MHA kernels (both the forward and backward kernels). The target model is gpt-oss-20b.
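For intuition, here is a minimal PyTorch reference of that formulation (an illustrative sketch with hypothetical names, not the Triton kernel): the per-head learned `sink` logit joins the softmax as an extra column but has no matching value row, so it only adds to the denominator.

```python
import torch

def sdpa_with_sink(q, k, v, sink):
    # q, k, v: [batch, heads, seq, head_dim]; sink: [heads] learned logits.
    # Masking/causality is omitted to keep the sketch short.
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    # Append the per-head sink logit as a "virtual" key column.
    sink_col = sink.view(1, -1, 1, 1).expand(q.shape[0], -1, q.shape[2], 1)
    probs = torch.softmax(torch.cat([scores, sink_col], dim=-1), dim=-1)
    # Drop the sink column: its probability mass attends to nothing, which is
    # equivalent to adding exp(sink) to the softmax denominator.
    return torch.einsum("bhqk,bhkd->bhqd", probs[..., :-1], v)
```

Setting `sink` to `-inf` recovers standard softmax attention, since `exp(-inf) = 0`.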
Technical Details
- … `fp8` data types.
- The proposed changes were tested with `bf16` and `fp32` data types, but they should also work with the `fp16` data type.
- … `-inf`.
- … `fp32` to enable atomic ops.
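Assuming a flash-attention-style online softmax (the usual structure of these MHA kernels), the sink mainly touches the final normalization step. A rough single-head sketch in plain PyTorch (illustrative names, not the actual kernel code):

```python
import torch

def fa_forward_with_sink(q, k, v, sink, block=128):
    # Single head for brevity: q [seq_q, d], k/v [seq_k, d], sink: scalar tensor.
    scale = q.shape[-1] ** -0.5
    m = torch.full((q.shape[0],), float("-inf"))   # running row max
    l = torch.zeros(q.shape[0])                    # running softmax denominator
    acc = torch.zeros_like(q)                      # running numerator (P @ V)
    for s in range(0, k.shape[0], block):
        scores = (q @ k[s:s + block].T) * scale
        m_new = torch.maximum(m, scores.max(dim=-1).values)
        alpha = torch.exp(m - m_new)               # rescale previous partials
        p = torch.exp(scores - m_new[:, None])
        l = l * alpha + p.sum(dim=-1)
        acc = acc * alpha[:, None] + p @ v[s:s + block]
        m = m_new
    # The sink adds exp(sink - m) to the denominator but contributes no values.
    l = l + torch.exp(sink - m)
    return acc / l[:, None]
```

The backward pass additionally needs a gradient with respect to `sink`; reducing that gradient across thread blocks is likely where the `fp32` atomic accumulation mentioned above comes in.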
Test Plan

New tests were added to `op_tests/triton_tests/test_mha.py`: `test_mha_with_sink` and `test_mha_varlen_with_pe`. They cover 192 new cases and test both forward and backward passes.
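For reference, one way to run just these tests locally (assuming a standard pytest setup; the `-k` expression is only an illustration):

```python
# Run the new MHA sink tests; equivalent to invoking pytest from the shell.
import pytest

pytest.main(["op_tests/triton_tests/test_mha.py", "-k", "sink or varlen_with_pe", "-v"])
```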
Test Result

- All tests in `op_tests/triton_tests/test_mha.py` are passing on `gfx942` and `gfx950`.
- The pre-existing tests in `op_tests/triton_tests/test_mha.py` produce the same results as before the sink was added, on both `gfx942` and `gfx950`. So we can conclude that the newly added sink feature didn't break anything that was already working.

Performance Assessment
Target attention shapes:
- Data type: `bf16`.
- … for `thd` layout and 1 for `bshd` layout.

Forward performance on `gfx950`:

Backward performance on `gfx950`:

Conclusion: the attention sink feature doesn't change performance on `gfx950`. I did the same analysis on `gfx942` and reached the same conclusion; I'm not publishing those numbers in the PR for the sake of brevity.

Submission Checklist