Conversation

@johannes-graner (Contributor) commented Jan 9, 2026

Proposed changes

This PR extends GPU reference implementation support for convolution operations with elementwise fusion and output operations. The changes enable GPU-accelerated reference implementations for tests involving scale, bias, batchnorm, clamp, and bilinear operations across forward, backward data, and backward weight convolutions.

Key improvements:

  • Extended naive_conv_fwd_gpu, naive_conv_bwd_data_gpu, and naive_conv_bwd_weight_gpu to support elementwise operations, clamp, scale, and bilinear fusion (a sketch of the idea follows this list)
  • Significantly improved test execution times across 11 test suites
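For context, here is a minimal host-side sketch of the fusion pattern, not the PR's actual device kernels; names such as BilinearOp and naive_conv1d_fwd are illustrative assumptions. The naive accumulation stays the same, and a fused output functor is applied to the accumulator before the store.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical output op: out = alpha * conv_result + beta * residual.
struct BilinearOp
{
    float alpha;
    float beta;
    void operator()(float& y, float acc, float residual) const
    {
        y = alpha * acc + beta * residual;
    }
};

// Naive 1D convolution with a fused output op applied after accumulation.
template <typename OutOp>
void naive_conv1d_fwd(const std::vector<float>& in,
                      const std::vector<float>& wei,
                      const std::vector<float>& residual,
                      std::vector<float>& out,
                      OutOp out_op)
{
    for(std::size_t wo = 0; wo < out.size(); ++wo)
    {
        float acc = 0.0f;
        for(std::size_t x = 0; x < wei.size(); ++x)
            acc += in[wo + x] * wei[x];
        out_op(out[wo], acc, residual[wo]); // fused elementwise output op
    }
}
```

The actual device kernels handle the grouped N-D cases, but the shape of the change is the same: compute the accumulator as before, then hand it to the output op together with any extra source tensors.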

Performance impact:

| Test Name | Before (s) | After (s) | Speedup |
| --- | ---: | ---: | ---: |
| test_convnd_fwd | 99.0 | 5.4 | 18.3x |
| test_convnd_bwd_data | 41.0 | 13.0 | 3.2x |
| test_grouped_conv_bwd_data_scale | 36.0 | 29.0 | 1.2x |
| test_grouped_convnd_fwd_clamp | 352.0 | 211.0 | 1.7x |
| test_grouped_convnd_fwd_scale | 108.0 | 36.0 | 3.0x |
| test_grouped_convnd_fwd_bias_clamp | 291.0 | 235.0 | 1.2x |
| test_grouped_convnd_fwd_gk_bias_clamp | 290.0 | 228.0 | 1.3x |
| test_grouped_convnd_fwd_bilinear | 140.0 | 41.0 | 3.4x |
| test_grouped_convnd_fwd_scaleadd_ab | 171.0 | 22.0 | 7.8x |
| test_grouped_conv_bwd_data_bilinear | 4.9 | 3.3 | 1.5x |
| test_grouped_convnd_bwd_weight_bilinear | 6.7 | 2.5 | 2.7x |

These improvements reduce total execution time for these tests from 1540 seconds to 826 seconds, saving approximately 12 minutes.

Checklist

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers to understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

The implementation focuses on extending the GPU reference path to match the functionality available in the CPU reference path.

Further improvement is possible by also using the GPU for verification and tensor initialization. Since this PR is already large, those improvements are deferred.

The batchnorm profiler and tests are not changed since the tests are flaky. In order to keep this PR focused, those changes are also deferred.

const OutDataType* out_gn = p_out + g * out_stride_g + n * out_stride_n;
const WeiDataType* wei_g = p_wei + g * wei_stride_g;
float acc = 0.0f;
const OutDataType* out_gn0 = p_outs[0] + g * out_stride_g + n * out_stride_n;
Contributor

Can we change these names? out_gn0, wei_g0, wei_gkc0, etc. are not clear to me.

Contributor Author

Good point, I'll change them.

const OutDataType* out_extra1 =
    p_outs[2] + g * out_stride_g + n * out_stride_n +
    ho * out_stride_h;
out_op(out_val,
Contributor

Maybe we can create a common function that calls the op with the proper number of parameters.

Contributor Author

That would definitely make it nicer to look at. Thanks for the suggestion!

    p_outs[2] + g * out_stride_g + n * out_stride_n +
    ho * out_stride_h;
out_op(out_val,
       out_gnkh0[k * out_stride_k + wo],
Contributor

Or use some kind of unpack.
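
A minimal sketch of what such a shared helper might look like, combining both suggestions; the name apply_out_op and its signature are assumptions, not code from this PR, and it simplifies by assuming all extra tensors are indexed the same way. The extra source tensors are passed as a parameter pack, and a single pack expansion "unpacks" one element per tensor, so each call site no longer spells out the right number of arguments by hand.

```cpp
#include <cstddef>

// Hypothetical helper: forwards however many extra source tensors are
// present to the output op. For plain Scale/Clamp the pack is empty;
// for Bilinear it contributes the residual value, and so on.
template <typename OutOp, typename OutT, typename AccT, typename... ExtraT>
void apply_out_op(const OutOp& out_op,
                  OutT& out_val,
                  AccT acc,
                  std::size_t out_idx,
                  const ExtraT*... p_extras)
{
    out_op(out_val, acc, p_extras[out_idx]...); // pack expansion does the "unpack"
}
```

With something along these lines, the bilinear path would call apply_out_op(out_op, out_val, acc, out_idx, out_extra1) while the scale/clamp path would call apply_out_op(out_op, out_val, acc, out_idx).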


for(index_t i = 0; i <= NumAElementwise; ++i)
{
    strided_copy_kernel<TOut, false>
Contributor

Why do we need this?

Contributor Author

In order to keep the naive implementation as simple as possible, the actual conv kernels only operate on packed data. To support all the various layouts, we first have to transform the non-packed tensors into packed tensors, run the kernel, and then transform the result back to the correct layout.

The loop performs this transformation for all the tensors used in the convolution (for bwd_data, these are the out and weight tensors), which is more than one for the bilinear convolutions.
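
For readers of the conversation, here is a rough host-side illustration of that pack / compute / unpack flow; the 2D-only code and the name strided_copy are assumptions, not the device helpers from this PR.

```cpp
#include <cstddef>
#include <vector>

// Copy between a strided (possibly non-packed) view and a packed buffer.
// Pack == true gathers into the packed buffer; Pack == false scatters the
// packed data back into the strided layout.
template <typename T, bool Pack>
void strided_copy(const std::vector<std::size_t>& lengths,
                  const std::vector<std::size_t>& strides,
                  T* p_strided,
                  T* p_packed)
{
    // 2D case for brevity; a real helper would be rank-generic.
    for(std::size_t i = 0; i < lengths[0]; ++i)
        for(std::size_t j = 0; j < lengths[1]; ++j)
        {
            const std::size_t strided_off = i * strides[0] + j * strides[1];
            const std::size_t packed_off  = i * lengths[1] + j;
            if constexpr(Pack)
                p_packed[packed_off] = p_strided[strided_off];
            else
                p_strided[strided_off] = p_packed[packed_off];
        }
}
```

The quoted loop applies such a copy once per participating tensor (hence the bound involving NumAElementwise); the naive kernel then works purely on packed buffers, and a final copy in the opposite direction restores the requested layout.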
