
Conversation

vpietila-amd (Contributor) commented Jan 14, 2026

Proposed changes

This PR has four main changes:

  1. Unified the XDL and WMMA warp-level descriptions under a single WarpGemmDescriptor concept and corresponding types.
  2. Refactored the convolution algorithm concepts for the conv dispatcher so that they have a well-defined hierarchy. The convolution algorithms are grouped into three categories: XDL, WMMA, and DL. The XDL and WMMA algorithms share a common base algorithm concept, ConvAlgorithm, from which the hierarchy of XDL and WMMA algorithms is derived.
  3. The convolution algorithm specializations are now bit flags, so that multiple specializations can be defined at the same time. This allows us to achieve the hierarchy of the conv algorithms.
  4. The input/output tile related thread clusters now have more descriptive names.

The unified warp GEMM description (item 1) lets us treat XDL and WMMA algorithms on an equal footing and reduces the amount of boilerplate code.

The hierarchical description of the conv algorithms (item 2) simplifies the convolution algorithm concepts and allows better compile-time error messages when no corresponding factory is found for a given algorithm description.

Together, items 1 and 2 make it easier to add new factories.
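
For illustration, the unified warp GEMM description and the derived concept hierarchy could look roughly like the sketch below. This is a minimal, hypothetical sketch: the member names (MPerWarp, NPerWarp, KPerWarp, DataType, IsXdl, IsWmma) and the *Sketch concept names are assumptions made for this example, not the identifiers used in the PR.

```cpp
#include <concepts>
#include <cstddef>

// Hypothetical sketch of a single concept that covers both XDL and WMMA
// warp-level GEMM descriptions; member names are illustrative only.
template <typename T>
concept WarpGemmDescriptorSketch = requires {
    typename T::DataType;
    { T::MPerWarp } -> std::convertible_to<std::size_t>;
    { T::NPerWarp } -> std::convertible_to<std::size_t>;
    { T::KPerWarp } -> std::convertible_to<std::size_t>;
};

// Hypothetical base concept for XDL/WMMA convolution algorithms, plus the
// derived concepts that refine it for each instruction family.
template <typename T>
concept ConvAlgorithmSketch = WarpGemmDescriptorSketch<typename T::WarpGemm>;

template <typename T>
concept XdlConvAlgorithmSketch = ConvAlgorithmSketch<T> && T::IsXdl;

template <typename T>
concept WmmaConvAlgorithmSketch = ConvAlgorithmSketch<T> && T::IsWmma;

// Example descriptor that models the WMMA branch of the hierarchy.
struct ExampleWmmaAlgorithmSketch
{
    struct WarpGemm
    {
        using DataType = float;
        static constexpr std::size_t MPerWarp = 16;
        static constexpr std::size_t NPerWarp = 16;
        static constexpr std::size_t KPerWarp = 16;
    };
    static constexpr bool IsXdl  = false;
    static constexpr bool IsWmma = true;
};
static_assert(WmmaConvAlgorithmSketch<ExampleWmmaAlgorithmSketch>);
```

The point of the unified descriptor is that both instruction families satisfy the same base concept, so factories can be constrained on the base and refined with the derived concepts.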

Ville Pietilä added 30 commits December 18, 2025 04:36
concept SpecifiesTileTransferParameters3D = TileTransferParameters<T, 3>;
// Base algorithm concepts
template <typename T, size_t ThreadClusterRank = 3>
concept ConvAlgorithm = ConvAlgorithmDescriptor<T> && SpecifiesThreadBlock<T> &&
Collaborator:

Why are we requiring that a convolution always has to specify all of this information? This is a very strong design decision. We had talked about options for heuristics and default values. This appears to be blocking off those options.

Collaborator:

I realize that's what the current code is, but I was really trying to move us away from concepts here. The if statements below were a bridge to more explicit logic for determining the implementation.
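
For readers of this thread, the alternative being alluded to here, explicit build-time branching instead of concept pattern matching, usually looks something like the following minimal, self-contained sketch; all type and member names are hypothetical and not taken from the PR.

```cpp
#include <type_traits>

// Hypothetical factory types used only for this illustration.
struct XdlFactorySketch {};
struct WmmaFactorySketch {};
struct DlFactorySketch {};

// Explicit compile-time dispatch on descriptor flags, as opposed to selecting
// a factory by matching the descriptor against a hierarchy of concepts.
template <typename AlgorithmDesc>
constexpr auto make_conv_factory_sketch()
{
    if constexpr (AlgorithmDesc::uses_xdl)
    {
        return XdlFactorySketch{};
    }
    else if constexpr (AlgorithmDesc::uses_wmma)
    {
        return WmmaFactorySketch{};
    }
    else
    {
        static_assert(AlgorithmDesc::uses_dl, "unsupported algorithm description");
        return DlFactorySketch{};
    }
}

// Example descriptor and usage.
struct ExampleDescSketch
{
    static constexpr bool uses_xdl  = false;
    static constexpr bool uses_wmma = true;
    static constexpr bool uses_dl   = false;
};

static_assert(std::is_same_v<decltype(make_conv_factory_sketch<ExampleDescSketch>()),
                             WmmaFactorySketch>);
```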

Collaborator:

It looks like we've gone all in on concepts. I think this is a mistake, but we can ride it out and see where we get to. I think this is putting a huge burden on the calling code and may make MIOpen integration much more difficult. I had hoped to simplify the requirements on the calling code, but this looks pretty rigid and will probably require more helper code.


XDL and WMMA algorithms share a common base, while DL algorithms have their own independent base.

## Common Base Hierarchy (XDL & WMMA)
Collaborator:

This is going to be very complicated to explain to new CK library users. Without default values they will have to know the details of the convolution algorithms and the hardware they are compiling for. Rather than allowing for optional expert-mode tuning, even beginning library users will need to know all the values, and if they get anything wrong they will get compiler errors. We're losing a lot of the benefits of an abstract builder layer with this rigid design.


## Overview

The convolution algorithms are organized into three main categories:
Collaborator:

XDL and WMMA are specific to hardware architecture, not really a user choice. How does that fit into this design?

What is the logic of the DL algorithms? Are those only for older CDNA2, or will they also work on newer architectures?

What about reference implementations? They don't fit in any of these three categories, but they are also convolutions created by the builder, right?

INTERWAVE
};

enum class ConvAlgorithmSpecialization
Collaborator:

Please document why these values of 0, 1, 2, 4, 8, 16 are chosen. I'm guessing it's some kind of a bitfield scheme. Are we planning on doing math on enum class values?
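
For context on the question above: powers of two (1, 2, 4, 8, 16, ...) are the usual choice for a bit-flag enum class, because combined values can then be built and queried with bitwise operators defined on the enum. A minimal sketch of that pattern, with made-up enumerator names rather than the PR's actual definitions:

```cpp
#include <cstdint>
#include <type_traits>

// Illustrative bit-flag enum; the enumerator names are hypothetical.
enum class ConvSpecializationSketch : std::uint32_t
{
    Default   = 0,
    Filter1x1 = 1 << 0,
    Pad0      = 1 << 1,
    Stride1   = 1 << 2,
    OddC      = 1 << 3,
};

// Bitwise OR so multiple specializations can be combined in one value.
constexpr ConvSpecializationSketch operator|(ConvSpecializationSketch a,
                                             ConvSpecializationSketch b)
{
    using U = std::underlying_type_t<ConvSpecializationSketch>;
    return static_cast<ConvSpecializationSketch>(static_cast<U>(a) | static_cast<U>(b));
}

// Test whether a combined value contains a given flag.
constexpr bool has_flag(ConvSpecializationSketch value, ConvSpecializationSketch flag)
{
    using U = std::underlying_type_t<ConvSpecializationSketch>;
    return (static_cast<U>(value) & static_cast<U>(flag)) != 0;
}

// Usage: combine flags and query them at compile time.
constexpr auto spec = ConvSpecializationSketch::Filter1x1 | ConvSpecializationSketch::Pad0;
static_assert(has_flag(spec, ConvSpecializationSketch::Pad0));
static_assert(!has_flag(spec, ConvSpecializationSketch::Stride1));
```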

return static_cast<T>(spec) == 0;
}

enum class MatrixInstructionType
Collaborator:

Shouldn't we start considering hardware architecture for some of this?

.weight = {.config = {.layout = GKXC}},
.output = {.config = {.layout = GNWK}}};

constexpr auto FwdConvAlgorithm =
Collaborator:

Is this kind of code with the chained methods going to be part of the public API? It is very different from the reflection interface. I thought this was some kind of testing detail, but it looks like it's getting more and more baked into the design. How complex is this going to get when we support twice as many GPU architectures? Constants like GemmParams_Wmma_16_16_2x2_per_wave were not part of the original design and look like they are masking an increasingly complex builder interface.
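
For reference in this discussion, a chained (fluent) builder of the kind being questioned generally looks like the self-contained sketch below; every type and method name here is hypothetical and not the library's actual API.

```cpp
#include <cstddef>

// Hypothetical parameter and result types for this illustration only.
struct GemmParamsSketch
{
    std::size_t m_per_warp = 0;
    std::size_t n_per_warp = 0;
};

struct ConvAlgorithmResultSketch
{
    GemmParamsSketch warp_gemm{};
    std::size_t block_size = 0;
};

// Minimal fluent builder: each setter returns a modified copy, so the calls
// can be chained and evaluated at compile time.
class ConvAlgorithmBuilderSketch
{
public:
    constexpr ConvAlgorithmBuilderSketch with_warp_gemm(GemmParamsSketch p) const
    {
        ConvAlgorithmBuilderSketch copy = *this;
        copy.result_.warp_gemm = p;
        return copy;
    }

    constexpr ConvAlgorithmBuilderSketch with_block_size(std::size_t n) const
    {
        ConvAlgorithmBuilderSketch copy = *this;
        copy.result_.block_size = n;
        return copy;
    }

    constexpr ConvAlgorithmResultSketch build() const { return result_; }

private:
    ConvAlgorithmResultSketch result_{};
};

// Usage resembling the chained calls under discussion.
constexpr auto fwd_conv_algorithm_sketch =
    ConvAlgorithmBuilderSketch{}
        .with_warp_gemm({.m_per_warp = 16, .n_per_warp = 16})
        .with_block_size(256)
        .build();
static_assert(fwd_conv_algorithm_sketch.block_size == 256);
```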

WarpGemm_,
InputOutputTileTransfer_<4>,
ConvSpecializationBwdWeight_,
GemmPipeline_, // Not needed, but we need this to meet the ConvAlgorithm
Collaborator:

This is a smaller example of a complication I'm worried is coming from over-use of concepts. It looks to me like we've dumped a lot of logic into concept pattern matching. I think it's going to get more and more complicated and limit what we can do in build-time logic.

vpietila-amd (Contributor, author):

This was a bit of a code smell. I fixed the hierarchy of the algorithm concepts such that this is not needed.

shumway (Collaborator) commented Jan 16, 2026

I like some of the refactoring around simplifying XDL and WMMA. As I've been saying, I disagree with this kind of use of concepts and I think it's going to create a lot of problems. If other people agree with going forward this way, I'm fine with seeing where it goes. I'm anticipating more complications with MIOpen integration, and we may have to rework this all to get to a more user-friendly builder. I'm also concerned that we are not introducing hardware architecture yet. It seems like knowing which GPU we are compiling for is a key part of the kernel selection logic. I don't have a good feel for how this is going to work once we're integrated with MIOpen for all supported hardware, and I keep thinking we are complicating things by overusing concepts.

vpietila-amd marked this pull request as draft January 16, 2026 10:02
Ville Pietilä added 2 commits January 16, 2026 05:32
…com:ROCm/composable_kernel into vpietila/ckb-refactor-warp-gemm-descriptors