MXFP4/MXFP8/int4 weights support in CuTe interface MoE GEMM example #640
Summary
Adds a MoE GEMM implementation for MXFP4/MXFP8 (FP4/FP8 weights with E8M0 scales and group-wise quantization) using the CuTe interface.
If users don't select copy atoms for loading activations and weights or for storing output, they are chosen automatically: users can pass void as the corresponding copy atom template parameters. However, the automatically chosen copy atoms may not always attain the best performance, so users can also specify custom copy atoms.
Support for int4 weights with BF16/FP16 scales has also been added.
Weights are stored in plain format and are not prepacked.
Details
BMG doesn't support MXFP4/MXFP8/int4 natively, so weights are converted to either FP16 or BF16, depending on the activation type.
Currently, the implementation assumes WG_K and SG_K are both equal to 32.
Performance
Largely depends on scaledMM performance in #633.
cc @CaoZhongZ @mayuyuace @pengzhao-intel