@Fridge003 Fridge003 commented Nov 17, 2025

Motivation

When launching dpsk-r1-fp4 (DeepSeek-R1 FP4) with MTP and TP4, the draft model uses the bfloat16 fused MoE Triton kernels, so a tuned kernel configuration is needed for good performance on B200.
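
For context (not the exact file or values added in this PR): the tuned fused MoE Triton configurations in SGLang are JSON files keyed by batch size, mapping each size to Triton launch parameters, with the filename encoding the expert count, shard intermediate size, device name, and dtype. A minimal sketch of the format, with purely illustrative values:

{
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 3
    },
    "16": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 8,
        "num_stages": 4
    }
}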

Modifications

Accuracy Tests

Benchmarking and Profiling

# Launch
export SGLANG_ENABLE_SPEC_V2=1
python3 -m sglang.launch_server \
    --model-path nvidia/DeepSeek-R1-0528-FP4-v2 \
    --trust-remote-code \
    --attention-backend trtllm_mla \
    --moe-runner-backend flashinfer_trtllm \
    --quantization modelopt_fp4 \
    --tp 4 \
    --speculative-algorithm EAGLE \
    --kv-cache-dtype fp8_e4m3

# Profile
python3 -m sglang.bench_one_batch_server \
    --model nvidia/DeepSeek-R1-0528-FP4-v2 \
    --base-url http://localhost:30000 \
    --batch-size 16 \
    --input-len 1024 \
    --output-len 20 \
    --skip-warmup \
    --profile \
    --profile-steps 10
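
# Optional sanity check (not part of the original benchmark commands): confirm the
# server responds before profiling, via SGLang's native /generate endpoint. The prompt
# and sampling parameters below are illustrative.
curl http://localhost:30000/generate \
    -H "Content-Type: application/json" \
    -d '{"text": "The capital of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 32}}'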

Main: (profiler screenshot, 2025-11-17 16:06:15)

This PR: (profiler screenshot, 2025-11-17 16:05:47)

Checklist

@Fridge003 Fridge003 removed the run-ci label Nov 17, 2025
@Fridge003 Fridge003 marked this pull request as ready for review November 18, 2025 00:07
@Fridge003 Fridge003 changed the title Add bfloat16 tuned fused moe config for B200 Add bfloat16 tuned fused moe config for Dpsk-MTP layer on B200 Nov 18, 2025
@Fridge003 Fridge003 merged commit 85ae508 into main Nov 18, 2025
87 of 120 checks passed
@Fridge003 Fridge003 deleted the baizhou/fused-moe branch November 18, 2025 01:44