[ATOM SGL] update fp8 prefill argument passing#1211

Merged
zhuyuhua-v merged 3 commits into ROCm:main from ZhiweiYan-96:zhiwei/fp8_prefill_aiter_update on Jun 16, 2026

@ZhiweiYan-96 ZhiweiYan-96 commented Jun 15, 2026

Motivation

FP8 prefill fails with the new aiter due to an API mismatch (exposed in https://github.com/ROCm/ATOM/actions/runs/27469822009/job/81199035902#logs). This PR fixes the issue by using the new API.

Test Result

export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export SGLANG_USE_AITER=1
export ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1
export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models
export SGLANG_ENABLE_TORCH_COMPILE=1
export SGLANG_AITER_FP8_PREFILL_ATTN=1
python3 -m sglang.launch_server \
  --model-path /workspace/shared/data/amd_int/models/deepseek-ai/DeepSeek-R1-0528-MXFP4-v2 \
  --host localhost --port 8000 \
  --trust-remote-code \
  --tensor-parallel-size 4 \
  --attention-backend aiter \
  --kv-cache-dtype fp8_e4m3 \
  --mem-fraction-static 0.85 \
  --page-size 1 \
  --disable-radix-cache

gsm8k results

image

Submission Checklist

@ZhiweiYan-96 ZhiweiYan-96 marked this pull request as ready for review June 15, 2026 03:33
Copilot AI review requested due to automatic review settings June 15, 2026 03:33

Copilot AI left a comment

Pull request overview

Updates ATOM’s SGLang full-attention FP8 prefill path to pass an additional num_kv_splits argument into the MLA reduction kernel, deriving it from the reduce indptr metadata.

Changes:

  • Add _max_reduce_group_size(reduce_indptr) helper to compute the maximum per-output reduce group size.
  • Compute num_kv_splits from ForwardMetadata.reduce_indptr during FP8 prefill.
  • Pass num_kv_splits into mla_reduce_v1(...) to match the updated FP8 prefill reduction calling convention.
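
To illustrate the "maximum per-output reduce group size" described above, here is a minimal sketch. It assumes `reduce_indptr` is a CSR-style offset array in which entry `i` and entry `i + 1` bound the partial results reduced into output `i`. The function name `max_reduce_group_size_sketch` and the example offsets are hypothetical; the PR's actual `_max_reduce_group_size` helper is not shown in this conversation and presumably operates on a tensor rather than a Python list.

```python
from typing import Sequence


def max_reduce_group_size_sketch(reduce_indptr: Sequence[int]) -> int:
    """Return the largest gap between consecutive offsets.

    Assumption: reduce_indptr[i]..reduce_indptr[i + 1] delimits the partial
    results that are reduced into output i (CSR-style layout).
    """
    if len(reduce_indptr) < 2:
        # No complete group is described by fewer than two offsets.
        return 0
    return max(
        end - start
        for start, end in zip(reduce_indptr, reduce_indptr[1:])
    )


# Hypothetical metadata: three outputs reduce over 2, 4, and 1 partials.
example_indptr = [0, 2, 6, 7]
num_kv_splits = max_reduce_group_size_sketch(example_indptr)
print(num_kv_splits)
```

Under this assumption, the value derived this way is what would be supplied as `num_kv_splits` to the reduction call; the exact argument position in `mla_reduce_v1(...)` is not stated here and is left out of the sketch.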

Copilot AI review requested due to automatic review settings June 15, 2026 13:45

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@zhuyuhua-v zhuyuhua-v merged commit 50187a0 into ROCm:main Jun 16, 2026
21 of 29 checks passed