feat(blog): MI355X DeepSeek-V4-Pro SGLang — 110.5x throughput per GPU in 26 days#388
… in 26 days Time-series story of the sgl-project/sglang amd/deepseek_v4 side branch: 20.4 → 2,256 tok/s/GPU at iso/improving interactivity on 8K/1K, across 31 numbered performance optimization PRs. Includes per-date tables, iso-iv interpolation (12-14x in the 10-20 tok/s/user serving band from FP4 first light to 05-21), DSv4-Pro vs Claude/GPT/Gemini quality benchmarks, on-paper specs framing (MI355X vs B200 — 1.60x HBM + 1.12x dense compute, only 0.64x scale-up BW, so the residual 5x gap to B200 SGLang is software not silicon), and the upstream-to-main migration story via PR #24933. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n quality-benchmarks caption Reapplies four 'integration → performance optimization PR' wording fixes that the editor reverted, and adds an inline hyperlink from the quality-benchmarks figure source to the DeepSeek V4 preview announcement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Widen Figure caption prop from string to ReactNode so MDX captions can hold inline JSX (e.g. a source link). Convert the DSv4-Pro-Max vs Claude/GPT/Gemini caption to a JSX expression so the DeepSeek V4 preview release source renders as a real underlined external link instead of raw markdown syntax. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Revert the ReactNode widening of the Figure caption prop and convert the DSv4-Pro-Max vs Claude/GPT/Gemini caption back to a plain string. The source URL is now spelled out as readable text inside the caption. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 036d473.
| Interactivity (tok/s/user) | 04-25 | 05-02 | 05-04 | 05-10 | 05-21 | 05-02 → 05-21 |
| -------------------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| 8 | _unreachable_ | 221 | 401 | 1,363 | _unreachable_ | _∞_ |
Iso-interactivity table starts with unreachable ratio row
Medium Severity
The iso-interactivity table's first row (8 tok/s/user) shows _unreachable_ for 05-21 and _∞_ in the ratio column. This happens because the 05-21 Pareto frontier ends at 9.37 tok/s/user (conc 256); conc 512 at 5.59 tok/s/user is dominated and excluded from the frontier. The project's SKILL.md row-pruning heuristic explicitly states "The first row of the table must have two real numbers — never start with an _unreachable_ row" and warns that "a table that opens with _∞_ reads like the data is missing." Starting at 10 tok/s/user — where both 05-02 and 05-21 have real values — avoids this.
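The frontier exclusion and the row-pruning rule described in this comment can be sketched as below. This is a minimal illustration, not the project's actual helper: the function names are hypothetical, and every data point is a made-up placeholder except the 05-21 point at 9.37 tok/s/user (2,256 tok/s/GPU, conc 256) and the 5.59 tok/s/user interactivity of conc 512, which come from the PR text. The conc 512 throughput is assumed.

```python
def pareto_frontier(points):
    """Keep (interactivity, throughput) points that no other point beats on both axes."""
    frontier = []
    best_tput = float("-inf")
    # Walk from highest interactivity down; a point survives only if it
    # improves on the best throughput seen so far.
    for iv, tput in sorted(points, key=lambda p: (-p[0], -p[1])):
        if tput > best_tput:
            frontier.append((iv, tput))
            best_tput = tput
    return sorted(frontier)


def reachable(frontier, target_iv):
    """A target is reachable only inside the frontier's interactivity span."""
    return frontier[0][0] <= target_iv <= frontier[-1][0]


def first_valid_row(targets, frontier_a, frontier_b):
    """First target (ascending) where both dates have a real value, else None."""
    for target in sorted(targets):
        if reachable(frontier_a, target) and reachable(frontier_b, target):
            return target
    return None


# 05-21: only (9.37, 2256.0) is from the PR; the rest are placeholders.
RUN_0521 = [(5.59, 2200.0),   # conc 512; throughput is an ASSUMED value
            (9.37, 2256.0),   # conc 256, from the PR text
            (20.0, 1500.0),   # placeholder
            (40.0, 600.0)]    # placeholder
# 05-02: all placeholder values.
RUN_0502 = [(6.0, 250.0), (12.0, 180.0), (30.0, 90.0)]

FRONTIER_0521 = pareto_frontier(RUN_0521)
FRONTIER_0502 = pareto_frontier(RUN_0502)
START_ROW = first_valid_row([8, 10, 15, 20], FRONTIER_0502, FRONTIER_0521)
```

With these placeholder curves, the conc 512 point drops off the 05-21 frontier, so 8 tok/s/user falls outside its span and the first usable row is 10 tok/s/user.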
<DashboardCTA href="https://inferencex.semianalysis.com/inference?g_rundate=2026-05-22&g_runid=26306422380&g_model=DeepSeek-V4-Pro&i_gpus=mi355x_sglang&i_dates=2026-05-03%2C2026-05-04%2C2026-05-08%2C2026-05-19%2C2026-05-21%2C2026-04-25&i_prec=fp4%2Cfp8&i_dstart=2026-04-25&i_dend=2026-05-21&i_linelabel=1">
  Click to see the full InferenceX dashboard →
</DashboardCTA>
Dashboard URL dates mismatch blog's discussed five dates
Medium Severity
The i_dates URL parameter decodes to six dates (04-25, 05-03, 05-04, 05-08, 05-19, 05-21), but the blog tables and figure captions reference five dates (04-25, 05-02, 05-04, 05-10, 05-21). Two dates are offset — blog says 05-02 but URL has 05-03, blog says 05-10 but URL has 05-08 — and 05-19 appears in the URL but is never discussed. The text at line 158 says "across the 5 measured dates" while the link shows 6 curves. Readers clicking the CTA will see different date labels and an extra undiscussed curve.
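The mismatch can be checked mechanically by decoding the `i_dates` parameter and diffing it against the dates the blog discusses. The sketch below uses only the Python standard library; the variable and function names are illustrative, and the year 2026 on the blog's five dates is assumed from the `i_dstart`/`i_dend` range in the URL.

```python
from urllib.parse import parse_qs, urlsplit

CTA_HREF = (
    "https://inferencex.semianalysis.com/inference?g_rundate=2026-05-22"
    "&g_runid=26306422380&g_model=DeepSeek-V4-Pro&i_gpus=mi355x_sglang"
    "&i_dates=2026-05-03%2C2026-05-04%2C2026-05-08%2C2026-05-19%2C2026-05-21%2C2026-04-25"
    "&i_prec=fp4%2Cfp8&i_dstart=2026-04-25&i_dend=2026-05-21&i_linelabel=1"
)

# Dates referenced by the blog tables and captions (year assumed to be 2026).
BLOG_DATES = {"2026-04-25", "2026-05-02", "2026-05-04", "2026-05-10", "2026-05-21"}


def dashboard_dates(href):
    """Return the set of dates in the i_dates query parameter (percent-decoded)."""
    query = parse_qs(urlsplit(href).query)
    return set(query["i_dates"][0].split(","))


url_dates = dashboard_dates(CTA_HREF)
only_in_url = sorted(url_dates - BLOG_DATES)
only_in_blog = sorted(BLOG_DATES - url_dates)
print(only_in_url)   # ['2026-05-03', '2026-05-08', '2026-05-19']
print(only_in_blog)  # ['2026-05-02', '2026-05-10']
```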


Summary
- … `/blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days`). Time-series story of the `sgl-project/sglang` `amd/deepseek_v4` side branch: first-light 20.4 tok/s/GPU at 2.4 tok/s/user on 2026-04-25 (FP8, recipe needed `SGLANG_HACK_FLASHMLA_BACKEND=torch` to compile) → 2,256 tok/s/GPU at 9.4 tok/s/user on 2026-05-21 (FP4, DP attention, `lmsysorg/sglang:v0.5.12-rocm720-mi35x`). Both axes climb together: 110.5x throughput-per-GPU × 3.85x interactivity.
- … `rocm/sgl-dev` → `lmsysorg/sglang`).
- … `packages/app/src/lib/gpu-specs.ts`.
- … `iso_interactivity.py` helper (monotone cubic Hermite on the upper-left Pareto frontier; matches `d3.curveMonotoneX` exactly, so the table values track the rendered chart, not a linear approximation). Unreachable cells render as `_unreachable_`.
- … `SGLANG_OPT_*` toggles still side-branch-only. Hard reproducibility note: SGLang `main` will under-perform this post by an order of magnitude until those land upstream.

Test plan
- `pnpm dev` and visit `/blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days`; verify all 4 figures render in light + dark modes
- … `/blog` listing with correct title, subtitle, publish date (2026-05-26)
- … `/blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days`

🤖 Generated with Claude Code
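For readers unfamiliar with the interpolation named in the Summary, the following is a minimal sketch of one monotone cubic Hermite scheme, with tangents limited in the style of Steffen's method. It is not the `iso_interactivity.py` helper itself, and whether it reproduces that helper or `d3.curveMonotoneX` value-for-value is not verified here. The function names and the frontier values are hypothetical.

```python
from bisect import bisect_right


def _sign(v):
    return (v > 0) - (v < 0)


def monotone_tangents(xs, ys):
    """Tangents that keep each cubic segment monotone between its knots."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]        # knot spacings
    s = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]  # secant slopes
    if n == 2:
        return [s[0], s[0]]
    t = [0.0] * n
    for i in range(1, n - 1):
        p = (s[i - 1] * h[i] + s[i] * h[i - 1]) / (h[i - 1] + h[i])
        # Zero at local extrema; otherwise clamp to the smaller neighbouring slope.
        t[i] = (_sign(s[i - 1]) + _sign(s[i])) * min(
            abs(s[i - 1]), abs(s[i]), 0.5 * abs(p)
        )
    # One-sided endpoint tangents.
    t[0] = (3 * s[0] - t[1]) / 2
    t[-1] = (3 * s[-1] - t[-2]) / 2
    return t


def interpolate(xs, ys, x):
    """Evaluate the curve at x; None marks a target outside the frontier span."""
    if not xs[0] <= x <= xs[-1]:
        return None
    t = monotone_tangents(xs, ys)
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    h = xs[i + 1] - xs[i]
    u = (x - xs[i]) / h
    # Cubic Hermite basis functions on [0, 1].
    h00 = 2 * u**3 - 3 * u**2 + 1
    h10 = u**3 - 2 * u**2 + u
    h01 = -2 * u**3 + 3 * u**2
    h11 = u**3 - u**2
    return h00 * ys[i] + h10 * h * t[i] + h01 * ys[i + 1] + h11 * h * t[i + 1]


# Hypothetical frontier: interactivity (tok/s/user) -> throughput (tok/s/GPU).
FRONTIER_IV = [2.0, 4.0, 8.0, 16.0]
FRONTIER_TPUT = [1600.0, 1200.0, 500.0, 100.0]

mid_value = interpolate(FRONTIER_IV, FRONTIER_TPUT, 6.0)   # between knots
outside = interpolate(FRONTIER_IV, FRONTIER_TPUT, 20.0)    # None: unreachable
```

Returning `None` outside the frontier span mirrors the `_unreachable_` cells in the table. The clamped tangents keep the curve from overshooting between measured points, which a plain cubic spline can do.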
Note
Low Risk
Content-only addition (MDX + referenced static images); no application logic, auth, or data-path changes.
Overview
Adds a new long-form benchmark article at `/blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days` documenting 26 days of MI355X SGLang tuning for DeepSeek-V4-Pro on the `amd/deepseek_v4` fork, from first-light ~20 tok/s/GPU (2026-04-25 FP8) to ~2,256 tok/s/GPU (2026-05-21 FP4), with throughput and interactivity rising together.

The post ties gains to 31 side-branch PRs (DSA/TileLang + Triton sparse MLA, mHC/RoPE/Hadamard fusions, FlyDSL MoE, FP4 path, InferenceX container bumps), publishes date-stamped concurrency tables, an iso-interactivity comparison, and MI355X vs B200 software-gap framing. It uses existing blog MDX patterns: `Figure` (light/dark assets under `/images/mi355x-deepseek-v4-pro-sglang-110x-in-26-days/`), `DashboardCTA` + live InferenceX links, and `JsonLd` FAQ schema for SEO.