feat(blog): MI355X DeepSeek-V4-Pro SGLang — 110.5x throughput per GPU in 26 days by functionstackx · Pull Request #388 · SemiAnalysisAI/InferenceX-app

functionstackx · 2026-05-26T04:12:52Z

Summary

New blog post: MI355X DeepSeek-V4-Pro on SGLang — 110.5x throughput per GPU in 26 days (/blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days). Time-series story of the sgl-project/sglang amd/deepseek_v4 side branch: first-light 20.4 tok/s/GPU at 2.4 tok/s/user on 2026-04-25 (FP8, recipe needed SGLANG_HACK_FLASHMLA_BACKEND=torch to compile) → 2,256 tok/s/GPU at 9.4 tok/s/user on 2026-05-21 (FP4, DP attention, lmsysorg/sglang:v0.5.12-rocm720-mi35x). Both axes climb together: 110.5x throughput-per-GPU × 3.85x interactivity.
Decomposes the 31 numbered side-branch PRs into 5 mechanism buckets — DSA attention (TileLang indexer + Triton sparse MLA), mHC fusion, RoPE+Hadamard fusion, MoE (FlyDSL + FP4 + fused hash topk), AITER + misc fusions — and traces each container-image bump in the InferenceX recipe loop (rocm/sgl-dev → lmsysorg/sglang).
Adds 9 files: the MDX post + 4 light/dark image pairs (benchmark chart, DSv4-Pro-Max vs Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 quality bars, MI355X-vs-B200 SXM /gpu-specs radar, B200-vs-MI355X SGLang FP4 performance curve).
On-Paper Specs section in What's Next anchors the residual ~5x gap to B200 SGLang as software, not silicon: MI355X has 1.60x HBM capacity, the same 8 TB/s HBM BW, and 1.12x dense FP4 / FP8 / BF16 vs B200 SXM; only scale-up BW (576 vs 900 GB/s uni-di, 0.64x) favors B200. Pulled from packages/app/src/lib/gpu-specs.ts.
Iso-interactivity table uses the bundled iso_interactivity.py helper (monotone cubic Hermite on the upper-left Pareto frontier — matches d3.curveMonotoneX exactly so the table values track the rendered chart, not a linear approximation). Unreachable cells render as _unreachable_.
What's Next explicitly flags the upstream-to-main migration story via PR #24933 (kk, merged 2026-05-18, +3,678 / -70): the first chunk landing DSv4 on ROCm in main in eager mode, with the perf-critical fusions / TileLang indexer / FlyDSL MoE / SGLANG_OPT_* toggles still side-branch-only. Hard reproducibility note: SGLang main will under-perform this post by an order of magnitude until those land upstream.
FAQ JSON-LD covers the 5 questions readers actually ask: peak number, what shipped, why 04-25 was so slow, NVIDIA comparison, what's still uncovered.
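The iso-interactivity interpolation mentioned in the summary can be sketched as follows. This is not the bundled iso_interactivity.py, whose contents are not shown in this PR thread; it is a minimal monotone cubic Hermite sketch with Steffen-style tangents, assuming the Pareto frontier is already available as points sorted by strictly increasing interactivity. The function names (`monotone_tangents`, `interpolate`, `render_cell`) are hypothetical, and no claim is made that the output matches d3.curveMonotoneX bit for bit.

```python
from bisect import bisect_right
from typing import List, Optional


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def monotone_tangents(xs: List[float], ys: List[float]) -> List[float]:
    """Tangents for a monotone cubic Hermite spline (Steffen-style limiter)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]          # interval widths
    s = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]  # secant slopes
    if n == 2:
        return [s[0], s[0]]
    t = [0.0] * n
    for i in range(1, n - 1):
        # Weighted average of neighbouring secants, clamped so the curve
        # cannot overshoot either neighbouring knot.
        p = (s[i - 1] * h[i] + s[i] * h[i - 1]) / (h[i - 1] + h[i])
        t[i] = (_sign(s[i - 1]) + _sign(s[i])) * min(
            abs(s[i - 1]), abs(s[i]), 0.5 * abs(p)
        )
    # One-sided tangents at the two ends.
    t[0] = (3 * s[0] - t[1]) / 2
    t[-1] = (3 * s[-1] - t[-2]) / 2
    return t


def interpolate(xs: List[float], ys: List[float], x: float) -> Optional[float]:
    """Throughput at interactivity x, or None when x is off the frontier."""
    if len(xs) < 2 or x < xs[0] or x > xs[-1]:
        return None  # no extrapolation: the cell is unreachable
    t = monotone_tangents(xs, ys)
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    h = xs[i + 1] - xs[i]
    u = (x - xs[i]) / h
    h00 = 2 * u**3 - 3 * u**2 + 1
    h10 = u**3 - 2 * u**2 + u
    h01 = -2 * u**3 + 3 * u**2
    h11 = u**3 - u**2
    return h00 * ys[i] + h10 * h * t[i] + h01 * ys[i + 1] + h11 * h * t[i + 1]


def render_cell(value: Optional[float]) -> str:
    """Format a table cell; unreachable cells use the post's marker."""
    return "_unreachable_" if value is None else f"{value:,.0f}"


if __name__ == "__main__":
    # Hypothetical frontier points (tok/s/user, tok/s/GPU), not from the post.
    xs, ys = [2.0, 5.0, 9.0], [900.0, 600.0, 200.0]
    for level in (1.0, 4.0, 8.0, 10.0):
        print(level, render_cell(interpolate(xs, ys, level)))
```

Returning `None` outside the frontier's range, rather than extrapolating, is what keeps an out-of-range interactivity level rendering as `_unreachable_` instead of an invented number.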

Test plan

pnpm dev and visit /blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days — verify all 4 figures render in light + dark modes
Post appears in /blog listing with correct title, subtitle, publish date (2026-05-26)
OG image renders correctly for /blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days
DashboardCTA at top + bottom links land on the preset MI355X DSv4-Pro 5-date view on inferencex.semianalysis.com
Live chart link inside body opens the same preset view
Sitemap / RSS feed / llms.txt include the new post
All 31 SGLang PR links + PR #24933 + DeepSeek announcement + SemiAnalysis tweet resolve

🤖 Generated with Claude Code

Note

Low Risk
Content-only addition (MDX + referenced static images); no application logic, auth, or data-path changes.

Overview
Adds a new long-form benchmark article at /blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days documenting 26 days of MI355X SGLang tuning for DeepSeek-V4-Pro on the amd/deepseek_v4 side branch, from first-light ~20 tok/s/GPU (2026-04-25, FP8) to ~2,256 tok/s/GPU (2026-05-21, FP4), with throughput and interactivity rising together.

The post ties gains to 31 side-branch PRs (DSA/TileLang + Triton sparse MLA, mHC/RoPE/Hadamard fusions, FlyDSL MoE, FP4 path, InferenceX container bumps), publishes date-stamped concurrency tables, an iso-interactivity comparison, and MI355X vs B200 software-gap framing. It uses existing blog MDX patterns: Figure (light/dark assets under /images/mi355x-deepseek-v4-pro-sglang-110x-in-26-days/), DashboardCTA + live InferenceX links, and JsonLd FAQ schema for SEO.
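As a quick sanity check on the headline ratios, the arithmetic can be reproduced from the rounded endpoint figures quoted in the PR summary. The sketch below assumes those rounded figures (20.4 and 2,256 tok/s/GPU; 2.4 and 9.4 tok/s/user) as inputs; the dictionary layout and variable names are illustrative. The results land close to, but not exactly on, the stated 110.5x and 3.85x, presumably because the post computes its ratios from unrounded measurements (an assumption, not something this thread confirms).

```python
# Rounded endpoint figures as quoted in the PR summary.
first_light = {"tok_s_gpu": 20.4, "tok_s_user": 2.4}   # 2026-04-25, FP8
latest = {"tok_s_gpu": 2256.0, "tok_s_user": 9.4}      # 2026-05-21, FP4

# Gain on each axis is a simple after/before ratio.
throughput_gain = latest["tok_s_gpu"] / first_light["tok_s_gpu"]
interactivity_gain = latest["tok_s_user"] / first_light["tok_s_user"]

print(f"throughput per GPU: {throughput_gain:.1f}x")   # 110.6x from rounded inputs
print(f"interactivity:      {interactivity_gain:.2f}x")  # 3.92x from rounded inputs
```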

Reviewed by Cursor Bugbot for commit 036d473. Bugbot is set up for automated code reviews on this repo.

… in 26 days

Time-series story of the sgl-project/sglang amd/deepseek_v4 side branch: 20.4 → 2,256 tok/s/GPU at iso/improving interactivity on 8K/1K, across 31 numbered performance optimization PRs. Includes per-date tables, iso-iv interpolation (12-14x in the 10-20 tok/s/user serving band from FP4 first light to 05-21), DSv4-Pro vs Claude/GPT/Gemini quality benchmarks, on-paper specs framing (MI355X vs B200: 1.60x HBM + 1.12x dense compute, only 0.64x scale-up BW, so the residual 5x gap to B200 SGLang is software not silicon), and the upstream-to-main migration story via PR #24933.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-26T04:12:58Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project          | Deployment | Actions          | Updated (UTC)       |
| ---------------- | ---------- | ---------------- | ------------------- |
| inferencemax-app | Ready      | Preview, Comment | May 26, 2026 4:21am |

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n quality-benchmarks caption

Reapplies four 'integration → performance optimization PR' wording fixes that the editor reverted, and adds an inline hyperlink from the quality-benchmarks figure source to the DeepSeek V4 preview announcement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Widen Figure caption prop from string to ReactNode so MDX captions can hold inline JSX (e.g. a source link). Convert the DSv4-Pro-Max vs Claude/GPT/Gemini caption to a JSX expression so the DeepSeek V4 preview release source renders as a real underlined external link instead of raw markdown syntax. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Revert the ReactNode widening of the Figure caption prop and convert the DSv4-Pro-Max vs Claude/GPT/Gemini caption back to a plain string. The source URL is now spelled out as readable text inside the caption. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-05-26T04:29:48Z

+
+| Interactivity (tok/s/user) | 04-25         | 05-02         | 05-04         | 05-10         | 05-21         | 05-02 → 05-21 |
+| -------------------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
+| 8                          | _unreachable_ | 221           | 401           | 1,363         | _unreachable_ | _∞_           |


Iso-interactivity table starts with unreachable ratio row

Medium Severity

The iso-interactivity table's first row (8 tok/s/user) shows _unreachable_ for 05-21 and _∞_ in the ratio column. This happens because the 05-21 Pareto frontier ends at 9.37 tok/s/user (conc 256); conc 512 at 5.59 tok/s/user is dominated and excluded from the frontier. The project's SKILL.md row-pruning heuristic explicitly states "The first row of the table must have two real numbers — never start with an _unreachable_ row" and warns that "a table that opens with _∞_ reads like the data is missing." Starting at 10 tok/s/user — where both 05-02 and 05-21 have real values — avoids this.
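The row-pruning rule quoted in this comment can be sketched as a small filter. This is an illustration of the heuristic as described here, not code from the project's SKILL.md or its helper script; `prune_leading_unreachable` and the `REAL` stand-in are hypothetical names. The 8 tok/s/user row uses the values shown in the diff (221 for 05-02, unreachable for 05-21); the actual values at 10 tok/s/user do not appear in this thread, so a placeholder marks them as "some real number".

```python
from typing import List, Optional, Tuple

# (interactivity level, baseline value, final value); None means unreachable.
Row = Tuple[float, Optional[float], Optional[float]]


def prune_leading_unreachable(rows: List[Row]) -> List[Row]:
    """Drop leading rows until both the baseline and final columns are real.

    Rows after the first fully populated one are kept as-is, even if they
    contain unreachable cells.
    """
    for i, (_, base, final) in enumerate(rows):
        if base is not None and final is not None:
            return rows[i:]
    return []  # no row has two real numbers


# Stand-in for "a real interpolated value"; the true numbers are not in this thread.
REAL = 1.0

rows: List[Row] = [
    (8.0, 221.0, None),   # from the diff: 05-21 is unreachable at 8 tok/s/user
    (10.0, REAL, REAL),   # per the review: both dates have real values here
]

pruned = prune_leading_unreachable(rows)
print([level for level, _, _ in pruned])  # [10.0]
```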


cursor · 2026-05-26T04:29:48Z

+
+<DashboardCTA href="https://inferencex.semianalysis.com/inference?g_rundate=2026-05-22&g_runid=26306422380&g_model=DeepSeek-V4-Pro&i_gpus=mi355x_sglang&i_dates=2026-05-03%2C2026-05-04%2C2026-05-08%2C2026-05-19%2C2026-05-21%2C2026-04-25&i_prec=fp4%2Cfp8&i_dstart=2026-04-25&i_dend=2026-05-21&i_linelabel=1">
+  Click to see the full InferenceX dashboard →
+</DashboardCTA>


Dashboard URL dates mismatch blog's discussed five dates

Medium Severity

The i_dates URL parameter decodes to six dates (04-25, 05-03, 05-04, 05-08, 05-19, 05-21), but the blog tables and figure captions reference five dates (04-25, 05-02, 05-04, 05-10, 05-21). Two dates are offset — blog says 05-02 but URL has 05-03, blog says 05-10 but URL has 05-08 — and 05-19 appears in the URL but is never discussed. The text at line 158 says "across the 5 measured dates" while the link shows 6 curves. Readers clicking the CTA will see different date labels and an extra undiscussed curve.

Additional Locations (2)

packages/app/content/blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days.mdx#L157-L158

packages/app/content/blog/mi355x-deepseek-v4-pro-sglang-110x-in-26-days.mdx#L204-L207
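The mismatch reported above can be reproduced by decoding the `i_dates` query parameter of the CTA link and diffing it against the five dates the post discusses. The sketch below uses only the standard library; the URL is copied from the diff, the blog dates are taken from the review comment (with the year 2026 assumed from the surrounding dates), and the helper names are illustrative.

```python
from urllib.parse import parse_qs, urlparse

# CTA link as it appears in the reviewed diff.
CTA_URL = (
    "https://inferencex.semianalysis.com/inference"
    "?g_rundate=2026-05-22&g_runid=26306422380&g_model=DeepSeek-V4-Pro"
    "&i_gpus=mi355x_sglang"
    "&i_dates=2026-05-03%2C2026-05-04%2C2026-05-08%2C2026-05-19%2C2026-05-21%2C2026-04-25"
    "&i_prec=fp4%2Cfp8&i_dstart=2026-04-25&i_dend=2026-05-21&i_linelabel=1"
)

# The five dates referenced by the blog tables and captions, per the review.
BLOG_DATES = {"2026-04-25", "2026-05-02", "2026-05-04", "2026-05-10", "2026-05-21"}


def dates_in_url(url: str) -> set:
    """Return the set of dates encoded in the i_dates query parameter."""
    query = parse_qs(urlparse(url).query)  # parse_qs also decodes %2C to ","
    raw = query.get("i_dates", [""])[0]
    return {d for d in raw.split(",") if d}


def compare(url: str, expected: set):
    """Return (dates only in the URL, dates only in the expected set)."""
    found = dates_in_url(url)
    return sorted(found - expected), sorted(expected - found)


only_in_url, only_in_blog = compare(CTA_URL, BLOG_DATES)
print("only in URL: ", only_in_url)
print("only in blog:", only_in_blog)
```

A check along these lines could run against every dashboard link in the MDX file before merge, so that the preset view and the prose cannot drift apart unnoticed.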


vercel Bot deployed to Preview May 26, 2026 04:13 View deployment

fix(blog): say 'performance optimization PR' instead of 'integration PR'

5e57afe

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 26, 2026 04:14 View deployment

vercel Bot deployed to Preview May 26, 2026 04:16 View deployment

vercel Bot deployed to Preview May 26, 2026 04:19 View deployment

vercel Bot deployed to Preview May 26, 2026 04:21 View deployment

functionstackx merged commit 19ae49e into master May 26, 2026
18 checks passed

functionstackx deleted the feat/blog-mi355x-deepseek-v4-pro-sglang-110x branch May 26, 2026 04:26

cursor Bot reviewed May 26, 2026

functionstackx merged 5 commits into master from feat/blog-mi355x-deepseek-v4-pro-sglang-110x
