feat(blog): GB300 vs GB200 NVL72 on DSv4-Pro — up to 2.83x throughput/GPU by functionstackx · Pull Request #391 · SemiAnalysisAI/InferenceX-app

functionstackx · 2026-05-26T17:40:17Z

Summary

New blog post: GB300 NVL72 vs GB200 NVL72 on DeepSeek-V4-Pro 1.6T, Dynamo+vLLM FP4 8K/1K, disaggregated on both racks.
Headline: peak 2.83x throughput per GPU at 27 tok/s/user; 2.31x perf/$ at the same operating point after accounting for GB300's 20% TCO premium ($2.65 vs $2.21/GPU/hr).
Framing: the two racks share an identical NVLink fabric, identical 8 TB/s HBM bandwidth, and an identical 72-GPU world size. The only meaningful silicon deltas are GB300's 1.5x HBM capacity (288 vs 192 GB/GPU) and 1.5x FP4 compute. That HBM headroom is what unlocks a wider prefill+decode recipe (conc=3072, 28-GPU prefill, 32-GPU decode EP=16, 6,812 tok/s/GPU at 25.9 tok/s/user) for which GB200 has no equivalent in the 22–32 tok/s/user band.
Honest gap-inversion: at peak throughput (13–18 tok/s/user) the perf/$ ratio collapses to ~1.0x because both racks run narrow EP=8 recipes; the headline lift is structural to the middle of the curve, not the entire frontier.
Numbers verified against latest_benchmarks via InferenceX MCP (every row in the per-conc tables maps 1:1 to a CSV row from the 2026-05-22 run, GHA 26306422380).
Iso-interactivity table interpolated using the bundled .claude/skills/write-inferencex-blog/iso_interactivity.py helper so the numbers match the live dashboard chart.
Model architecture cited from the HF model card config.json: 1.6T total / 49B active, 384 routed experts + 1 shared, 6 active per token, 61 layers.
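The iso-interactivity comparison mentioned in the summary can be illustrated with a small interpolation routine. This is only a sketch of the general idea, not the contents of `iso_interactivity.py`: the function names `throughput_at` and `iso_ratio` are hypothetical, and linear interpolation between neighbouring measured points (with no extrapolation) is an assumption.

```python
from bisect import bisect_left


def throughput_at(points, target):
    """Linearly interpolate tok/s/GPU at a target tok/s/user.

    points: iterable of (tok_s_user, tok_s_gpu) pairs for one system.
    Returns None when target lies outside the measured range, so the
    caller never compares against an extrapolated value.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if not pts or target < xs[0] or target > xs[-1]:
        return None
    i = bisect_left(xs, target)
    if xs[i] == target:
        return pts[i][1]  # exact measured point, no interpolation needed
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


def iso_ratio(a_points, b_points, target):
    """Throughput ratio a/b at the same interactivity, or None if undefined."""
    a = throughput_at(a_points, target)
    b = throughput_at(b_points, target)
    if a is None or b is None or b == 0:
        return None
    return a / b
```

Returning `None` outside the measured range is a deliberate choice in this sketch: a band where one system has no nearby recipe then shows up as a gap rather than as an invented number.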

Test plan

Visit /blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4 on the Vercel preview and verify both benchmark-{light,dark}.png and specs-radar-{light,dark}.png render
Verify the DashboardCTA links land on the right pre-filtered comparison
Verify the post shows up in /blog, /feed.xml, /llms.txt, and /sitemap.xml
Verify the OG image generates for the post URL
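The listing checks in the test plan could be scripted roughly as follows. This is a sketch under assumptions: the preview base URL is supplied by the caller, the helper names are hypothetical, and a plain substring match on the slug is assumed to be enough to show the post is listed (a client-rendered `/blog` page might need a browser-based check instead).

```python
import urllib.request

# Slug taken from the post URL in the test plan.
SLUG = "gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4"
# Pages that should list the new post, per the test plan.
LISTING_PATHS = ["/blog", "/feed.xml", "/llms.txt", "/sitemap.xml"]


def listing_urls(base_url):
    """Build the absolute URLs to check for a given preview deployment."""
    base = base_url.rstrip("/")
    return [base + path for path in LISTING_PATHS]


def body_mentions_slug(body, slug=SLUG):
    """True when the fetched page text contains the post slug."""
    return slug in body


def check_preview(base_url, timeout=10):
    """Fetch each listing page and report whether it mentions the post."""
    results = {}
    for url in listing_urls(base_url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        results[url] = body_mentions_slug(body)
    return results
```

Calling `check_preview` with the Vercel preview URL returns a mapping from each listing URL to `True` or `False`; the image and OG checks are left out because their paths are not given here.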

🤖 Generated with Claude Code

Note

Low Risk
Editorial MDX and a documentation link change only; no runtime, auth, or data-path changes. Review should focus on numeric claims and external links, not production risk.

Overview
Adds a new InferenceX blog post comparing GB300 NVL72 vs GB200 NVL72 on DeepSeek-V4-Pro (Dynamo+vLLM, FP4 8K/1K, disaggregated), with headline 2.83× throughput/GPU and 2.31× perf/$ at 27 tok/s/user, framed as HBM headroom unlocking wider prefill/decode recipes in the 22–32 tok/s/user band. The post includes dashboard CTAs, benchmark/specs figure paths, per-concurrency tables, an iso-interactivity table, acknowledgments, and FAQ JSON-LD.

Also updates the write-inferencex-blog skill so the canonical SemiAnalysis AI Cloud TCO Model link points to https://semianalysis.com/ai-cloud-tco-model/ instead of the newsletter URL.
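Since review is meant to focus on the numeric claims, the relationship between the headline figures can be sanity-checked with a few lines of arithmetic. This sketch assumes perf/$ is simply throughput per GPU divided by hourly cost per GPU; the function names are hypothetical, and the inputs are the rounded figures quoted in the PR description.

```python
GB300_COST = 2.65        # $/GPU/hr, from the PR summary
GB200_COST = 2.21        # $/GPU/hr, from the PR summary
THROUGHPUT_RATIO = 2.83  # GB300 / GB200 throughput per GPU at 27 tok/s/user


def tco_premium(new_cost, old_cost):
    """Fractional cost premium of the new system over the old one."""
    return new_cost / old_cost - 1.0


def perf_per_dollar_ratio(throughput_ratio, new_cost, old_cost):
    """Throughput ratio discounted by the cost ratio (assumed definition)."""
    return throughput_ratio * old_cost / new_cost


premium = tco_premium(GB300_COST, GB200_COST)
naive = perf_per_dollar_ratio(THROUGHPUT_RATIO, GB300_COST, GB200_COST)
print(f"TCO premium: {premium:.1%}")         # TCO premium: 19.9%
print(f"naive perf/$ ratio: {naive:.2f}x")   # naive perf/$ ratio: 2.36x
```

The premium matches the quoted 20%. The naive quotient comes out slightly above the stated 2.31x, which may reflect rounding or interpolated inputs in the post's own calculation; this sketch does not attempt to reproduce that method.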

Reviewed by Cursor Bugbot for commit f3a04a8. Bugbot is set up for automated code reviews on this repo.

vercel · 2026-05-26T17:40:23Z

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 27, 2026 1:29am

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 1448436.

vercel Bot deployed to Preview May 26, 2026 17:40

vercel Bot deployed to Preview May 26, 2026 17:43

cursor Bot reviewed May 26, 2026

Comment thread packages/app/content/blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4.mdx

functionstackx and others added 3 commits May 26, 2026 21:28

feat(blog): GB300 vs GB200 NVL72 on DSv4-Pro — up to 2.83x throughput…

b6495b3

…/GPU Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(blog): use semianalysis.com/ai-cloud-tco-model link for TCO citation

6d74b13

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(blog): update date to 2026-05-27

f3a04a8

functionstackx force-pushed the blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4 branch from 1448436 to f3a04a8 Compare May 27, 2026 01:28

vercel Bot deployed to Preview May 27, 2026 01:29

functionstackx merged commit fe1e20b into master May 27, 2026
18 checks passed

functionstackx deleted the blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4 branch May 27, 2026 01:32
