feat(blog): GB300 vs GB200 NVL72 on DSv4-Pro — up to 2.83x throughput/GPU#391

Merged
functionstackx merged 3 commits into master from blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4 on May 27, 2026

@functionstackx (Contributor) commented May 26, 2026

Summary

  • New blog post: GB300 NVL72 vs GB200 NVL72 on DeepSeek-V4-Pro 1.6T, Dynamo+vLLM FP4 8K/1K, disaggregated on both racks.
  • Headline: peak 2.83x throughput per GPU at 27 tok/s/user; 2.31x perf/$ at the same operating point after accounting for GB300's 20% TCO premium ($2.65 vs $2.21/GPU/hr).
  • Framing: identical NVLink fabric, identical 8 TB/s HBM bandwidth, and identical 72-GPU world size. The only meaningful silicon deltas are GB300's 1.5x HBM capacity (288 vs 192 GB/GPU) and 1.5x FP4. That HBM headroom is what unlocks a wider prefill+decode recipe (conc=3072, 28-GPU prefill, 32-GPU decode EP=16, 6,812 tok/s/GPU at 25.9 tok/s/user) for which GB200 has no equivalent in the 22–32 tok/s/user band.
  • Honest gap-inversion: at peak throughput (13–18 tok/s/user) the perf/$ ratio collapses to ~1.0x because both racks run narrow EP=8 recipes; the headline lift is structural to the middle of the curve, not the entire frontier.
  • Numbers verified against latest_benchmarks via InferenceX MCP (every row in the per-conc tables maps 1:1 to a CSV row from the 2026-05-22 run, GHA 26306422380).
  • Iso-interactivity table interpolated using the bundled .claude/skills/write-inferencex-blog/iso_interactivity.py helper so the numbers match the live dashboard chart.
  • Model architecture cited from the HF model card config.json: 1.6T total / 49B active, 384 routed experts + 1 shared, 6 active per token, 61 layers.
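The perf/$ figure in the headline bullet comes from dividing the throughput ratio by the price ratio. Below is a minimal Python sketch of that arithmetic, using only the numbers quoted in the summary; the function names are illustrative and do not come from the repo. Dividing the rounded 2.83x by the rounded prices gives roughly 2.36x rather than the quoted 2.31x, so the gap may come from unrounded or interpolated inputs that this sketch does not have.

```python
# Values quoted in the PR summary (USD per GPU-hour).
GB300_COST_PER_GPU_HR = 2.65
GB200_COST_PER_GPU_HR = 2.21


def price_ratio(new_cost: float, old_cost: float) -> float:
    """How many times more expensive the new system is per GPU-hour."""
    return new_cost / old_cost


def perf_per_dollar_ratio(throughput_ratio: float, new_cost: float, old_cost: float) -> float:
    """Throughput advantage after normalizing by the per-GPU-hour price."""
    return throughput_ratio / price_ratio(new_cost, old_cost)


premium = price_ratio(GB300_COST_PER_GPU_HR, GB200_COST_PER_GPU_HR)
print(f"price premium: {100 * (premium - 1):.1f}%")
print(f"perf/$ at 2.83x throughput: "
      f"{perf_per_dollar_ratio(2.83, GB300_COST_PER_GPU_HR, GB200_COST_PER_GPU_HR):.2f}x")
# Break-even: perf/$ is 1.0x when the throughput ratio equals the price ratio.
print(f"throughput ratio needed for 1.0x perf/$: {premium:.2f}x")
```

The break-even line is consistent with the gap-inversion bullet: a perf/$ ratio near 1.0x at peak throughput implies a throughput/GPU ratio of about 1.2x there.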

Test plan

  • Visit /blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4 on the Vercel preview and verify both benchmark-{light,dark}.png and specs-radar-{light,dark}.png render
  • Verify the DashboardCTA links land on the right pre-filtered comparison
  • Verify the post shows up in /blog, /feed.xml, /llms.txt, and /sitemap.xml
  • Verify the OG image generates for the post URL
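Part of the test plan above could be scripted. The following is a rough sketch, not the repo's tooling: it assumes each index page contains the post slug somewhere in its body, and it takes the HTTP fetch as a caller-supplied function so the logic can be exercised without a network. `index_urls`, `figure_names`, and `missing_from` are hypothetical names.

```python
from typing import Callable, List
from urllib.parse import urljoin

SLUG = "gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4"
INDEX_PATHS = ["/blog", "/feed.xml", "/llms.txt", "/sitemap.xml"]


def index_urls(base_url: str) -> List[str]:
    """Absolute URLs of the index pages that should list the post."""
    return [urljoin(base_url, path) for path in INDEX_PATHS]


def figure_names() -> List[str]:
    """The four figure files named in the test plan."""
    return [f"{name}-{theme}.png"
            for name in ("benchmark", "specs-radar")
            for theme in ("light", "dark")]


def missing_from(base_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Index URLs whose body does not mention the post slug.

    `fetch` maps a URL to its response body; pass a real HTTP client
    wrapper in practice, or a stub when testing.
    """
    return [url for url in index_urls(base_url) if SLUG not in fetch(url)]
```

With a real preview deployment, `fetch` could wrap `urllib.request.urlopen`; an empty list from `missing_from` would mean every index page mentions the post.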

🤖 Generated with Claude Code


Note

Low Risk
Editorial MDX and a documentation link change only; no runtime, auth, or data-path changes. Review should focus on numeric claims and external links, not production risk.

Overview
Adds a new InferenceX blog post comparing GB300 NVL72 vs GB200 NVL72 on DeepSeek-V4-Pro (Dynamo+vLLM, FP4 8K/1K, disaggregated), with headline 2.83× throughput/GPU and 2.31× perf/$ at 27 tok/s/user, framed as HBM headroom unlocking wider prefill/decode recipes in the 22–32 tok/s/user band. The post includes dashboard CTAs, benchmark/specs figure paths, per-concurrency tables, an iso-interactivity table, acknowledgments, and FAQ JSON-LD.
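An iso-interactivity comparison reads both throughput curves at the same tok/s/user value, which usually falls between measured points. The sketch below shows one plausible way to do that with linear interpolation. It is not the bundled `iso_interactivity.py` helper, whose method is not shown in this PR, and the sample curves are toy values rather than benchmark data.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (tok/s/user, tok/s/GPU)


def interpolate_at(points: List[Point], target: float) -> Optional[float]:
    """Linearly interpolate tok/s/GPU at `target` tok/s/user.

    Returns None when `target` lies outside the measured range, so the
    caller never extrapolates past the data.
    """
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if not pts or target < xs[0] or target > xs[-1]:
        return None
    i = bisect_left(xs, target)
    if xs[i] == target:
        return pts[i][1]
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (target - x0) / (x1 - x0)


def iso_ratio(curve_a: List[Point], curve_b: List[Point], target: float) -> Optional[float]:
    """Ratio of curve_a to curve_b throughput at the same interactivity."""
    a = interpolate_at(curve_a, target)
    b = interpolate_at(curve_b, target)
    if a is None or b is None or b == 0:
        return None
    return a / b


# Toy curves for illustration only; these are NOT benchmark results.
toy_a = [(10.0, 400.0), (20.0, 200.0), (30.0, 100.0)]
toy_b = [(10.0, 200.0), (30.0, 100.0)]
print(iso_ratio(toy_a, toy_b, 25.0))
```

Returning `None` outside the measured range reflects the case described in the summary, where one rack has no recipe covering a given interactivity band.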

Also updates the write-inferencex-blog skill so the canonical SemiAnalysis AI Cloud TCO Model link points to https://semianalysis.com/ai-cloud-tco-model/ instead of the newsletter URL.

Reviewed by Cursor Bugbot for commit f3a04a8. Bugbot is set up for automated code reviews on this repo.

vercel Bot commented May 26, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| inferencemax-app | Ready | Preview, Comment | May 27, 2026 1:29am |

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 1448436.

functionstackx and others added 3 commits May 26, 2026 21:28
…/GPU

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
functionstackx force-pushed the blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4 branch from 1448436 to f3a04a8 on May 27, 2026 01:28
functionstackx merged commit fe1e20b into master on May 27, 2026 (18 checks passed)
functionstackx deleted the blog/gb300-nvl72-vs-gb200-nvl72-dsv4-pro-vllm-fp4 branch on May 27, 2026 01:32