Conversation

@hlu1 hlu1 commented Nov 17, 2025

Motivation

Following deepseek-ai/DeepSeek-V3.2-Exp@8631a81, change the indexer weights_proj precision from bf16 to fp32.

#13439 fixed the rope issue.

Modifications

  • Change the indexer weights_proj precision from bf16 to fp32.
  • Remove the fuse_wk_and_weights_proj optimization; it is no longer valid after the fix because wk remains bf16 while weights_proj is now fp32.
  • Add AIME 2025 accuracy results and commands to the dsv32 doc, along with 20-shot GSM8K results. Accuracy was measured together with [Bug] Fixes accuracy issues caused by incorrect use of rope #13439 on top of commit 6448b4c.
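To see why the precision of this projection matters, here is a minimal numpy sketch of the gating GEMM. It emulates bf16 by truncating the low 16 mantissa bits of float32 and compares it against the fp32 path; the shapes and weight scale are made up for illustration and are not the real model dimensions.

```python
import numpy as np

def to_bf16(a):
    """Emulate bfloat16 rounding by truncating the low 16 bits of float32."""
    b = a.astype(np.float32).copy().view(np.uint32)
    b &= np.uint32(0xFFFF0000)
    return b.view(np.float32)

# Hypothetical shapes; the real indexer uses model-specific dims.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 128)).astype(np.float32)
w = rng.standard_normal((128, 8)).astype(np.float32) * 0.02

gate_fp32 = x @ w                    # weights_proj computed in fp32 (new behavior)
gate_bf16 = to_bf16(x) @ to_bf16(w)  # old bf16 path, emulated

err = np.abs(gate_fp32 - gate_bf16).max()
print(f"max abs deviation, bf16 vs fp32 gate: {err:.2e}")
```

The deviation is small per element but accumulates across the reduction dimension, which is why the upstream DeepSeek change moved this particular projection to fp32.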

Accuracy Tests

See the updated doc in this PR.

Benchmarking and Profiling

Checklist

@github-actions github-actions bot added documentation Improvements or additions to documentation deepseek labels Nov 17, 2025

Summary of Changes

Hello @hlu1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the Deepseek V3.2 model's weights_proj component by elevating its precision to FP32. This change, driven by external model updates, necessitates the removal of a previously existing fusion optimization that relied on consistent BF16 precision. Additionally, the PR enriches the model's documentation with new accuracy benchmarks for AIME 2025 and GSM8K, providing updated performance insights.

Highlights

  • Precision Update: The weights_proj component within the NSA indexer has been updated from BF16 to FP32 precision, aligning with recent Deepseek V3.2-Exp changes.
  • Optimization Removal: The fuse_wk_and_weights_proj optimization has been removed from the codebase. This is due to the precision mismatch between wk (BF16) and the newly FP32 weights_proj, rendering the fusion invalid.
  • Documentation Enhancement: The documentation for Deepseek V3.2 has been updated to include new accuracy results and commands for AIME 2025 and GSM8K (20-shot) benchmarks.
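The fusion being removed relied on stacking both weight matrices and running a single GEMM. A small numpy sketch of the idea, with hypothetical dimensions (`d`, `dk`, `h` and the names `wk`/`wp` are illustrative, not the model's actual sizes): concatenating the weights is only possible when both share one dtype, which breaks once weights_proj moves to fp32 while wk stays bf16.

```python
import numpy as np

# Hypothetical dims for illustration; not the real model sizes.
rng = np.random.default_rng(1)
d, dk, h = 64, 32, 8
x  = rng.standard_normal((4, d)).astype(np.float32)
wk = rng.standard_normal((d, dk)).astype(np.float32)  # stays bf16 in the model
wp = rng.standard_normal((d, h)).astype(np.float32)   # now fp32 in the model

# Fused path: one GEMM over concatenated weights, then split the output.
# This requires wk and wp to share a single dtype.
w_fused = np.concatenate([wk, wp], axis=1)
k_fused, g_fused = np.split(x @ w_fused, [dk], axis=1)

# Unfused path: two GEMMs, each free to run in its own precision.
k = x @ wk
g = x @ wp

assert np.allclose(k, k_fused) and np.allclose(g, g_fused)
print("fused and unfused outputs match when dtypes agree")
```

With mixed bf16/fp32 weights the concatenation above has no single-dtype representation without upcasting wk (changing numerics and cost), so dropping the fusion is the straightforward fix.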

@hlu1 hlu1 requested a review from trevor-m November 17, 2025 22:56

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request changes the precision of the weights_proj layer in the NSA indexer from bf16 to fp32. This change aligns with an update in the original DeepSeek-V3.2 repository. As a consequence, the fuse_wk_and_weights_proj optimization is no longer valid due to a dtype mismatch and has been cleanly removed from the codebase. The necessary input casting to float() for the weights_proj layer has been correctly added. Additionally, the documentation is updated with new accuracy results for GSM8K and AIME 2025. My review includes a couple of minor suggestions for the documentation to improve clarity and fix a typo.

@hlu1 hlu1 force-pushed the weight_proj_fp32 branch 2 times, most recently from 5aa0046 to b5a0096 on November 17, 2025 23:24

hlu1 commented Nov 17, 2025

cc @Paiiiiiiiiiiiiii

@Fridge003 Fridge003 removed the run-ci label Nov 17, 2025

```diff
-if not self.fuse_wk_and_weights_proj:
-    weights, _ = self.weights_proj(x)
+weights, _ = self.weights_proj(x.float())
```
A collaborator commented:

I believe this gemm was originally in the _get_logits_head_gate() function.
