fix: load Qwen 3.5 checkpoint with unfused experts #1317

Open: lawrence-harmonic wants to merge 1 commit into radixark:main from lawrence-harmonic:fix/gate_proj_up_proj

@lawrence-harmonic

Contributor

For some reason, the FP8 checkpoint on Hugging Face has unfused `gate_proj` and `up_proj` expert weights:
https://huggingface.co/Qwen/Qwen3.5-35B-A3B-FP8/blob/main/model.safetensors.index.json

This differs from the bf16 checkpoint, which has fused `gate_up_proj`:
https://huggingface.co/Qwen/Qwen3.5-35B-A3B/blob/main/model.safetensors.index.json

The current miles code handles only the fused `gate_up_proj` format.
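
To illustrate the difference between the two checkpoints, here is a minimal sketch of how the layout could be read from `model.safetensors.index.json`. It is not the code in this PR. The function names are hypothetical, and the key patterns are assumptions: it assumes routed-expert tensors have `experts` as a dotted path component, with `gate_up_proj` in the fused layout and `gate_proj` / `up_proj` in the unfused one. Only the top-level `weight_map` field is standard for safetensors index files.

```python
import json
from pathlib import Path


def detect_expert_format(weight_map):
    """Classify routed-expert weights as 'fused' or 'unfused' from index keys.

    Assumption: routed-expert tensor names contain 'experts' as a dotted
    component; anything else (attention, shared experts, ...) is ignored.
    """
    fused = unfused = False
    for name in weight_map:
        parts = name.split(".")
        if "experts" not in parts:
            continue  # not a routed-expert tensor
        if "gate_up_proj" in parts:
            fused = True
        elif "gate_proj" in parts or "up_proj" in parts:
            unfused = True
    if fused and unfused:
        raise ValueError("checkpoint mixes fused and unfused expert weights")
    if not (fused or unfused):
        raise ValueError("no routed-expert projection weights found")
    return "fused" if fused else "unfused"


def detect_from_index(index_path):
    """Read a safetensors index file and classify its expert layout."""
    index = json.loads(Path(index_path).read_text())
    return detect_expert_format(index["weight_map"])
```

Matching on whole dotted components rather than substrings keeps `gate_up_proj` from being mistaken for `up_proj`, and raising on a mixed or empty result surfaces an unexpected checkpoint early instead of silently picking a format.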

@gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates `Qwen3_5Bridge` to support both fused and unfused weight formats for regular MoE experts. It detects the expert format dynamically from the safetensors index and maps the weights accordingly using either `_MLP_EXPERT_MAPPING_FUSED` or `_MLP_EXPERT_MAPPING_UNFUSED`. The test suite has also been expanded to verify mapping and loading behavior for both formats. There are no review comments, and I have no feedback to provide.
