fix(mtp): track megatron mtp_model_layer rename in raw converters #1307
Open
Zhichenzzz wants to merge 1 commit into
Code Review
This pull request updates the weight conversion and mapping logic for Qwen models by renaming references to transformer_layer to mtp_model_layer across multiple files, including qwen3_5.py, qwen3_next.py, and common.py. I have no feedback to provide as there are no review comments.
Megatron-LM renamed the MTP submodule transformer_layer -> mtp_model_layer. Make every miles MTP weight converter (megatron_to_hf, update_weight, and the mbridge qwen3_5 / qwen3_next / glm4moe_lite bridges) accept both the new and old submodule names and re-emit whichever the running Megatron uses, so weight sync keeps working across the rename. Parametrize the bridge-mapping tests over both names.
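A minimal sketch of the "accept both, re-emit what was seen" idea described in the commit message. The parameter-name layout (`mtp.layers.<i>.<submodule>.<rest>`) and the helper names `split_mtp_param` / `rebuild_mtp_param` are assumptions made for illustration; they are not taken from the miles or Megatron-LM code.

```python
import re

# Assumed layout: "<optional prefix>mtp.layers.<i>.<submodule>.<rest>".
# The two submodule names are the ones named in this PR.
_MTP_RE = re.compile(
    r"^(?P<prefix>(?:.*\.)?mtp\.layers\.\d+\.)"
    r"(?P<layer>mtp_model_layer|transformer_layer)\."
    r"(?P<rest>.+)$"
)


def split_mtp_param(name):
    """Return (prefix, layer_name, rest) for an MTP param, else None.

    Hypothetical helper: matches either the new or the old submodule name.
    """
    m = _MTP_RE.match(name)
    if m is None:
        return None
    return m.group("prefix"), m.group("layer"), m.group("rest")


def rebuild_mtp_param(prefix, layer_name, rest):
    """Re-emit a param name using the submodule name that was captured."""
    return f"{prefix}{layer_name}.{rest}"
```

Because the captured submodule name is carried through and reused, a name built this way matches whichever spelling the running Megatron produced, without the converter needing to know the Megatron version.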
8ce4539 to 41f84a6
Context
Paired with radixark/Megatron-LM#54, which renames the MTP inner module transformer_layer → mtp_model_layer to match megatron-bridge/upstream. That changes the live megatron param names.

Fix
Update the raw (non-bridge) megatron→hf converters and the weight iterator that key on the old name, so live param names still match:
- megatron_to_hf/qwen3_5.py: MTP branch
- megatron_to_hf/qwen3_next.py: MTP branch
- update_weight/common.py: MTP expert regex + yield

The fork rename and these converter updates are interdependent: merging only one breaks raw conversion. Land both in the same window.
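To illustrate why merging only one side breaks raw conversion, here is a small sketch of a regex-driven iterator over MTP expert parameters. The parameter names, both patterns, and `iter_mtp_experts` are hypothetical stand-ins; the real regex in update_weight/common.py may look different.

```python
import re

# Hypothetical patterns keyed on the old and the new submodule name.
OLD_PATTERN = re.compile(
    r"mtp\.layers\.(\d+)\.transformer_layer\.mlp\.experts\.(.+)"
)
NEW_PATTERN = re.compile(
    r"mtp\.layers\.(\d+)\.mtp_model_layer\.mlp\.experts\.(.+)"
)


def iter_mtp_experts(names, pattern):
    """Yield (layer_index, expert_suffix) for names matching `pattern`."""
    for name in names:
        m = pattern.search(name)
        if m:
            yield int(m.group(1)), m.group(2)


# Made-up param names as a renamed Megatron might emit them.
renamed_names = [
    "mtp.layers.0.mtp_model_layer.mlp.experts.linear_fc1.weight0",
    "mtp.layers.0.mtp_model_layer.mlp.experts.linear_fc2.weight0",
    "mtp.layers.0.mtp_model_layer.self_attention.linear_qkv.weight",
]

# The old pattern matches nothing, so the expert weights would be skipped.
print(list(iter_mtp_experts(renamed_names, OLD_PATTERN)))
# The new pattern picks up both expert params.
print(list(iter_mtp_experts(renamed_names, NEW_PATTERN)))
```

In this sketch the mismatch does not raise; the old pattern simply yields nothing for the renamed params, which is why the two changes need to land together.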
Fixes #1289
Update: dual-name support + full converter coverage
The converters now accept both mtp_model_layer (new) and transformer_layer (old), the same dual-name approach as Megatron-Bridge's glm45_bridge, so miles works regardless of which Megatron-LM version is paired. Also covers the previously-missed sites: miles_plugins/mbridge/qwen3_5.py, qwen3_next.py, glm4moe_lite.py; the mapping tests now parametrize over both names (10/10 pass).

e2e validated (Qwen3.5-35B-A3B raw mode, MTP enabled, renamed Megatron): update_weights completes with 0 unknown parameters, and sglang rollout generates normally with the synced weights.

Companion PRs: radixark/Megatron-LM#54, radixark/Megatron-Bridge#11.
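A rough sketch of what running one mapping check over both submodule names can look like. `toy_map` is a toy stand-in for a bridge mapping (it just drops the submodule segment), the param names are made up, and the standard library's `unittest.subTest` is used here in place of whatever parametrization mechanism the real tests use.

```python
import re
import unittest

# The two submodule names from this PR: (new, old).
LAYER_NAMES = ("mtp_model_layer", "transformer_layer")

_TOY_RE = re.compile(
    r"^mtp\.layers\.(\d+)\.(?:mtp_model_layer|transformer_layer)\.(.+)$"
)


def toy_map(name):
    """Toy mapping: drop the MTP submodule segment from a param name."""
    m = _TOY_RE.match(name)
    if m is None:
        raise KeyError(name)
    return f"mtp.layers.{m.group(1)}.{m.group(2)}"


class ToyMappingTest(unittest.TestCase):
    def test_both_names_map_identically(self):
        # Same assertion, once per submodule name.
        for layer_name in LAYER_NAMES:
            with self.subTest(layer_name=layer_name):
                src = f"mtp.layers.0.{layer_name}.mlp.router.weight"
                self.assertEqual(
                    toy_map(src), "mtp.layers.0.mlp.router.weight"
                )
```

Each name is reported as its own sub-case, so a regression that affects only the old or only the new spelling shows up separately.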