fix(mtp): track megatron mtp_model_layer rename in raw converters #1307

Open
Zhichenzzz wants to merge 1 commit into main from fix/1289-mtp-naming

@Zhichenzzz Zhichenzzz commented Jun 8, 2026

Contributor

Context

Paired with radixark/Megatron-LM#54, which renames the MTP inner module transformer_layer → mtp_model_layer to match megatron-bridge/upstream. That changes the live megatron param names.

Fix

Update the raw (non-bridge) megatron→hf converters and the weight iterator that key on the old name, so live param names still match:

  • megatron_to_hf/qwen3_5.py — MTP branch
  • megatron_to_hf/qwen3_next.py — MTP branch
  • update_weight/common.py — MTP expert regex + yield

⚠️ Must merge together with radixark/Megatron-LM#54

The fork rename and these converter updates are interdependent — merging only one breaks raw conversion. Land both in the same window.

Fixes #1289


Update: dual-name support + full converter coverage

The converters now accept both mtp_model_layer (new) and transformer_layer (old), the same dual-name approach as Megatron-Bridge's glm45_bridge, so miles works regardless of which Megatron-LM version it is paired with. This also covers the previously missed sites in miles_plugins/mbridge (qwen3_5.py, qwen3_next.py, glm4moe_lite.py). The mapping tests are now parametrized over both names (10/10 pass).
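A minimal sketch of the dual-name matching idea, not the code in this PR: the parameter-name layout (`mtp.layers.<idx>.<inner>.<rest>`), the regex, and the helper `split_mtp_name` are assumptions made for illustration, and the real converters may key on different prefixes.

```python
import re

# Hypothetical pattern: accepts either inner-module name after "mtp.layers.<idx>.".
# The actual regexes in the miles converters may differ.
MTP_LAYER_RE = re.compile(
    r"^(?P<prefix>.*mtp\.layers\.(?P<idx>\d+)\.)"
    r"(?P<inner>mtp_model_layer|transformer_layer)\."
    r"(?P<rest>.+)$"
)


def split_mtp_name(name):
    """Return (layer_index, inner_module_name, remainder), or None if
    the name is not a parameter under the MTP inner module."""
    m = MTP_LAYER_RE.match(name)
    if m is None:
        return None
    return int(m.group("idx")), m.group("inner"), m.group("rest")


new_style = "mtp.layers.0.mtp_model_layer.mlp.linear_fc1.weight"
old_style = "mtp.layers.0.transformer_layer.mlp.linear_fc1.weight"
print(split_mtp_name(new_style))
print(split_mtp_name(old_style))
```

Capturing the inner name instead of hard-coding it lets the rest of a converter branch stay identical for both Megatron-LM versions.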

Validated end to end (Qwen3.5-35B-A3B, raw mode, MTP enabled, renamed Megatron): update_weights completes with 0 unknown-parameter errors, and the sglang rollout generates normally with the synced weights.

Companion PRs: radixark/Megatron-LM#54, radixark/Megatron-Bridge#11.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request updates the weight conversion and mapping logic for Qwen models by renaming references to transformer_layer to mtp_model_layer across multiple files, including qwen3_5.py, qwen3_next.py, and common.py. I have no feedback to provide as there are no review comments.


Megatron-LM renamed the MTP submodule transformer_layer -> mtp_model_layer.
Make every miles MTP weight converter (megatron_to_hf, update_weight, and the
mbridge qwen3_5 / qwen3_next / glm4moe_lite bridges) accept both the new and
old submodule names and re-emit whichever the running Megatron uses, so weight
sync keeps working across the rename. Parametrize the bridge-mapping tests over
both names.
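The "accept both, re-emit whichever is live" behavior described in this commit message could look roughly like the sketch below. The expert-weight name layout, `EXPERT_RE`, and `parse_and_reemit` are hypothetical and chosen only for illustration; the loop at the end stands in for parametrizing a test over both names.

```python
import re

INNER_NAMES = ("mtp_model_layer", "transformer_layer")  # new name, old name

# Hypothetical MTP expert-weight layout; the real update_weight regex may differ.
EXPERT_RE = re.compile(
    r"^mtp\.layers\.(\d+)\.(" + "|".join(INNER_NAMES) + r")"
    r"\.mlp\.experts\.(linear_fc[12])\.weight(\d+)$"
)


def parse_and_reemit(name):
    """Parse an MTP expert param name and rebuild it with the inner-module
    name that was actually seen, rather than a hard-coded one."""
    m = EXPERT_RE.match(name)
    if m is None:
        return None
    layer, inner, proj, expert = m.groups()
    rebuilt = f"mtp.layers.{layer}.{inner}.mlp.experts.{proj}.weight{expert}"
    return int(layer), int(expert), rebuilt


# Exercise both names, as a parametrized test would.
for inner in INNER_NAMES:
    live = f"mtp.layers.0.{inner}.mlp.experts.linear_fc1.weight3"
    assert parse_and_reemit(live) == (0, 3, live)
```

Because the rebuilt name reuses the captured group, the emitted key always matches the parameter names of the Megatron-LM version that is running.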
@Zhichenzzz Zhichenzzz force-pushed the fix/1289-mtp-naming branch from 8ce4539 to 41f84a6 Compare June 18, 2026 21:39

Development

Successfully merging this pull request may close these issues.

MTP naming mismatch: Megatron-LM fork uses transformer_layer but megatron-bridge expects mtp_model_layer (breaks Qwen3.6-27B GDN weight conversion)
