fix(mtp): track megatron mtp_model_layer rename in raw converters by Zhichenzzz · Pull Request #1307 · radixark/miles

Zhichenzzz · 2026-06-08T21:28:13Z

Context

Paired with radixark/Megatron-LM#54, which renames the MTP inner module transformer_layer → mtp_model_layer to match megatron-bridge/upstream. That changes the live megatron param names.

Fix

Update the raw (non-bridge) megatron→hf converters and the weight iterator that key on the old name, so live param names still match:

megatron_to_hf/qwen3_5.py — MTP branch
megatron_to_hf/qwen3_next.py — MTP branch
update_weight/common.py — MTP expert regex + yield

⚠️ Must merge together with radixark/Megatron-LM#54

The fork rename and these converter updates are interdependent — merging only one breaks raw conversion. Land both in the same window.

Fixes #1289

Update: dual-name support + full converter coverage

The converters now accept both mtp_model_layer (new) and transformer_layer (old) — the same dual-name approach as Megatron-Bridge's glm45_bridge — so miles works regardless of which Megatron-LM version is paired. Also covers the previously-missed sites: miles_plugins/mbridge/qwen3_5.py, qwen3_next.py, glm4moe_lite.py; the mapping tests now parametrize over both names (10/10 pass).
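The dual-name matching described above can be sketched as a single regex that accepts either spelling. This is a minimal illustration, not the converters' actual code: the `mtp.layers.<idx>.<submodule>.<rest>` name layout, the constant names, and the helper `split_mtp_param` are all assumptions made for the example.

```python
import re

# Hypothetical layout of a Megatron MTP parameter name (assumed for illustration):
#   [prefix.]mtp.layers.<idx>.<submodule>.<rest>
MTP_SUBMODULE_NAMES = ("mtp_model_layer", "transformer_layer")  # new, old

MTP_PARAM_RE = re.compile(
    r"^(?P<prefix>.*?)mtp\.layers\.(?P<layer>\d+)\.(?P<submodule>"
    + "|".join(map(re.escape, MTP_SUBMODULE_NAMES))
    + r")\.(?P<rest>.+)$"
)


def split_mtp_param(name):
    """Return (layer_idx, submodule_name, rest) for an MTP inner-layer param.

    Returns None for any name that is not under one of the two accepted
    submodule spellings, so callers can fall through to non-MTP handling.
    """
    m = MTP_PARAM_RE.match(name)
    if m is None:
        return None
    return int(m["layer"]), m["submodule"], m["rest"]
```

Because both spellings resolve to the same `(layer_idx, rest)` pair, the downstream megatron→hf mapping only needs to be written once and does not depend on which Megatron-LM version is paired.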

e2e validated (Qwen3.5-35B-A3B, raw mode, MTP enabled, renamed Megatron): update_weights completes with no unknown parameters, and the sglang rollout generates normally with the synced weights.

Companion PRs: radixark/Megatron-LM#54, radixark/Megatron-Bridge#11.

gemini-code-assist

Code Review

This pull request updates the weight conversion and mapping logic for Qwen models by renaming references to transformer_layer to mtp_model_layer across multiple files, including qwen3_5.py, qwen3_next.py, and common.py. I have no feedback to provide as there are no review comments.


Megatron-LM renamed the MTP submodule transformer_layer -> mtp_model_layer. Make every miles MTP weight converter (megatron_to_hf, update_weight, and the mbridge qwen3_5 / qwen3_next / glm4moe_lite bridges) accept both the new and old submodule names and re-emit whichever the running Megatron uses, so weight sync keeps working across the rename. Parametrize the bridge-mapping tests over both names.
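One way to read "re-emit whichever the running Megatron uses" is sketched below: detect the live spelling from the model's own parameter names, then rewrite any incoming name to that spelling. The function names, the constants, and the assumption that the submodule sits at `mtp.layers.<idx>.<submodule>` are hypothetical and chosen only for illustration; the actual converters may do this differently.

```python
NEW_NAME = "mtp_model_layer"
OLD_NAME = "transformer_layer"


def _mtp_submodule_pos(parts):
    """Index of the MTP submodule segment in a dotted name, or None.

    Assumes the (hypothetical) layout mtp.layers.<idx>.<submodule>.<rest>.
    """
    if "mtp" not in parts:
        return None
    pos = parts.index("mtp") + 3  # skip "mtp", "layers", "<idx>"
    if pos < len(parts) and parts[pos] in (NEW_NAME, OLD_NAME):
        return pos
    return None


def detect_live_mtp_name(param_names):
    """Return the spelling used by the running model, or None if no MTP params."""
    for name in param_names:
        parts = name.split(".")
        pos = _mtp_submodule_pos(parts)
        if pos is not None:
            return parts[pos]
    return None


def reemit(name, live_name):
    """Rewrite either spelling in an MTP param name to live_name.

    Names outside the MTP inner layer are returned unchanged.
    """
    parts = name.split(".")
    pos = _mtp_submodule_pos(parts)
    if pos is None:
        return name
    parts[pos] = live_name
    return ".".join(parts)
```

Detecting the spelling from live parameter names, rather than from a version check, keeps the logic independent of which Megatron-LM fork or commit is installed.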

Zhichenzzz requested review from fzyzcjy, maocheng23, yueming-yuan and yushengsu-thu as code owners June 8, 2026 21:28

gemini-code-assist Bot reviewed Jun 8, 2026


This was referenced Jun 10, 2026

[model] fix: register both MTP submodule spellings in qwen3_next_bridge radixark/Megatron-Bridge#11

Open

fix(mtp): rename MTP submodule transformer_layer -> mtp_model_layer radixark/Megatron-LM#54

Open

Zhichenzzz force-pushed the fix/1289-mtp-naming branch from 8ce4539 to 41f84a6 on June 18, 2026 21:39

Zhichenzzz wants to merge 1 commit into main from fix/1289-mtp-naming
