Conversation

@vasqu vasqu commented Jul 22, 2025

Continuation of #39228 for the VL models

Current inference script for testing (torch 2.6):

import requests
from PIL import Image

from transformers import AutoModelForImageTextToText, AutoProcessor


use_fast = False
model_path = "<your_converted_path>"

# Only pass use_fast when the fast image processor is requested.
processor_kwargs = {"use_fast": True} if use_fast else {}
processor = AutoProcessor.from_pretrained(model_path, **processor_kwargs)

model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    device_map="auto",
    dtype="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Only use English during your responses and describe the following image."},
            {"type": "image"},
        ]
    },
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open(requests.get("https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg", stream=True).raw)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(processor.decode(generated_ids[0][len(inputs["input_ids"][0]):]))

Output:
The image features a person sitting on a hilltop, gazing out at a vast mountain range. The person is wrapped in a colorful, striped blanket, and their head is covered with a red headscarf. The foreground includes vibrant pink flowers, adding a pop of color to the scene. The background show
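As a quick illustration of why the script slices `generated_ids` before decoding: `generate()` returns the prompt tokens followed by the newly generated ones, so dropping the first `len(input_ids)` entries leaves only the model's response. A minimal sketch of that slicing with made-up token ids (no model needed):

```python
# Stand-ins for inputs["input_ids"][0] and generated_ids[0] in the script
# above; the ids themselves are made up for illustration.
prompt_ids = [101, 7592, 2088]
generated = prompt_ids + [2023, 2003]  # generate() echoes the prompt first

# Same slicing as generated_ids[0][len(inputs["input_ids"][0]):]
new_tokens = generated[len(prompt_ids):]
print(new_tokens)  # [2023, 2003]
```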

Left TODOs:

  • Integration tests
    • Test with fast processor
    • Check FA2
  • TP + EP?
  • Docs
  • Change weight loading once the weight converter lands in main

@vasqu vasqu requested review from zucchini-nlp and removed request for Rocketknight1 October 30, 2025 18:33

vasqu commented Oct 30, 2025

There are some TODOs left, but the core parts are finished, so a general (first) review would be nice.

@molbap molbap self-requested a review November 12, 2025 10:33
@molbap molbap left a comment


Thanks for the huge work! I left a review after reading @zucchini-nlp's.


## Overview

The ernie4_5_vl model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
Contributor

To be completed with https://arxiv.org/abs/2510.14528, I suppose.


@vasqu vasqu Nov 12, 2025


Yeah, keeping one comment open here; I have a TODO note in the PR description 👍


@molbap molbap left a comment


GitHub has been a bit peculiar, and I've been struggling with the new interface. I left very few comments; most of it, plus what was already mentioned by Raushan, is doable now that the weight loader #41580 is merged. We can either merge now and use the converter later (by its nature, the order does not matter, and we won't break BC), or merge later once the changes from the converter are merged here, whichever you prefer.

Comment on lines +454 to +458
# convert_weights(args.checkpoint_path, args.pytorch_dump_folder_path)
convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)

# if args.convert_preprocessor:
# convert_processor(args.checkpoint_path, args.pytorch_dump_folder_path)


Just cleanup.

Suggested change
- # convert_weights(args.checkpoint_path, args.pytorch_dump_folder_path)
- convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)
- # if args.convert_preprocessor:
- #     convert_processor(args.checkpoint_path, args.pytorch_dump_folder_path)
+ convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)

Contributor Author

The processor conversion will also stay, but yeah, I've been playing around with the conversion too much and I'm too lazy, so I modify what I need at certain times 😄 Will keep this open though, so we don't forget.

@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ernie4_5_moe, ernie4_5_vl
