Conversation

@vasqu vasqu commented Jul 22, 2025

Continuation of #39228 for the VL models

Current inference script for testing (torch 2.6):

import requests
from PIL import Image

from transformers import AutoModelForImageTextToText, AutoProcessor


use_fast = False
model_path = "<your_converted_path>"

# Only pass use_fast when the fast image processor is requested.
processor_kwargs = {"use_fast": True} if use_fast else {}
processor = AutoProcessor.from_pretrained(model_path, **processor_kwargs)

model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    device_map="auto",
    dtype="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Only use English during your responses and describe the following image."},
            {"type": "image"},
        ]
    },
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open(requests.get("https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg", stream=True).raw)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(processor.decode(generated_ids[0][len(inputs["input_ids"][0]):]))

Output:
The image features a person sitting on a hilltop, gazing out at a vast mountain range. The person is wrapped in a colorful, striped blanket, and their head is covered with a red headscarf. The foreground includes vibrant pink flowers, adding a pop of color to the scene. The background show
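As a quick illustration of why the script slices `generated_ids` before decoding: `generate()` returns the prompt tokens followed by the newly generated ones, so dropping the first `len(input_ids)` entries leaves only the model's response. A minimal sketch of that slicing with made-up token ids (no model needed):

```python
# Stand-ins for inputs["input_ids"][0] and generated_ids[0] in the script
# above; the ids themselves are made up for illustration.
prompt_ids = [101, 7592, 2088]
generated = prompt_ids + [2023, 2003]  # generate() echoes the prompt first

# Same slicing as generated_ids[0][len(inputs["input_ids"][0]):]
new_tokens = generated[len(prompt_ids):]
print(new_tokens)  # [2023, 2003]
```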

Left TODOs:

  • Integration tests
    • Test with fast processor
    • Check FA2
  • TP + EP?
  • Docs
  • Change weight loading once the weight converter lands in main

@vasqu vasqu requested review from zucchini-nlp and removed request for Rocketknight1 October 30, 2025 18:33

vasqu commented Oct 30, 2025

There are some TODOs left, but the core parts are finished, so a general (first) review would be nice.

@molbap molbap self-requested a review November 12, 2025 10:33
@molbap molbap left a comment


Thanks for the huge work! I left a review after reading @zucchini-nlp's.


## Overview

The ernie4_5_vl model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
Contributor

To be completed with https://arxiv.org/abs/2510.14528, I suppose.


@vasqu vasqu Nov 12, 2025


Yeah, keeping one comment open here; I have a TODO note in the PR description 👍


@molbap molbap left a comment


GitHub has been a bit peculiar, and I've been struggling with the new interface. I left very few comments; most of it, plus what was already mentioned by Raushan, is doable now that the weight loader #41580 is merged. We can either merge now and use the converter later (by its nature, the order does not matter, and we won't break BC), or merge later once the changes from the converter are merged here, whichever you prefer.

Comment on lines +454 to +458
# convert_weights(args.checkpoint_path, args.pytorch_dump_folder_path)
convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)

# if args.convert_preprocessor:
# convert_processor(args.checkpoint_path, args.pytorch_dump_folder_path)


Just cleanup.

Suggested change
- # convert_weights(args.checkpoint_path, args.pytorch_dump_folder_path)
- convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)
- # if args.convert_preprocessor:
- #     convert_processor(args.checkpoint_path, args.pytorch_dump_folder_path)
+ convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)

Contributor Author

The processor conversion will also stay, but yeah, I've been playing around with the conversion too much and I'm too lazy, so I modify what I need at certain times 😄 Will keep this open though, so we don't forget.

@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ernie4_5_moe, ernie4_5_vl
