[Ernie 4.5] Ernie VL models
#39585
…original formula (torch.allclose always True) leading to slightly different generations
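For context on why a passing `torch.allclose` check can still go along with slightly different generations: the check is tolerance-based (`|a - b| <= atol + rtol * |b|`, with defaults `rtol=1e-5` and `atol=1e-8`), so two logit vectors can pass it and still disagree on the greedy token. The sketch below is a pure-Python stand-in for that comparison; the logit values are made up for illustration and are not taken from this PR.

```python
def is_close(a, b, rtol=1e-5, atol=1e-8):
    # Same element-wise rule torch.allclose documents: |a - b| <= atol + rtol * |b|
    return abs(a - b) <= atol + rtol * abs(b)


def all_close(xs, ys, rtol=1e-5, atol=1e-8):
    # True only if both sequences have the same length and every pair is close
    return len(xs) == len(ys) and all(is_close(x, y, rtol, atol) for x, y in zip(xs, ys))


def argmax(xs):
    # Index of the largest value, i.e. the greedy-decoding choice
    return max(range(len(xs)), key=xs.__getitem__)


# Hypothetical logits: two near-tied candidates whose order flips between implementations
reference_logits = [2.0, 2.0000001, -1.0]
ported_logits = [2.0000001, 2.0, -1.0]

tensors_match = all_close(ported_logits, reference_logits)
same_token = argmax(ported_logits) == argmax(reference_logits)
print(f"allclose: {tensors_match}, same greedy token: {same_token}")
```

Once one token differs, the rest of the generation can diverge, which is why an equivalence test on intermediate tensors alone may not be enough.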
There are some TODOs left, but the core parts are finished, so a general (first) review would be nice.
molbap
left a comment
Thanks for the huge work! Left a review after reading that of @zucchini-nlp
| ## Overview
|
| The ernie4_5_vl model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
To complete with https://arxiv.org/abs/2510.14528 I suppose
Yea keeping one comment open here, I have a TODO note in the PR description 👍
Co-authored-by: Pablo Montalvo <[email protected]>
molbap
left a comment
GitHub has been a bit peculiar; I've been struggling with the new interface. I left very few comments: most of it, plus what Raushan already mentioned, is doable now that the weight loader #41580 is merged. We can either merge now and use the converter later (by its nature the order does not matter, so we won't break BC), or merge later once the converter changes are merged here, whichever you prefer.
| # convert_weights(args.checkpoint_path, args.pytorch_dump_folder_path)
| convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)
|
| # if args.convert_preprocessor:
| # convert_processor(args.checkpoint_path, args.pytorch_dump_folder_path)
just cleanup
Suggested change:
- # convert_weights(args.checkpoint_path, args.pytorch_dump_folder_path)
- convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)
- # if args.convert_preprocessor:
- # convert_processor(args.checkpoint_path, args.pytorch_dump_folder_path)
+ convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)
The processor conversion will also stay, but yeah, I've been playing around with the conversion a lot and, being lazy, I just modify whatever I need at the time 😄 I'll keep this open though, so I don't forget.
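For reference, a minimal sketch of what the cleaned-up entry point could look like once the commented-out steps are restored. Only the argument names (`checkpoint_path`, `pytorch_dump_folder_path`, `convert_preprocessor`) and the three function names come from the snippet above; the function bodies, the `store_true` flag type, and the `main` wrapper are placeholder assumptions, not the actual conversion script.

```python
import argparse


# Placeholder stubs: the real functions live in the conversion script and do the actual work.
def convert_weights(checkpoint_path, dump_path):
    print(f"[stub] weights: {checkpoint_path} -> {dump_path}")


def convert_config(checkpoint_path, dump_path):
    print(f"[stub] config: {checkpoint_path} -> {dump_path}")


def convert_processor(checkpoint_path, dump_path):
    print(f"[stub] processor: {checkpoint_path} -> {dump_path}")


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a conversion entry point")
    parser.add_argument("--checkpoint_path", required=True)
    parser.add_argument("--pytorch_dump_folder_path", required=True)
    # Assumed to be a boolean switch; the real script may define it differently.
    parser.add_argument("--convert_preprocessor", action="store_true")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    steps = []  # record which steps ran, handy for quick checks

    convert_weights(args.checkpoint_path, args.pytorch_dump_folder_path)
    steps.append("weights")

    convert_config(args.checkpoint_path, args.pytorch_dump_folder_path)
    steps.append("config")

    # The processor conversion stays opt-in behind the flag.
    if args.convert_preprocessor:
        convert_processor(args.checkpoint_path, args.pytorch_dump_folder_path)
        steps.append("processor")

    return steps


# Example call with made-up paths
steps = main(["--checkpoint_path", "in_dir", "--pytorch_dump_folder_path", "out_dir"])
```

Keeping every step live and gating only the processor behind the flag avoids having to comment code in and out while iterating on one part of the conversion.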
…sed on font name with expected associated file at same repo)
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, ernie4_5_moe, ernie4_5_vl
Continuation of #39228 for the VL models
Current inference script for testing (torch 2.6):
Output:
The image features a person sitting on a hilltop, gazing out at a vast mountain range. The person is wrapped in a colorful, striped blanket, and their head is covered with a red headscarf. The foreground includes vibrant pink flowers, adding a pop of color to the scene. The background show
Left TODOs: