[NPU] Support NPUW for text-embedding models #3088
base: master
Pull request overview
This PR adds support for NPUW (Neural Processing Unit Workload) optimization for text embedding models, enabling long context support and performance improvements through prefill-chunk handling.
Key changes:
- Added NPU-specific compilation path for text embedding models with dynamic shapes
- Introduced new configuration parameter emb_pad_to_max_length to control padding behavior
- Refactored NPU compilation logic to support both LLM and text embedding model types
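To make the padding behavior concrete, here is a minimal sketch of padding a tokenized sequence to a fixed length on either side. It is an illustration only, not the PR's implementation: the function `pad_to_max_length` and its parameters (`pad_id`, `pad_on_left`) are hypothetical names chosen for this example, and truncation of over-long inputs is deliberately left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the PR): pads `ids` with `pad_id` until it
// holds `max_length` tokens. Inputs that are already long enough are returned
// unchanged; truncation is out of scope for this sketch.
std::vector<std::int64_t> pad_to_max_length(const std::vector<std::int64_t>& ids,
                                            std::size_t max_length,
                                            std::int64_t pad_id,
                                            bool pad_on_left) {
    if (ids.size() >= max_length) {
        return ids;
    }
    const std::size_t n_pad = max_length - ids.size();
    std::vector<std::int64_t> out;
    out.reserve(max_length);
    if (pad_on_left) {
        out.insert(out.end(), n_pad, pad_id);  // padding first, tokens after
    }
    out.insert(out.end(), ids.begin(), ids.end());
    if (!pad_on_left) {
        out.insert(out.end(), n_pad, pad_id);  // tokens first, padding after
    }
    return out;
}
```

With right padding the real tokens stay at the start of the sequence; with left padding they end up at the end, which matters for any step that reads the last token position.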
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/llm_bench/llm_bench_utils/ov_utils.py | Fixed parameter name and moved padding configuration to support new padding control |
| tools/llm_bench/llm_bench_utils/model_utils.py | Added mapping for new emb_pad_to_max_length parameter |
| tools/llm_bench/benchmark.py | Added command-line argument for embedding padding control |
| src/cpp/src/utils.hpp | Declared new function for NPU text embedding compilation |
| src/cpp/src/utils.cpp | Refactored NPU compilation logic and added text embedding-specific configuration |
| src/cpp/src/rag/text_embedding_pipeline.cpp | Implemented NPU compilation path for dynamic text embedding models |
d1b6ce1 to 94495b4
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
1b7e667 to 3120bdf
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
3120bdf to f713558
f713558 to a591398
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
    is_fixed_size = false;
    }

    bool is_padding_on_left = false;
`is_padding_on_left` is used only in the NPU branch; let's move it under the NPU `if`.
    model = apply_postprocessing(model, m_config);

    bool is_fixed_size = true;
Suggested change:
    - bool is_fixed_size = true;
    + bool is_seq_len_fixed = true;
    local_config["MAX_PROMPT_LEN"] = m_config.max_length.value();
    }
    std::tie(compiled_model, kv_desc) =
        utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);
Let's move the NPU-specific properties processing into `compile_decoder_for_npu_text_embedding`.
Suggested change:
    - utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);
    + utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, m_config);
or
    - utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);
    + utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, m_config.pooling_type, m_config.max_length);
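A rough sketch of what this request amounts to: the callee derives the NPU-specific entries from the configuration values it receives, so the caller no longer has to build a `local_config`. Everything here is a simplified assumption for illustration: `Properties` is a plain string map standing in for the real OpenVINO property types, and `with_npu_embedding_properties` is a hypothetical helper name, not a function from the PR. Only the `MAX_PROMPT_LEN` key is taken from the diff above.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Simplified stand-in for the pipeline's property map (assumption; the real
// code uses OpenVINO property types rather than plain strings).
using Properties = std::map<std::string, std::string>;

// Hypothetical helper: takes the user-supplied properties by value and adds
// the NPU-specific entry derived from the embedding configuration.
// MAX_PROMPT_LEN is set only when a maximum length is configured.
Properties with_npu_embedding_properties(Properties properties,
                                         const std::optional<std::size_t>& max_length) {
    if (max_length.has_value()) {
        properties["MAX_PROMPT_LEN"] = std::to_string(*max_length);
    }
    return properties;
}
```

Keeping this logic inside the compile function means the pipeline code passes only the configuration, and every NPU-only key lives in one place.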
    namespace genai {
    namespace utils {

    enum class ModelType { Standard, Whisper, TextEmbedding };
Nitpick: it's not clear what the Standard model type is.
Suggested change:
    - enum class ModelType { Standard, Whisper, TextEmbedding };
    + enum class ModelType { Default, Whisper, TextEmbedding };
@mengweiguo You applied formatting to the updated files. We have several PRs to enable formatting, and I'm a bit concerned about whether your changes will match our formatting rules.
Description
The benefits handled by prefill-chunk in NPUW:
Note:
CVS-177453
Checklist: