[NPU] Support NPUW for text-embedding models #3088

mengweiguo · 2025-12-01T07:21:20Z

Description

The benefits handled by prefill-chunk in NPUW:

Support long context
Performance improvement

Note:

NPUW PR: [NPUW] Support prefill-chunk for text-embedding model openvino#33076
The tests has been verified to work with both NPUW and GenAI updates.

CVS-177453

Checklist:

Tests have been updated or added to cover the new code.
This patch fully addresses the ticket.
I have made corresponding changes to the documentation.

Copilot

Pull request overview

This PR adds support for NPUW (Neural Processing Unit Workload) optimization for text embedding models, enabling long context support and performance improvements through prefill-chunk handling.

Key changes:

Added NPU-specific compilation path for text embedding models with dynamic shapes
Introduced new configuration parameter emb_pad_to_max_length to control padding behavior
Refactored NPU compilation logic to support both LLM and text embedding model types

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tools/llm_bench/llm_bench_utils/ov_utils.py	Fixed parameter name and moved padding configuration to support new padding control
tools/llm_bench/llm_bench_utils/model_utils.py	Added mapping for new `emb_pad_to_max_length` parameter
tools/llm_bench/benchmark.py	Added command-line argument for embedding padding control
src/cpp/src/utils.hpp	Declared new function for NPU text embedding compilation
src/cpp/src/utils.cpp	Refactored NPU compilation logic and added text embedding-specific configuration
src/cpp/src/rag/text_embedding_pipeline.cpp	Implemented NPU compilation path for dynamic text embedding models

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/cpp/src/rag/text_embedding_pipeline.cpp

src/cpp/src/utils.cpp

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/cpp/src/utils.cpp

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/cpp/src/utils.cpp

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/cpp/src/rag/text_embedding_pipeline.cpp

src/cpp/src/utils.cpp

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

as-suvorov · 2025-12-03T09:06:14Z

src/cpp/src/rag/text_embedding_pipeline.cpp

+            is_fixed_size = false;
        }

+        bool is_padding_on_left = false;


is_padding_on_left used in NPU branch only, let's move it under NPU if

as-suvorov · 2025-12-03T09:14:26Z

src/cpp/src/rag/text_embedding_pipeline.cpp

-
-        model = apply_postprocessing(model, m_config);
-
+        bool is_fixed_size = true;


Suggested change

bool is_fixed_size = true;

bool is_seq_len_fixed = true;

as-suvorov · 2025-12-03T09:23:53Z

src/cpp/src/rag/text_embedding_pipeline.cpp

+                local_config["MAX_PROMPT_LEN"] = m_config.max_length.value();
+            }
+            std::tie(compiled_model, kv_desc) =
+                utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);


Let's move NPU specific properties processing into compile_decoder_for_npu_text_embedding.

Suggested change

utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);

utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, m_config);

or

Suggested change

utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);

utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, m_config.pooling_type, m_config.max_length);

as-suvorov · 2025-12-03T09:26:02Z

src/cpp/src/utils.cpp

 namespace genai {
 namespace utils {

+enum class ModelType { Standard, Whisper, TextEmbedding };


nitpick, not clear what standard model type is

Suggested change

enum class ModelType { Standard, Whisper, TextEmbedding };

enum class ModelType { Default, Whisper, TextEmbedding };

as-suvorov · 2025-12-03T09:36:53Z

@mengweiguo You applied formatting for updated files. We have several PRs to enable formatting. I'm a bit concerned if your changes would match our fromatting rules.
If you want to apply formatting for updated files please align your PR with #3008 pre commit rules. Otherwise please revert formatting changes.

Copilot AI review requested due to automatic review settings December 1, 2025 07:21

github-actions bot added category: llm_bench Label for tool/llm_bench folder no-match-files category: RAG RAG pipeline components labels Dec 1, 2025

Copilot AI reviewed Dec 1, 2025

View reviewed changes

src/cpp/src/rag/text_embedding_pipeline.cpp Outdated Show resolved Hide resolved

src/cpp/src/rag/text_embedding_pipeline.cpp Outdated Show resolved Hide resolved

src/cpp/src/utils.cpp Outdated Show resolved Hide resolved

mengweiguo mentioned this pull request Dec 1, 2025

[NPUW] Support prefill-chunk for text-embedding model openvinotoolkit/openvino#33076

Open

mengweiguo force-pushed the qwen3-embedding-pr branch from d1b6ce1 to 94495b4 Compare December 1, 2025 16:50

Copilot AI review requested due to automatic review settings December 2, 2025 02:50

Copilot AI reviewed Dec 2, 2025

View reviewed changes

src/cpp/src/utils.cpp Outdated Show resolved Hide resolved

src/cpp/src/utils.cpp Outdated Show resolved Hide resolved

mengweiguo force-pushed the qwen3-embedding-pr branch 2 times, most recently from 1b7e667 to 3120bdf Compare December 2, 2025 03:16

Copilot AI review requested due to automatic review settings December 2, 2025 03:16

Copilot AI reviewed Dec 2, 2025

View reviewed changes

src/cpp/src/utils.cpp Outdated Show resolved Hide resolved

mengweiguo force-pushed the qwen3-embedding-pr branch from 3120bdf to f713558 Compare December 2, 2025 03:20

Support NPUW for text-embedding models

a591398

mengweiguo force-pushed the qwen3-embedding-pr branch from f713558 to a591398 Compare December 2, 2025 10:14

Copilot AI review requested due to automatic review settings December 2, 2025 10:14

Copilot AI reviewed Dec 2, 2025

View reviewed changes

src/cpp/src/rag/text_embedding_pipeline.cpp Outdated Show resolved Hide resolved

src/cpp/src/utils.cpp Show resolved Hide resolved

Code cleanup

628bc9e

mengweiguo changed the title ~~Support NPUW for text-embedding models~~ [NPU] Support NPUW for text-embedding models Dec 3, 2025

mengweiguo marked this pull request as ready for review December 3, 2025 05:47

Copilot AI review requested due to automatic review settings December 3, 2025 05:47

Copilot AI reviewed Dec 3, 2025

View reviewed changes

Wovchena requested a review from as-suvorov December 3, 2025 07:20

as-suvorov requested changes Dec 3, 2025

View reviewed changes


		model = apply_postprocessing(model, m_config);

		bool is_fixed_size = true;

	utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);
	utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, m_config);

	enum class ModelType { Standard, Whisper, TextEmbedding };
	enum class ModelType { Default, Whisper, TextEmbedding };

[NPU] Support NPUW for text-embedding models #3088

Are you sure you want to change the base?

[NPU] Support NPUW for text-embedding models #3088

Conversation

mengweiguo commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

as-suvorov Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

as-suvorov Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

as-suvorov Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

as-suvorov Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

as-suvorov commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mengweiguo commented Dec 1, 2025 •

edited

Loading

as-suvorov Dec 3, 2025 •

edited

Loading