Skip to content

Conversation

@mengweiguo
Copy link
Contributor

@mengweiguo mengweiguo commented Dec 1, 2025

Description

The benefits handled by prefill-chunk in NPUW:

  • Support long context
  • Performance improvement

Note:

  1. NPUW PR: [NPUW] Support prefill-chunk for text-embedding model openvino#33076
  2. The tests has been verified to work with both NPUW and GenAI updates.

CVS-177453

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Copilot AI review requested due to automatic review settings December 1, 2025 07:21
@github-actions github-actions bot added category: llm_bench Label for tool/llm_bench folder no-match-files category: RAG RAG pipeline components labels Dec 1, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for NPUW (Neural Processing Unit Workload) optimization for text embedding models, enabling long context support and performance improvements through prefill-chunk handling.

Key changes:

  • Added NPU-specific compilation path for text embedding models with dynamic shapes
  • Introduced new configuration parameter emb_pad_to_max_length to control padding behavior
  • Refactored NPU compilation logic to support both LLM and text embedding model types

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tools/llm_bench/llm_bench_utils/ov_utils.py Fixed parameter name and moved padding configuration to support new padding control
tools/llm_bench/llm_bench_utils/model_utils.py Added mapping for new emb_pad_to_max_length parameter
tools/llm_bench/benchmark.py Added command-line argument for embedding padding control
src/cpp/src/utils.hpp Declared new function for NPU text embedding compilation
src/cpp/src/utils.cpp Refactored NPU compilation logic and added text embedding-specific configuration
src/cpp/src/rag/text_embedding_pipeline.cpp Implemented NPU compilation path for dynamic text embedding models

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mengweiguo mengweiguo force-pushed the qwen3-embedding-pr branch 2 times, most recently from 1b7e667 to 3120bdf Compare December 2, 2025 03:16
Copilot AI review requested due to automatic review settings December 2, 2025 03:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings December 2, 2025 10:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mengweiguo mengweiguo changed the title Support NPUW for text-embedding models [NPU] Support NPUW for text-embedding models Dec 3, 2025
@mengweiguo mengweiguo marked this pull request as ready for review December 3, 2025 05:47
Copilot AI review requested due to automatic review settings December 3, 2025 05:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Wovchena Wovchena requested a review from as-suvorov December 3, 2025 07:20
is_fixed_size = false;
}

bool is_padding_on_left = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_padding_on_left used in NPU branch only, let's move it under NPU if


model = apply_postprocessing(model, m_config);

bool is_fixed_size = true;
Copy link
Collaborator

@as-suvorov as-suvorov Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool is_fixed_size = true;
bool is_seq_len_fixed = true;

local_config["MAX_PROMPT_LEN"] = m_config.max_length.value();
}
std::tie(compiled_model, kv_desc) =
utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move NPU specific properties processing into compile_decoder_for_npu_text_embedding.

Suggested change
utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);
utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, m_config);

or

Suggested change
utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, local_config);
utils::compile_decoder_for_npu_text_embedding(model, properties, kv_pos, m_config.pooling_type, m_config.max_length);

namespace genai {
namespace utils {

enum class ModelType { Standard, Whisper, TextEmbedding };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, not clear what standard model type is

Suggested change
enum class ModelType { Standard, Whisper, TextEmbedding };
enum class ModelType { Default, Whisper, TextEmbedding };

@as-suvorov
Copy link
Collaborator

@mengweiguo You applied formatting for updated files. We have several PRs to enable formatting. I'm a bit concerned if your changes would match our fromatting rules.
If you want to apply formatting for updated files please align your PR with #3008 pre commit rules. Otherwise please revert formatting changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: llm_bench Label for tool/llm_bench folder category: RAG RAG pipeline components no-match-files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants