@mengweiguo commented Dec 1, 2025

Details:

Qwen3-text-embedding is a transformer-based causal model, but it is not a traditional LLM, so it cannot be used with NPUW directly.
The benefits of prefill-chunk for Qwen3-text-embedding:

  • Support for long contexts
  • Improved performance

Changes:

  • Added KVCache nodes to the model and updated the shapes of the related nodes.
  • Added a position_ids input node, since the position IDs are hardcoded in the original model.
  • Created a separate model to handle the post-processing.
  • Cached the prefill output, since mean post-processing needs the entire output data.
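To illustrate the last three changes, here is a minimal Python sketch of a chunked prefill loop. It is not the NPUW implementation: `chunked_prefill`, `run_chunk`, and `fake_run_chunk` are hypothetical names, and the stand-in "model" only echoes its inputs. It shows why `position_ids` has to be an explicit input (each chunk continues the positions of the previous one) and why the per-chunk outputs are cached (a later pooling step needs all of them).

```python
from typing import Callable, List

Vector = List[float]


def chunked_prefill(
    token_ids: List[int],
    chunk_size: int,
    run_chunk: Callable[[List[int], List[int]], List[Vector]],
) -> List[Vector]:
    """Run prefill chunk by chunk and cache every chunk's per-token output."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    cached_outputs: List[Vector] = []
    for start in range(0, len(token_ids), chunk_size):
        chunk = token_ids[start:start + chunk_size]
        # Positions continue from where the previous chunk stopped,
        # so they cannot be hardcoded to start at 0 inside the model.
        position_ids = list(range(start, start + len(chunk)))
        # Keep every chunk's output: pooling later needs the full sequence.
        cached_outputs.extend(run_chunk(chunk, position_ids))
    return cached_outputs


def fake_run_chunk(chunk: List[int], position_ids: List[int]) -> List[Vector]:
    # Hypothetical stand-in for the model: one 2-element "hidden state" per token.
    return [[float(tok), float(pos)] for tok, pos in zip(chunk, position_ids)]


outputs = chunked_prefill([10, 11, 12, 13, 14], chunk_size=2, run_chunk=fake_run_chunk)
```

In this sketch the five tokens are processed as chunks of 2, 2, and 1, and `outputs` holds one vector per token in the original order.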

Notes:

  1. The kvcache model is not needed at all here, but it is still created, because I don't want to add many if-else branches. The cost is a longer compilation time.
  2. For now, padding is supported only in the mean post-processing mode, which keeps things simple. I can add left-padding support in follow-up PRs if required.
  3. GenAI PR: [NPU] Support NPUW for text-embedding models openvino.genai#3088
  4. The tests have been verified to work with both the NPUW and GenAI updates.
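For readers unfamiliar with why padding is easy to handle in mean post-processing (note 2), the sketch below shows masked mean pooling in plain Python. It is an illustration under assumptions, not the code in this PR: `masked_mean_pool` is a hypothetical helper, and it assumes padded positions are marked with 0 in an attention mask and are simply excluded from the average.

```python
from typing import List


def masked_mean_pool(
    hidden_states: List[List[float]], attention_mask: List[int]
) -> List[float]:
    """Average the per-token vectors, skipping positions whose mask is 0."""
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have equal length")
    valid = sum(1 for keep in attention_mask if keep)
    if valid == 0:
        raise ValueError("attention_mask must keep at least one token")
    pooled = [0.0] * len(hidden_states[0])
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:  # padded tokens contribute nothing to the sum
            for i, value in enumerate(vec):
                pooled[i] += value
    return [total / valid for total in pooled]


# Two real tokens followed by one padded position.
embedding = masked_mean_pool(
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
    [1, 1, 0],
)
```

Because the padded vector is masked out, the result depends only on the two real tokens, which is why the mean mode tolerates padding without further changes to the model itself.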

Tickets:

@github-actions bot added the "category: NPU" (OpenVINO NPU plugin) and "category: NPUW" (NPUW plugin) labels Dec 1, 2025
@sys-openvino-ci added the "ExternalIntelPR" (External contributor from Intel) label Dec 1, 2025
@mengweiguo changed the title from "Support prefill-chunk for text-embedding model" to "NPUW] Support prefill-chunk for text-embedding model" Dec 3, 2025
@mengweiguo changed the title from "NPUW] Support prefill-chunk for text-embedding model" to "[NPUW] Support prefill-chunk for text-embedding model" Dec 3, 2025
@mengweiguo marked this pull request as ready for review December 3, 2025 05:47
@mengweiguo requested review from a team as code owners December 3, 2025 05:47