[NPUW] Support prefill-chunk for text-embedding model #33076

mengweiguo · 2025-12-01T07:16:36Z

Details:

Qwen3-text-embedding is a transformer-based causal model, but it is not a traditional LLM, so it cannot be used with NPUW as-is.
The benefits of prefill-chunk for Qwen3-text-embedding:

Support for long contexts
Improved performance

Changes:

Added KVCache nodes to the model and updated the shapes of related nodes.
Added a position_ids input node, since the position IDs are hardcoded in the original model.
Created a separate model to handle post-processing.
Cached the prefill output, since mean post-processing needs the entire output data.
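To illustrate why the changes above fit together, here is a minimal NumPy sketch of chunked prefill. It is not the NPUW implementation: `causal_attention`, `chunked_prefill`, and `mean_pool` are hypothetical helpers, the model is reduced to a single attention head, and `position_ids` is used only for causal masking (a real model would also use it for positional encoding). The sketch shows that feeding the prompt chunk by chunk, with an explicit `position_ids` input and a growing KV cache, reproduces the single-pass output, and that the per-chunk outputs have to be kept until the end so the mean can be taken over the whole sequence.

```python
import numpy as np


def causal_attention(q, k, v, q_positions):
    """Single-head attention; query i attends only to keys at positions <= q_positions[i]."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    key_positions = np.arange(k.shape[0])
    mask = key_positions[None, :] <= q_positions[:, None]
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def chunked_prefill(x, wq, wk, wv, chunk_size):
    """Process the prompt in chunks, appending keys/values to a cache after each chunk."""
    seq_len, hidden = x.shape
    k_cache = np.empty((0, hidden))
    v_cache = np.empty((0, hidden))
    cached_outputs = []  # every chunk's output is kept for the final pooling step
    for start in range(0, seq_len, chunk_size):
        chunk = x[start:start + chunk_size]
        # Positions are passed in explicitly because each chunk starts at a different offset.
        position_ids = np.arange(start, start + chunk.shape[0])
        k_cache = np.concatenate([k_cache, chunk @ wk])
        v_cache = np.concatenate([v_cache, chunk @ wv])
        cached_outputs.append(causal_attention(chunk @ wq, k_cache, v_cache, position_ids))
    return np.concatenate(cached_outputs)


def mean_pool(hidden_states):
    """Mean post-processing over the full sequence of cached outputs."""
    return hidden_states.mean(axis=0)
```

If the positions were fixed inside the model instead of supplied as an input, every chunk would be treated as if it started at position 0, which is presumably why the hardcoded values had to become an input.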

Notes:

Although the kvcache model is not needed at all, it is still created, because I don't want to add many if-else branches. The penalty is a longer compilation time.
Padding is currently supported only in the mean post-processing mode, which keeps things simple. I can add left-padding support in follow-up PRs if required.
GenAI PR: [NPU] Support NPUW for text-embedding models openvino.genai#3088
The tests have been verified to pass with both the NPUW and GenAI updates.
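As a rough illustration of the padding note above, the sketch below shows a masked mean over token outputs. It covers only the pooling step, not how padded tokens are handled inside the model, and `masked_mean_pool` and `attention_mask` are hypothetical names rather than identifiers from this PR. The point is that once padded rows are masked out, the mean does not depend on where the padding sits.

```python
import numpy as np


def masked_mean_pool(hidden_states, attention_mask):
    """Average only the rows whose mask entry is 1; padded rows contribute nothing."""
    mask = attention_mask[:, None].astype(hidden_states.dtype)
    # Guard against an all-padding input to avoid dividing by zero.
    return (hidden_states * mask).sum(axis=0) / max(mask.sum(), 1.0)
```

For example, three real tokens followed by two padded slots give the same embedding as the mean of the three real tokens alone.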

Tickets:

CVS-177453

github-actions bot added the category: NPU (OpenVINO NPU plugin) and category: NPUW (NPUW plugin) labels Dec 1, 2025

sys-openvino-ci added the ExternalIntelPR (External contributor from Intel) label Dec 1, 2025

mengweiguo mentioned this pull request Dec 1, 2025

[NPU] Support NPUW for text-embedding models openvinotoolkit/openvino.genai#3088

Open

3 tasks

Support prefill-chunk for text-embedding model

87fd20e

mengweiguo force-pushed the qwen3-embedding-pr branch from 21b144c to 87fd20e Compare December 2, 2025 10:10

code cleanup

427753f

mengweiguo changed the title ~~Support prefill-chunk for text-embedding model~~ NPUW] Support prefill-chunk for text-embedding model Dec 3, 2025

mengweiguo changed the title ~~NPUW] Support prefill-chunk for text-embedding model~~ [NPUW] Support prefill-chunk for text-embedding model Dec 3, 2025

mengweiguo marked this pull request as ready for review December 3, 2025 05:47

mengweiguo requested review from a team as code owners December 3, 2025 05:47
