model: jina-embeddings-v5-omni models #4604

Merged
Samoed merged 7 commits into embeddings-benchmark:main from florian-hoenicke:add-jina-v5-omni
May 10, 2026

@florian-hoenicke
Contributor

@florian-hoenicke florian-hoenicke commented May 4, 2026

Hi,

This PR adds the jina-embeddings-v5-omni nano and small base models to MTEB. The text path is parity-verified against the corresponding v5 text models, so the same task routing is used here.

Thanks!

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread mteb/models/model_implementations/jina_models.py
@Samoed Samoed added the new model Questions related to adding a new model to the benchmark label May 4, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread tests/test_models/test_model_meta.py
@Samoed Samoed changed the title add: jina-embeddings-v5-omni models model: jina-embeddings-v5-omni models May 5, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
@florian-hoenicke
Contributor Author

florian-hoenicke commented May 6, 2026

Follow-up fix pushed in 320263b: JinaV5OmniWrapper now defaults jinaai/jina-embeddings-v5-omni-nano to torch.float32, which restores text-path parity with jinaai/jina-embeddings-v5-text-nano. Verified on A2 with private HF auth via mteb.get_model(...): omni-nano loads fp32 and matched text-nano exactly on retrieval/query and classification/document probes (max_abs_diff=0.0). Small remains unchanged.
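A parity check of the kind described above can be sketched as an element-wise comparison of two embedding matrices. This is only an illustration, not the script used for this PR: `max_abs_diff` is a hypothetical helper, and the arrays stand in for the omni and text model outputs on the same probes.

```python
import numpy as np


def max_abs_diff(a: np.ndarray, b: np.ndarray) -> float:
    """Largest element-wise absolute difference between two embedding matrices."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Upcast to float64 so the comparison does not depend on the input dtype.
    return float(np.max(np.abs(a.astype(np.float64) - b.astype(np.float64))))


# Stand-in data (assumption): in a real check these would be the embeddings
# produced by the two models for identical inputs.
text_emb = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
omni_emb = text_emb.copy()
identical = max_abs_diff(text_emb, omni_emb)  # 0.0 only for bit-identical outputs
```

A result of exactly 0.0 means the outputs are bit-identical, which is a stricter condition than agreeing within a tolerance.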

Co-authored-by: Cursor <cursoragent@cursor.com>
@florian-hoenicke
Contributor Author

florian-hoenicke commented May 6, 2026

Thanks, updated the prompt/task dispatch test in c42e7a9 to use MockRetrievalTask().metadata instead of a hand-rolled task metadata object.

Co-authored-by: Cursor <cursoragent@cursor.com>
@florian-hoenicke
Contributor Author

florian-hoenicke commented May 6, 2026

Follow-up corner-case fix pushed in 9f40546: I found the omni HF remote code was stripping text before tokenization, which made trailing-space inputs differ from the text models. The private HF repos now preserve whitespace and the MTEB metadata pins those fixed revisions. Verified on A2 through mteb.get_model(...): nano 36db6194... and small 8b4f2c44... match their text counterparts on trailing-space probes with max_abs=0.0.
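The trailing-space issue above can be illustrated with a small probe generator. The helpers below are hypothetical and do not reproduce the HF remote code or the actual probes, which are not shown in this thread. They only show why stripping before tokenization makes two distinct inputs indistinguishable.

```python
def trailing_space_probes(texts: list[str]) -> list[tuple[str, str]]:
    """Pair each text with a copy that has one trailing space appended."""
    return [(t, t + " ") for t in texts]


def strips_whitespace(preprocess) -> bool:
    """Return True if `preprocess` maps any probe pair to the same string."""
    probes = trailing_space_probes(["hello world", "query: test"])
    return any(preprocess(plain) == preprocess(spaced) for plain, spaced in probes)


# A stripping preprocessor collapses each pair; an identity one keeps them distinct.
stripping = strips_whitespace(str.strip)
preserving = strips_whitespace(lambda s: s)
```

If the preprocessing step collapses a pair, the tokenizer sees the same string for both inputs, so the resulting embeddings can no longer match a model that tokenizes the trailing space.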

@Samoed
Member

Samoed commented May 6, 2026

> JinaV5OmniWrapper now defaults jinaai/jina-embeddings-v5-omni-nano to torch.float32, which restores text-path parity

You can just pass this in loader_kwargs without changes in __init__
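The suggestion above, keeping per-model defaults in metadata rather than in the wrapper's `__init__`, can be sketched generically. Everything below is a hypothetical stand-in: `SketchModelMeta`, `sketch_loader`, the `torch_dtype` keyword, and the `"bfloat16"` placeholder default are assumptions for illustration, not MTEB's actual classes or signatures. Dtypes are plain strings to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SketchModelMeta:
    """Hypothetical stand-in for a model metadata record (not MTEB's class)."""

    name: str
    loader: Callable[..., Any]
    loader_kwargs: dict[str, Any] = field(default_factory=dict)

    def load_model(self, **overrides: Any) -> Any:
        # Metadata defaults are applied first; call-site overrides win.
        kwargs = {**self.loader_kwargs, **overrides}
        return self.loader(self.name, **kwargs)


def sketch_loader(name: str, torch_dtype: str = "bfloat16") -> dict[str, str]:
    """Hypothetical loader; returns its arguments instead of a real model."""
    return {"name": name, "torch_dtype": torch_dtype}


nano = SketchModelMeta(
    name="jinaai/jina-embeddings-v5-omni-nano",
    loader=sketch_loader,
    loader_kwargs={"torch_dtype": "float32"},  # per-model default lives in metadata
)
loaded = nano.load_model()
```

With this layout the wrapper needs no model-specific branch, and a caller can still override the dtype at load time.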

Co-authored-by: Cursor <cursoragent@cursor.com>
@florian-hoenicke
Contributor Author

Updated again after the latest private HF repo fix: MTEB now pins nano 6f88a89e... and small 43affca6.... Verified on A2 against the text counterparts on trailing-space probes and all unique STSBenchmark strings: max_abs=0.0, min cosine 1.0 / 0.99999988.
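The minimum-cosine figure can be computed row by row, as in the hypothetical helper below (a sketch, not the verification script used here). A value such as 0.99999988, which is about 1 - 2^-23, alongside max_abs=0.0 is consistent with single-precision rounding in the cosine computation itself, though the thread does not say how the figure was computed.

```python
import numpy as np


def min_rowwise_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Smallest cosine similarity between corresponding rows of a and b."""
    a64 = a.astype(np.float64)
    b64 = b.astype(np.float64)
    dots = np.sum(a64 * b64, axis=1)
    norms = np.linalg.norm(a64, axis=1) * np.linalg.norm(b64, axis=1)
    return float(np.min(dots / norms))


# Stand-in data (assumption): real inputs would be the two models' embeddings.
emb = np.array([[3.0, 4.0], [1.0, 0.0]], dtype=np.float32)
same = min_rowwise_cosine(emb, emb)
orthogonal = min_rowwise_cosine(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
```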

Move the nano dtype default into model metadata and remove model-specific tests per reviewer guidance.

Co-authored-by: Cursor <cursoragent@cursor.com>
@florian-hoenicke
Contributor Author

Update: the referenced HF repos are now public and ungated:

  • jinaai/jina-embeddings-v5-omni-nano: Hub API reports private: false, gated: false, disabled: false
  • jinaai/jina-embeddings-v5-omni-small: Hub API reports private: false, gated: false, disabled: false

I also pushed 605cb14 to address the open review feedback: nano fp32 is now passed through loader_kwargs, and the Jina-specific tests were removed. Local checks passed (test_model_meta.py: 1935 passed; ruff on edited files passed).
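The three Hub flags quoted above can be checked programmatically. The helper below is a hypothetical sketch that runs on stand-in objects. With `huggingface_hub` installed, the info object could instead come from `HfApi().model_info(repo_id)`, whose result exposes `private`, `gated`, and `disabled` attributes.

```python
from types import SimpleNamespace


def is_public_and_ungated(info) -> bool:
    """True if the info object reports private, gated, and disabled as falsy."""
    # getattr with a default treats a missing attribute as "not set".
    return not any(
        getattr(info, flag, False) for flag in ("private", "gated", "disabled")
    )


# Stand-in objects (assumption); a real check would query the Hub instead.
public_info = SimpleNamespace(private=False, gated=False, disabled=False)
gated_info = SimpleNamespace(private=False, gated="auto", disabled=False)
```

The `gated` field is compared by truthiness because the Hub reports it either as `False` or as a string naming the gating mode.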

@Samoed
Member

Samoed commented May 9, 2026

I think we should wait until the public release.

@florian-hoenicke
Contributor Author

The public release is live now: both referenced model repos are publicly accessible and ungated on the Hub.

Hub API currently reports private: false, gated: false, disabled: false for both. Please let me know if you need any other release artifact before merging.

@Samoed Samoed merged commit 5ff08fa into embeddings-benchmark:main May 10, 2026
13 checks passed
@Samoed Samoed mentioned this pull request May 10, 2026
75 tasks

2 participants