Add scores for LexRetrieval.v1 by KennethEnevoldsen · Pull Request #499 · embeddings-benchmark/results

KennethEnevoldsen · 2026-04-29T16:34:56Z

It seems we might need to refactor the task a bit: models are already scoring >0.90, and I have only run a few.

Checklist

My model has a model sheet, report, or similar
My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
- No, but there is an existing PR ___
The results submitted are obtained using the reference implementation
My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

github-actions · 2026-04-29T16:46:44Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: BAAI/bge-m3, Cohere/Cohere-embed-v4.0, google/embeddinggemma-300m, intfloat/multilingual-e5-base, intfloat/multilingual-e5-large-instruct, intfloat/multilingual-e5-large, intfloat/multilingual-e5-small, jinaai/jina-colbert-v2, jinaai/jina-embeddings-v5-text-nano, minishlab/potion-multilingual-128M, mteb/baseline-bm25s, mteb/baseline-random-encoder, openai/text-embedding-3-large, openai/text-embedding-3-small, sentence-transformers/LaBSE, sentence-transformers/all-MiniLM-L12-v2, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/static-similarity-mrl-multilingual-v1
Tasks: LexRetrieval.v1

Results for `BAAI/bge-m3`

task_name	BAAI/bge-m3	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9559	0.9293			False
Average	0.9559	0.9293		nan	-

Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL

Results for `Cohere/Cohere-embed-v4.0`

task_name	Cohere/Cohere-embed-v4.0	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9544	0.9293			False
Average	0.9544	0.9293		nan	-

Results for `google/embeddinggemma-300m`

task_name	google/embeddinggemma-300m	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.2618	0.9293			False
Average	0.2618	0.9293		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval

Results for `intfloat/multilingual-e5-base`

task_name	intfloat/multilingual-e5-base	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.8862	0.9293			False
Average	0.8862	0.9293		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `intfloat/multilingual-e5-large-instruct`

task_name	intfloat/multilingual-e5-large	intfloat/multilingual-e5-large-instruct	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.9471			False
Average	0.9293	0.9471		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `intfloat/multilingual-e5-large`

task_name	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293			False
Average	0.9293		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `intfloat/multilingual-e5-small`

task_name	intfloat/multilingual-e5-large	intfloat/multilingual-e5-small	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.8424			False
Average	0.9293	0.8424		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `jinaai/jina-colbert-v2`

task_name	intfloat/multilingual-e5-large	jinaai/jina-colbert-v2	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.8656			False
Average	0.9293	0.8656		nan	-

Training datasets: DuRetrieval, MIRACL, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NanoMSMARCO-VN, NanoMSMARCORetrieval, mMARCO-NL

Results for `jinaai/jina-embeddings-v5-text-nano`

task_name	intfloat/multilingual-e5-large	jinaai/jina-embeddings-v5-text-nano	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.9611			False
Average	0.9293	0.9611		nan	-

Results for `minishlab/potion-multilingual-128M`

task_name	intfloat/multilingual-e5-large	minishlab/potion-multilingual-128M	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.4571			False
Average	0.9293	0.4571		nan	-

Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MLQARetrieval, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL

Results for `mteb/baseline-bm25s`

task_name	intfloat/multilingual-e5-large	mteb/baseline-bm25s	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.8983			False
Average	0.9293	0.8983		nan	-

Results for `mteb/baseline-random-encoder`

task_name	intfloat/multilingual-e5-large	mteb/baseline-random-encoder	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.0005			False
Average	0.9293	0.0005		nan	-

Results for `openai/text-embedding-3-large`

task_name	intfloat/multilingual-e5-large	openai/text-embedding-3-large	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.9605			False
Average	0.9293	0.9605		nan	-

Results for `openai/text-embedding-3-small`

task_name	intfloat/multilingual-e5-large	openai/text-embedding-3-small	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.9384			False
Average	0.9293	0.9384		nan	-

Results for `sentence-transformers/LaBSE`

task_name	intfloat/multilingual-e5-large	sentence-transformers/LaBSE	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.5814			False
Average	0.9293	0.5814		nan	-

Results for `sentence-transformers/all-MiniLM-L12-v2`

task_name	intfloat/multilingual-e5-large	sentence-transformers/all-MiniLM-L12-v2	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.4286			False
Average	0.9293	0.4286		nan	-

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL

Results for `sentence-transformers/all-MiniLM-L6-v2`

task_name	intfloat/multilingual-e5-large	sentence-transformers/all-MiniLM-L6-v2	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.2791			False
Average	0.9293	0.2791		nan	-

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL

Results for `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`

task_name	intfloat/multilingual-e5-large	sentence-transformers/paraphrase-multilingual-mpnet-base-v2	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.7342			False
Average	0.9293	0.7342		nan	-

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL

Results for `sentence-transformers/static-similarity-mrl-multilingual-v1`

task_name	intfloat/multilingual-e5-large	sentence-transformers/static-similarity-mrl-multilingual-v1	Max result	Model with max result	In Training Data
LexRetrieval.v1	0.9293	0.5136			False
Average	0.9293	0.5136		nan	-

Training datasets: NanoQuoraRetrieval, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, TatoebaBitextMining

Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.

KennethEnevoldsen · 2026-04-29T19:35:42Z

I will postpone submitting these results, as there seems to be a ceiling effect.
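The ceiling effect can be checked directly against the scores in the comparison above. The following is a minimal sketch, not part of the PR: the scores are copied from the result tables, while the 0.90 threshold (taken from the ">90" remark in the PR description) and all variable names are illustrative assumptions.

```python
# LexRetrieval.v1 scores copied from the comparison tables in this PR.
scores = {
    "BAAI/bge-m3": 0.9559,
    "Cohere/Cohere-embed-v4.0": 0.9544,
    "google/embeddinggemma-300m": 0.2618,
    "intfloat/multilingual-e5-base": 0.8862,
    "intfloat/multilingual-e5-large-instruct": 0.9471,
    "intfloat/multilingual-e5-large": 0.9293,
    "intfloat/multilingual-e5-small": 0.8424,
    "jinaai/jina-colbert-v2": 0.8656,
    "jinaai/jina-embeddings-v5-text-nano": 0.9611,
    "minishlab/potion-multilingual-128M": 0.4571,
    "mteb/baseline-bm25s": 0.8983,
    "mteb/baseline-random-encoder": 0.0005,
    "openai/text-embedding-3-large": 0.9605,
    "openai/text-embedding-3-small": 0.9384,
    "sentence-transformers/LaBSE": 0.5814,
    "sentence-transformers/all-MiniLM-L12-v2": 0.4286,
    "sentence-transformers/all-MiniLM-L6-v2": 0.2791,
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2": 0.7342,
    "sentence-transformers/static-similarity-mrl-multilingual-v1": 0.5136,
}

# Assumed cut-off for "near the ceiling"; not defined anywhere in the PR.
THRESHOLD = 0.90

# Rank models from best to worst score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Models above the threshold, still in ranked order.
above = [name for name, score in ranked if score > THRESHOLD]

best_name, best_score = ranked[0]
# Spread between the best model and the weakest model above the threshold.
spread = best_score - scores[above[-1]]
# Gap between the best model and the lexical BM25 baseline.
bm25_gap = best_score - scores["mteb/baseline-bm25s"]

print(f"{len(above)} of {len(scores)} models score above {THRESHOLD}")
print(f"best: {best_name} ({best_score:.4f})")
print(f"spread among models above threshold: {spread:.4f}")
print(f"gap between best model and BM25 baseline: {bm25_gap:.4f}")
```

Under these assumptions, 7 of the 19 models exceed 0.90 and sit within about 0.03 of each other, and the BM25 baseline trails the best model by roughly 0.06, which leaves little headroom for separating strong models.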

github-actions · 2026-05-14T02:23:15Z

This pull request has been automatically marked as stale due to inactivity.

github-actions · 2026-05-21T02:25:00Z

This pull request has been automatically closed due to inactivity.

KennethEnevoldsen added 2 commits April 29, 2026 18:34

Add scores for LexRetrieval.v1

2e21db3

It seems we might need to refactor the task a bit: models are already scoring >0.90, and I have only run a few.

add text emb 3 large

d05837a

Kenneth and others added 7 commits April 29, 2026 17:08

add more res

796eb08

add sentence trf models

2461d75

add results

24a0d17

add jina nano

d044a7e

add jina colbert

42d612e

add bge

3ae5613

add cohere results

fe6daa9

KennethEnevoldsen marked this pull request as ready for review April 29, 2026 19:34

KennethEnevoldsen requested review from Samoed and removed request for Samoed April 29, 2026 19:34

KennethEnevoldsen marked this pull request as draft April 29, 2026 19:35

github-actions Bot added the stale label May 14, 2026

github-actions Bot closed this May 21, 2026

KennethEnevoldsen wants to merge 9 commits into main from lex-retrieval
