Add scores for LexRetrieval.v1#499

Closed
KennethEnevoldsen wants to merge 9 commits into main from lex-retrieval

Conversation

@KennethEnevoldsen
Contributor

It seems we might need to refactor the task a bit: models are already scoring above 0.90, and I have only run a few.

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@github-actions

github-actions Bot commented Apr 29, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: BAAI/bge-m3, Cohere/Cohere-embed-v4.0, google/embeddinggemma-300m, intfloat/multilingual-e5-base, intfloat/multilingual-e5-large-instruct, intfloat/multilingual-e5-large, intfloat/multilingual-e5-small, jinaai/jina-colbert-v2, jinaai/jina-embeddings-v5-text-nano, minishlab/potion-multilingual-128M, mteb/baseline-bm25s, mteb/baseline-random-encoder, openai/text-embedding-3-large, openai/text-embedding-3-small, sentence-transformers/LaBSE, sentence-transformers/all-MiniLM-L12-v2, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/static-similarity-mrl-multilingual-v1
Tasks: LexRetrieval.v1

Results for BAAI/bge-m3

| task_name | BAAI/bge-m3 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9559 | 0.9293 | | | False |
| Average | 0.9559 | 0.9293 | nan | - | |

Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL


Results for Cohere/Cohere-embed-v4.0

| task_name | Cohere/Cohere-embed-v4.0 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9544 | 0.9293 | | | False |
| Average | 0.9544 | 0.9293 | nan | - | |

Results for google/embeddinggemma-300m

| task_name | google/embeddinggemma-300m | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.2618 | 0.9293 | | | False |
| Average | 0.2618 | 0.9293 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval


Results for intfloat/multilingual-e5-base

| task_name | intfloat/multilingual-e5-base | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.8862 | 0.9293 | | | False |
| Average | 0.8862 | 0.9293 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for intfloat/multilingual-e5-large-instruct

| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-large-instruct | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9471 | | | False |
| Average | 0.9293 | 0.9471 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for intfloat/multilingual-e5-large

| task_name | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | | | False |
| Average | 0.9293 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for intfloat/multilingual-e5-small

| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-small | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.8424 | | | False |
| Average | 0.9293 | 0.8424 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for jinaai/jina-colbert-v2

| task_name | intfloat/multilingual-e5-large | jinaai/jina-colbert-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.8656 | | | False |
| Average | 0.9293 | 0.8656 | nan | - | |

Training datasets: DuRetrieval, MIRACL, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NanoMSMARCO-VN, NanoMSMARCORetrieval, mMARCO-NL


Results for jinaai/jina-embeddings-v5-text-nano

| task_name | intfloat/multilingual-e5-large | jinaai/jina-embeddings-v5-text-nano | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9611 | | | False |
| Average | 0.9293 | 0.9611 | nan | - | |

Results for minishlab/potion-multilingual-128M

| task_name | intfloat/multilingual-e5-large | minishlab/potion-multilingual-128M | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.4571 | | | False |
| Average | 0.9293 | 0.4571 | nan | - | |

Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MLQARetrieval, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL


Results for mteb/baseline-bm25s

| task_name | intfloat/multilingual-e5-large | mteb/baseline-bm25s | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.8983 | | | False |
| Average | 0.9293 | 0.8983 | nan | - | |

Results for mteb/baseline-random-encoder

| task_name | intfloat/multilingual-e5-large | mteb/baseline-random-encoder | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.0005 | | | False |
| Average | 0.9293 | 0.0005 | nan | - | |

Results for openai/text-embedding-3-large

| task_name | intfloat/multilingual-e5-large | openai/text-embedding-3-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9605 | | | False |
| Average | 0.9293 | 0.9605 | nan | - | |

Results for openai/text-embedding-3-small

| task_name | intfloat/multilingual-e5-large | openai/text-embedding-3-small | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9384 | | | False |
| Average | 0.9293 | 0.9384 | nan | - | |

Results for sentence-transformers/LaBSE

| task_name | intfloat/multilingual-e5-large | sentence-transformers/LaBSE | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.5814 | | | False |
| Average | 0.9293 | 0.5814 | nan | - | |

Results for sentence-transformers/all-MiniLM-L12-v2

| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L12-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.4286 | | | False |
| Average | 0.9293 | 0.4286 | nan | - | |

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL


Results for sentence-transformers/all-MiniLM-L6-v2

| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L6-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.2791 | | | False |
| Average | 0.9293 | 0.2791 | nan | - | |

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL


Results for sentence-transformers/paraphrase-multilingual-mpnet-base-v2

| task_name | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.7342 | | | False |
| Average | 0.9293 | 0.7342 | nan | - | |

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL


Results for sentence-transformers/static-similarity-mrl-multilingual-v1

| task_name | intfloat/multilingual-e5-large | sentence-transformers/static-similarity-mrl-multilingual-v1 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.5136 | | | False |
| Average | 0.9293 | 0.5136 | nan | - | |

Training datasets: NanoQuoraRetrieval, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, TatoebaBitextMining



Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
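
Because the results are spread over one table per model, it can help to collect them into a single ranking. The following is a minimal sketch, not part of the bot's report: the scores are copied from the tables in this comment, while the names `SCORES`, `rank_models`, and `models_above` are hypothetical helpers introduced only for illustration.

```python
# Main scores for LexRetrieval.v1, copied from the per-model tables in this comment.
SCORES = {
    "BAAI/bge-m3": 0.9559,
    "Cohere/Cohere-embed-v4.0": 0.9544,
    "google/embeddinggemma-300m": 0.2618,
    "intfloat/multilingual-e5-base": 0.8862,
    "intfloat/multilingual-e5-large-instruct": 0.9471,
    "intfloat/multilingual-e5-large": 0.9293,
    "intfloat/multilingual-e5-small": 0.8424,
    "jinaai/jina-colbert-v2": 0.8656,
    "jinaai/jina-embeddings-v5-text-nano": 0.9611,
    "minishlab/potion-multilingual-128M": 0.4571,
    "mteb/baseline-bm25s": 0.8983,
    "mteb/baseline-random-encoder": 0.0005,
    "openai/text-embedding-3-large": 0.9605,
    "openai/text-embedding-3-small": 0.9384,
    "sentence-transformers/LaBSE": 0.5814,
    "sentence-transformers/all-MiniLM-L12-v2": 0.4286,
    "sentence-transformers/all-MiniLM-L6-v2": 0.2791,
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2": 0.7342,
    "sentence-transformers/static-similarity-mrl-multilingual-v1": 0.5136,
}


def rank_models(scores):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def models_above(scores, threshold):
    """Return names of models scoring strictly above `threshold`, best first."""
    return [name for name, score in rank_models(scores) if score > threshold]


if __name__ == "__main__":
    for rank, (name, score) in enumerate(rank_models(SCORES), start=1):
        print(f"{rank:>2}  {score:.4f}  {name}")
    above = models_above(SCORES, 0.90)
    print(f"{len(above)} of {len(SCORES)} models score above 0.90")
```

The 0.90 threshold mirrors the ">90" figure from the PR description; any other cut-off can be passed to `models_above`.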

KennethEnevoldsen marked this pull request as ready for review on April 29, 2026 19:34
KennethEnevoldsen requested review from Samoed and removed request for Samoed on April 29, 2026 19:34
KennethEnevoldsen marked this pull request as draft on April 29, 2026 19:35
@KennethEnevoldsen
Contributor Author

I will postpone submitting these results, as there seems to be a ceiling effect.
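
One rough way to quantify the ceiling effect is sketched below. It is an illustration only, not the procedure used in this PR: the scores are a subset of those reported in the bot comment, the 0.90 threshold comes from the PR description, and the function `ceiling_summary`, the "headroom" and "gap to baseline" diagnostics, and the assumption that the main score is bounded above by 1.0 are all introduced here for the example.

```python
# Subset of LexRetrieval.v1 scores from the bot comment: every model above 0.90
# plus the BM25 lexical baseline.
TOP_SCORES = {
    "jinaai/jina-embeddings-v5-text-nano": 0.9611,
    "openai/text-embedding-3-large": 0.9605,
    "BAAI/bge-m3": 0.9559,
    "Cohere/Cohere-embed-v4.0": 0.9544,
    "intfloat/multilingual-e5-large-instruct": 0.9471,
    "openai/text-embedding-3-small": 0.9384,
    "intfloat/multilingual-e5-large": 0.9293,
    "mteb/baseline-bm25s": 0.8983,
}


def ceiling_summary(scores, baseline, threshold=0.90, max_score=1.0):
    """Summarise how close the best models are to the assumed maximum score.

    `max_score=1.0` is an assumption about the metric's upper bound.
    """
    best = max(scores.values())
    return {
        # How many models already exceed the threshold.
        "n_above_threshold": sum(1 for s in scores.values() if s > threshold),
        # Room left between the best model and the assumed maximum.
        "headroom": round(max_score - best, 4),
        # Lead of the best model over the chosen baseline.
        "gap_to_baseline": round(best - scores[baseline], 4),
    }


if __name__ == "__main__":
    print(ceiling_summary(TOP_SCORES, baseline="mteb/baseline-bm25s"))
```

A small headroom combined with a small gap to the BM25 baseline would suggest that the task leaves little room to separate strong models, which is consistent with the concern raised here.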

@github-actions

This pull request has been automatically marked as stale due to inactivity.

github-actions Bot added the stale label on May 14, 2026
@github-actions

This pull request has been automatically closed due to inactivity.

github-actions Bot closed this on May 21, 2026