Add scores for LexRetrieval.v1 #499
It seems we might need to refactor the task a bit: models are already scoring above 0.90, and we have only run a few so far.
Model Results Comparison
Reference models: intfloat/multilingual-e5-large
Results for BAAI/bge-m3
| task_name | BAAI/bge-m3 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9559 | 0.9293 | | | False |
| Average | 0.9559 | 0.9293 | nan | - | |
Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL
Results for Cohere/Cohere-embed-v4.0
| task_name | Cohere/Cohere-embed-v4.0 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9544 | 0.9293 | | | False |
| Average | 0.9544 | 0.9293 | nan | - | |
Results for google/embeddinggemma-300m
| task_name | google/embeddinggemma-300m | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.2618 | 0.9293 | | | False |
| Average | 0.2618 | 0.9293 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval
Results for intfloat/multilingual-e5-base
| task_name | intfloat/multilingual-e5-base | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.8862 | 0.9293 | | | False |
| Average | 0.8862 | 0.9293 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for intfloat/multilingual-e5-large-instruct
| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-large-instruct | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9471 | | | False |
| Average | 0.9293 | 0.9471 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for intfloat/multilingual-e5-large
| task_name | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | | | False |
| Average | 0.9293 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for intfloat/multilingual-e5-small
| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-small | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.8424 | | | False |
| Average | 0.9293 | 0.8424 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for jinaai/jina-colbert-v2
| task_name | intfloat/multilingual-e5-large | jinaai/jina-colbert-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.8656 | | | False |
| Average | 0.9293 | 0.8656 | nan | - | |
Training datasets: DuRetrieval, MIRACL, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NanoMSMARCO-VN, NanoMSMARCORetrieval, mMARCO-NL
Results for jinaai/jina-embeddings-v5-text-nano
| task_name | intfloat/multilingual-e5-large | jinaai/jina-embeddings-v5-text-nano | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9611 | | | False |
| Average | 0.9293 | 0.9611 | nan | - | |
Results for minishlab/potion-multilingual-128M
| task_name | intfloat/multilingual-e5-large | minishlab/potion-multilingual-128M | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.4571 | | | False |
| Average | 0.9293 | 0.4571 | nan | - | |
Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MLQARetrieval, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL
Results for mteb/baseline-bm25s
| task_name | intfloat/multilingual-e5-large | mteb/baseline-bm25s | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.8983 | | | False |
| Average | 0.9293 | 0.8983 | nan | - | |
Results for mteb/baseline-random-encoder
| task_name | intfloat/multilingual-e5-large | mteb/baseline-random-encoder | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.0005 | | | False |
| Average | 0.9293 | 0.0005 | nan | - | |
Results for openai/text-embedding-3-large
| task_name | intfloat/multilingual-e5-large | openai/text-embedding-3-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9605 | | | False |
| Average | 0.9293 | 0.9605 | nan | - | |
Results for openai/text-embedding-3-small
| task_name | intfloat/multilingual-e5-large | openai/text-embedding-3-small | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.9384 | | | False |
| Average | 0.9293 | 0.9384 | nan | - | |
Results for sentence-transformers/LaBSE
| task_name | intfloat/multilingual-e5-large | sentence-transformers/LaBSE | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.5814 | | | False |
| Average | 0.9293 | 0.5814 | nan | - | |
Results for sentence-transformers/all-MiniLM-L12-v2
| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L12-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.4286 | | | False |
| Average | 0.9293 | 0.4286 | nan | - | |
Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL
Results for sentence-transformers/all-MiniLM-L6-v2
| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L6-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.2791 | | | False |
| Average | 0.9293 | 0.2791 | nan | - | |
Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL
Results for sentence-transformers/paraphrase-multilingual-mpnet-base-v2
| task_name | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.7342 | | | False |
| Average | 0.9293 | 0.7342 | nan | - | |
Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL
Results for sentence-transformers/static-similarity-mrl-multilingual-v1
| task_name | intfloat/multilingual-e5-large | sentence-transformers/static-similarity-mrl-multilingual-v1 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| LexRetrieval.v1 | 0.9293 | 0.5136 | | | False |
| Average | 0.9293 | 0.5136 | nan | - | |
Training datasets: NanoQuoraRetrieval, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, TatoebaBitextMining
Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
Will postpone submitting these results, as there seems to be a ceiling effect.
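For reference, a rough sketch of how the ceiling effect could be quantified from the scores reported in this PR. The numbers are copied from the tables in the bot comment; the 0.90 cut-off and the variable names are assumptions chosen for illustration, not part of the mteb tooling.

```python
# LexRetrieval.v1 scores, copied from the result tables in this PR.
scores = {
    "BAAI/bge-m3": 0.9559,
    "Cohere/Cohere-embed-v4.0": 0.9544,
    "google/embeddinggemma-300m": 0.2618,
    "intfloat/multilingual-e5-base": 0.8862,
    "intfloat/multilingual-e5-large-instruct": 0.9471,
    "intfloat/multilingual-e5-large": 0.9293,
    "intfloat/multilingual-e5-small": 0.8424,
    "jinaai/jina-colbert-v2": 0.8656,
    "jinaai/jina-embeddings-v5-text-nano": 0.9611,
    "minishlab/potion-multilingual-128M": 0.4571,
    "mteb/baseline-bm25s": 0.8983,
    "mteb/baseline-random-encoder": 0.0005,
    "openai/text-embedding-3-large": 0.9605,
    "openai/text-embedding-3-small": 0.9384,
    "sentence-transformers/LaBSE": 0.5814,
    "sentence-transformers/all-MiniLM-L12-v2": 0.4286,
    "sentence-transformers/all-MiniLM-L6-v2": 0.2791,
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2": 0.7342,
    "sentence-transformers/static-similarity-mrl-multilingual-v1": 0.5136,
}

# Assumed cut-off, taken from the ">90" remark in the PR description.
THRESHOLD = 0.90

# Models that already score above the cut-off.
above = sorted(m for m, s in scores.items() if s > THRESHOLD)

# Best model, remaining headroom to a perfect score, and margin over BM25.
best_model = max(scores, key=scores.get)
headroom = 1.0 - scores[best_model]
margin_over_bm25 = scores[best_model] - scores["mteb/baseline-bm25s"]

print(f"{len(above)} of {len(scores)} models score above {THRESHOLD}")
print(f"best: {best_model} ({scores[best_model]:.4f}), headroom {headroom:.4f}")
print(f"margin of best model over BM25: {margin_over_bm25:.4f}")
```

The small headroom above the best model, together with the BM25 baseline sitting close to the strongest dense models, is the kind of pattern that would suggest the task leaves little room to separate strong systems.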
This pull request has been automatically marked as stale due to inactivity.
This pull request has been automatically closed due to inactivity. |
Checklist
mteb/models/model_implementations/, this can be an API. Instructions on how to add a model can be found here