add ElasticKBRetrieval results #502
| "hit_rate_at_20": 0.78017, | ||
| "hit_rate_at_100": 0.88793, | ||
| "hit_rate_at_1000": 0.98276, | ||
| "main_score": 0.21147, |
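For readers comparing the `hit_rate_at_k` values above with `main_score`, here is a minimal sketch of how a hit rate at k is commonly computed, assuming the usual definition (the share of queries with at least one relevant document in the top k). The function name and the toy data are hypothetical, and the benchmark's own implementation may differ in details.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant doc in the top-k results.

    ranked:   query id -> doc ids ordered from best to worst (hypothetical input shape)
    relevant: query id -> set of relevant doc ids
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Only queries that have at least one relevant document are scored.
    queries = [q for q, docs in relevant.items() if docs]
    if not queries:
        return 0.0
    hits = sum(
        1
        for q in queries
        if any(doc in relevant[q] for doc in ranked.get(q, [])[:k])
    )
    return hits / len(queries)


# Toy data, invented for illustration only.
toy_ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4", "d2"]}
toy_relevant = {"q1": {"d1"}, "q2": {"d5"}}

print(hit_rate_at_k(toy_ranked, toy_relevant, k=1))
print(hit_rate_at_k(toy_ranked, toy_relevant, k=2))
```

Under this definition the value can only stay flat or grow as k increases, so a high hit rate at large k next to a low rank-sensitive score would mean the relevant documents are retrieved but ranked far down.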
@Samoed it seems like I might have run this model incorrectly. I recall that we had to do something specific for it, but I think that was just ensuring a recent version of transformers. Am I missing something?
Why do you think you ran it with an incorrect version of transformers?
The scores are just very low. I suspect something with the model didn't go as intended.
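One way to rule out the version question is to check the installed transformers release before re-running. This is only a sketch: `MIN_TRANSFORMERS` is a hypothetical placeholder, not a confirmed requirement for this model, and the helper names are made up for illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Hypothetical placeholder: replace with the minimum version the model card states.
MIN_TRANSFORMERS = "4.56.0"


def release_tuple(v: str) -> Tuple[int, ...]:
    """Keep the leading numeric segments, e.g. '4.56.0.dev0' -> (4, 56, 0)."""
    parts = []
    for segment in v.split("."):
        match = re.match(r"\d+", segment)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare release numbers only; pre-release suffixes are ignored."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '4.56' and '4.56.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        status = "ok" if is_at_least(installed, MIN_TRANSFORMERS) else "too old"
        print(f"transformers {installed}: {status} (assumed minimum {MIN_TRANSFORMERS})")
```

If the version checks out, the low score is more likely to come from something else in how the model was loaded or prompted.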
Gemma 300m results look good here
Model Results Comparison
Reference models:
Results for BAAI/bge-m3
| task_name | BAAI/bge-m3 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5337 | 0.5084 | | | False |
| Average | 0.5337 | 0.5084 | nan | - | |
Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL
Results for google/embeddinggemma-300m
| task_name | google/embeddinggemma-300m | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.1286 | 0.5084 | | | False |
| Average | 0.1286 | 0.5084 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval
Results for intfloat/multilingual-e5-base
| task_name | intfloat/multilingual-e5-base | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.4674 | 0.5084 | | | False |
| Average | 0.4674 | 0.5084 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for intfloat/multilingual-e5-large-instruct
| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-large-instruct | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.5372 | | | False |
| Average | 0.5084 | 0.5372 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for intfloat/multilingual-e5-large
| task_name | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | | | False |
| Average | 0.5084 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for intfloat/multilingual-e5-small
| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-small | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.4202 | | | False |
| Average | 0.5084 | 0.4202 | nan | - | |
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
Results for jinaai/jina-colbert-v2
| task_name | intfloat/multilingual-e5-large | jinaai/jina-colbert-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.4946 | | | False |
| Average | 0.5084 | 0.4946 | nan | - | |
Training datasets: DuRetrieval, MIRACL, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NanoMSMARCO-VN, NanoMSMARCORetrieval, mMARCO-NL
Results for jinaai/jina-embeddings-v5-text-nano
| task_name | intfloat/multilingual-e5-large | jinaai/jina-embeddings-v5-text-nano | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.5676 | | | False |
| Average | 0.5084 | 0.5676 | nan | - | |
Results for minishlab/potion-multilingual-128M
| task_name | intfloat/multilingual-e5-large | minishlab/potion-multilingual-128M | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.2949 | | | False |
| Average | 0.5084 | 0.2949 | nan | - | |
Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MLQARetrieval, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL
Results for mteb/baseline-bm25s
| task_name | intfloat/multilingual-e5-large | mteb/baseline-bm25s | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.5324 | | | False |
| Average | 0.5084 | 0.5324 | nan | - | |
Results for mteb/baseline-random-encoder
| task_name | intfloat/multilingual-e5-large | mteb/baseline-random-encoder | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.0084 | | | False |
| Average | 0.5084 | 0.0084 | nan | - | |
Results for sentence-transformers/LaBSE
| task_name | intfloat/multilingual-e5-large | sentence-transformers/LaBSE | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.255 | | | False |
| Average | 0.5084 | 0.255 | nan | - | |
Results for sentence-transformers/all-MiniLM-L12-v2
| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L12-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.4598 | | | False |
| Average | 0.5084 | 0.4598 | nan | - | |
Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL
Results for sentence-transformers/all-MiniLM-L6-v2
| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L6-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.4684 | | | False |
| Average | 0.5084 | 0.4684 | nan | - | |
Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL
Results for sentence-transformers/paraphrase-multilingual-mpnet-base-v2
| task_name | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.3356 | | | False |
| Average | 0.5084 | 0.3356 | nan | - | |
Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL
Results for sentence-transformers/static-similarity-mrl-multilingual-v1
| task_name | intfloat/multilingual-e5-large | sentence-transformers/static-similarity-mrl-multilingual-v1 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| ElasticKBRetrieval | 0.5084 | 0.2787 | | | False |
| Average | 0.5084 | 0.2787 | nan | - | |
Training datasets: NanoQuoraRetrieval, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, TatoebaBitextMining
Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
related to embeddings-benchmark/mteb#4487