add ElasticKBRetrieval results by KennethEnevoldsen · Pull Request #502 · embeddings-benchmark/results

KennethEnevoldsen · 2026-04-30T09:32:20Z

related to embeddings-benchmark/mteb#4487

KennethEnevoldsen · 2026-04-30T09:38:53Z

+        "hit_rate_at_20": 0.78017,
+        "hit_rate_at_100": 0.88793,
+        "hit_rate_at_1000": 0.98276,
+        "main_score": 0.21147,


@Samoed it seems I might have run this model incorrectly. I recall that we had to do something, but I think that was just ensuring a recent version of transformers. Am I missing something?

Samoed · 2026-04-30

Why do you think you ran an incorrect version of transformers?

KennethEnevoldsen · 2026-04-30

The scores are just very low. I suspect something with the model didn't go as intended.

emilia-elastic · 2026-04-30

Gemma 300m results look good here
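For readers unfamiliar with the `hit_rate_at_k` fields in the diff hunk above, here is a minimal sketch of the usual definition: the fraction of queries with at least one relevant document in the top k results. This is an illustration under that assumption, not the mteb implementation; the function name, the toy rankings, and the relevance judgments are all hypothetical.

```python
from typing import Mapping, Sequence, Set


def hit_rate_at_k(
    rankings: Mapping[str, Sequence[str]],
    relevant: Mapping[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant doc in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Only queries that have at least one relevant document are scored.
    queries = [q for q, docs in relevant.items() if docs]
    if not queries:
        return 0.0
    hits = sum(
        1
        for q in queries
        if any(doc in relevant[q] for doc in rankings.get(q, ())[:k])
    )
    return hits / len(queries)


# Hypothetical toy data: three queries, each with one relevant document "d1".
rankings = {
    "q1": ["d3", "d1", "d2"],
    "q2": ["d2", "d3", "d1"],
    "q3": ["d1", "d2", "d3"],
}
relevant = {"q1": {"d1"}, "q2": {"d1"}, "q3": {"d1"}}

for k in (1, 2, 3):
    print(f"hit_rate_at_{k} = {hit_rate_at_k(rankings, relevant, k):.3f}")
# hit_rate_at_1 = 0.333
# hit_rate_at_2 = 0.667
# hit_rate_at_3 = 1.000
```

Under this definition the hit rate can only grow with k, which matches the 0.78017, 0.88793, 0.98276 progression in the diff. A high hit rate at k=1000 only says the relevant documents are retrieved somewhere in the top 1000; it says nothing about how high they rank.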

github-actions · 2026-05-01T06:10:36Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: BAAI/bge-m3, google/embeddinggemma-300m, intfloat/multilingual-e5-base, intfloat/multilingual-e5-large-instruct, intfloat/multilingual-e5-large, intfloat/multilingual-e5-small, jinaai/jina-colbert-v2, jinaai/jina-embeddings-v5-text-nano, minishlab/potion-multilingual-128M, mteb/baseline-bm25s, mteb/baseline-random-encoder, sentence-transformers/LaBSE, sentence-transformers/all-MiniLM-L12-v2, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/static-similarity-mrl-multilingual-v1
Tasks: ElasticKBRetrieval

Results for `BAAI/bge-m3`

task_name	BAAI/bge-m3	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5337	0.5084			False
Average	0.5337	0.5084		nan	-

Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL

Results for `google/embeddinggemma-300m`

task_name	google/embeddinggemma-300m	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.1286	0.5084			False
Average	0.1286	0.5084		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval

Results for `intfloat/multilingual-e5-base`

task_name	intfloat/multilingual-e5-base	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.4674	0.5084			False
Average	0.4674	0.5084		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `intfloat/multilingual-e5-large-instruct`

task_name	intfloat/multilingual-e5-large	intfloat/multilingual-e5-large-instruct	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.5372			False
Average	0.5084	0.5372		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `intfloat/multilingual-e5-large`

task_name	intfloat/multilingual-e5-large	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084			False
Average	0.5084		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `intfloat/multilingual-e5-small`

task_name	intfloat/multilingual-e5-large	intfloat/multilingual-e5-small	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.4202			False
Average	0.5084	0.4202		nan	-

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL

Results for `jinaai/jina-colbert-v2`

task_name	intfloat/multilingual-e5-large	jinaai/jina-colbert-v2	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.4946			False
Average	0.5084	0.4946		nan	-

Training datasets: DuRetrieval, MIRACL, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NanoMSMARCO-VN, NanoMSMARCORetrieval, mMARCO-NL

Results for `jinaai/jina-embeddings-v5-text-nano`

task_name	intfloat/multilingual-e5-large	jinaai/jina-embeddings-v5-text-nano	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.5676			False
Average	0.5084	0.5676		nan	-

Results for `minishlab/potion-multilingual-128M`

task_name	intfloat/multilingual-e5-large	minishlab/potion-multilingual-128M	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.2949			False
Average	0.5084	0.2949		nan	-

Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MLQARetrieval, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL

Results for `mteb/baseline-bm25s`

task_name	intfloat/multilingual-e5-large	mteb/baseline-bm25s	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.5324			False
Average	0.5084	0.5324		nan	-

Results for `mteb/baseline-random-encoder`

task_name	intfloat/multilingual-e5-large	mteb/baseline-random-encoder	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.0084			False
Average	0.5084	0.0084		nan	-

Results for `sentence-transformers/LaBSE`

task_name	intfloat/multilingual-e5-large	sentence-transformers/LaBSE	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.255			False
Average	0.5084	0.255		nan	-

Results for `sentence-transformers/all-MiniLM-L12-v2`

task_name	intfloat/multilingual-e5-large	sentence-transformers/all-MiniLM-L12-v2	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.4598			False
Average	0.5084	0.4598		nan	-

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL

Results for `sentence-transformers/all-MiniLM-L6-v2`

task_name	intfloat/multilingual-e5-large	sentence-transformers/all-MiniLM-L6-v2	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.4684			False
Average	0.5084	0.4684		nan	-

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL

Results for `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`

task_name	intfloat/multilingual-e5-large	sentence-transformers/paraphrase-multilingual-mpnet-base-v2	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.3356			False
Average	0.5084	0.3356		nan	-

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL

Results for `sentence-transformers/static-similarity-mrl-multilingual-v1`

task_name	intfloat/multilingual-e5-large	sentence-transformers/static-similarity-mrl-multilingual-v1	Max result	Model with max result	In Training Data
ElasticKBRetrieval	0.5084	0.2787			False
Average	0.5084	0.2787		nan	-

Training datasets: NanoQuoraRetrieval, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, TatoebaBitextMining

Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
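To read the per-model tables side by side, the sketch below collects the ElasticKBRetrieval scores visible in this (truncated) report, sorts them, and prints each model's difference from the `intfloat/multilingual-e5-large` reference. The scores are copied from the tables above; the helper names are hypothetical and not part of the bot's tooling.

```python
# ElasticKBRetrieval scores as shown in the report above.
scores = {
    "BAAI/bge-m3": 0.5337,
    "google/embeddinggemma-300m": 0.1286,
    "intfloat/multilingual-e5-base": 0.4674,
    "intfloat/multilingual-e5-large-instruct": 0.5372,
    "intfloat/multilingual-e5-large": 0.5084,
    "intfloat/multilingual-e5-small": 0.4202,
    "jinaai/jina-colbert-v2": 0.4946,
    "jinaai/jina-embeddings-v5-text-nano": 0.5676,
    "minishlab/potion-multilingual-128M": 0.2949,
    "mteb/baseline-bm25s": 0.5324,
    "mteb/baseline-random-encoder": 0.0084,
    "sentence-transformers/LaBSE": 0.255,
    "sentence-transformers/all-MiniLM-L12-v2": 0.4598,
    "sentence-transformers/all-MiniLM-L6-v2": 0.4684,
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2": 0.3356,
    "sentence-transformers/static-similarity-mrl-multilingual-v1": 0.2787,
}

REFERENCE = "intfloat/multilingual-e5-large"
BM25 = "mteb/baseline-bm25s"


def ranked(table):
    """Return (model, score) pairs sorted from best to worst."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


def models_above(table, baseline):
    """Names of models scoring strictly higher than the given baseline."""
    return [name for name, score in ranked(table) if score > table[baseline]]


for name, score in ranked(scores):
    delta = score - scores[REFERENCE]
    print(f"{score:.4f}  {delta:+.4f}  {name}")

print("Above BM25:", ", ".join(models_above(scores, BM25)))
```

Sorted this way, `google/embeddinggemma-300m` at 0.1286 sits above only the random-encoder baseline, which is the kind of outlier that prompted the question in the review thread.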

add ElasticKBRetrieval results

99f8769

emilia-elastic mentioned this pull request Apr 30, 2026

add: Elastic KB Results #505

Open

1 task

KennethEnevoldsen wants to merge 1 commit into main from add-results-for-ElasticKBRetrieval
