add ElasticKBRetrieval results #502

Open
KennethEnevoldsen wants to merge 1 commit into main from add-results-for-ElasticKBRetrieval

@KennethEnevoldsen KennethEnevoldsen commented Apr 30, 2026

"hit_rate_at_20": 0.78017,
"hit_rate_at_100": 0.88793,
"hit_rate_at_1000": 0.98276,
"main_score": 0.21147,
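
For context on the metrics quoted above: a hit rate at k is commonly defined as the fraction of queries with at least one relevant document among the top k retrieved results. The sketch below illustrates that definition only. It is not the mteb implementation, and the function name and toy data are hypothetical.

```python
def hit_rate_at_k(rankings, relevant, k):
    """Fraction of queries with at least one relevant document in the top k.

    rankings: dict mapping query id -> list of document ids, best first.
    relevant: dict mapping query id -> set of relevant document ids.
    """
    if not rankings:
        return 0.0
    hits = 0
    for query_id, ranked_docs in rankings.items():
        relevant_docs = relevant.get(query_id, set())
        # A query counts as a hit if any of its top-k documents is relevant.
        if any(doc_id in relevant_docs for doc_id in ranked_docs[:k]):
            hits += 1
    return hits / len(rankings)


# Hypothetical toy data: three queries, three retrieved documents each.
toy_rankings = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d9", "d4"],
    "q3": ["d5", "d6", "d8"],
}
toy_relevant = {"q1": {"d1"}, "q2": {"d4"}, "q3": {"d0"}}

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, hit_rate_at_k(toy_rankings, toy_relevant, k))
```

Under this definition the value can only stay equal or grow as k increases, which is consistent with the reported values rising from k=20 to k=1000.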
Contributor Author

@Samoed it seems like I might have run this model incorrectly - I recall that we had to do something extra, but I think that was just ensuring a recent version of transformers - am I missing something?
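
The comment above mentions ensuring a recent version of transformers. A minimal sketch for checking the installed version before rerunning a model follows. The minimum `4.56.0` is a placeholder assumption that is not stated anywhere in this thread, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Placeholder assumption: the thread does not state the required minimum.
ASSUMED_MINIMUM = "4.56.0"


def release_tuple(version_string: str) -> tuple:
    """Parse leading numeric components, e.g. '4.56.0.dev0' -> (4, 56, 0).

    Pre-release or local suffixes are ignored, so this is only a rough check.
    """
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two version strings on their numeric release components."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.56' and '4.56.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    else:
        print(found, "ok" if is_at_least(found, ASSUMED_MINIMUM) else "too old")
```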

Member

Why do you think that you ran an incorrect version of transformers?

Contributor Author

@KennethEnevoldsen KennethEnevoldsen Apr 30, 2026

The scores are just very low - I suspect that something with the model didn't go as intended

Gemma 300m results look good here

@emilia-elastic emilia-elastic mentioned this pull request Apr 30, 2026
1 task
github-actions Bot commented May 1, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: BAAI/bge-m3, google/embeddinggemma-300m, intfloat/multilingual-e5-base, intfloat/multilingual-e5-large-instruct, intfloat/multilingual-e5-large, intfloat/multilingual-e5-small, jinaai/jina-colbert-v2, jinaai/jina-embeddings-v5-text-nano, minishlab/potion-multilingual-128M, mteb/baseline-bm25s, mteb/baseline-random-encoder, sentence-transformers/LaBSE, sentence-transformers/all-MiniLM-L12-v2, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/static-similarity-mrl-multilingual-v1
Tasks: ElasticKBRetrieval

Results for BAAI/bge-m3

| task_name | BAAI/bge-m3 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5337 | 0.5084 | | | False |
| Average | 0.5337 | 0.5084 | nan | - | |

Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL


Results for google/embeddinggemma-300m

| task_name | google/embeddinggemma-300m | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.1286 | 0.5084 | | | False |
| Average | 0.1286 | 0.5084 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval


Results for intfloat/multilingual-e5-base

| task_name | intfloat/multilingual-e5-base | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.4674 | 0.5084 | | | False |
| Average | 0.4674 | 0.5084 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for intfloat/multilingual-e5-large-instruct

| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-large-instruct | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.5372 | | | False |
| Average | 0.5084 | 0.5372 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for intfloat/multilingual-e5-large

| task_name | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | | | False |
| Average | 0.5084 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for intfloat/multilingual-e5-small

| task_name | intfloat/multilingual-e5-large | intfloat/multilingual-e5-small | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.4202 | | | False |
| Average | 0.5084 | 0.4202 | nan | - | |

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL


Results for jinaai/jina-colbert-v2

| task_name | intfloat/multilingual-e5-large | jinaai/jina-colbert-v2 | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.4946 | | | False |
| Average | 0.5084 | 0.4946 | nan | - | |

Training datasets: DuRetrieval, MIRACL, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NanoMSMARCO-VN, NanoMSMARCORetrieval, mMARCO-NL


Results for jinaai/jina-embeddings-v5-text-nano

| task_name | intfloat/multilingual-e5-large | jinaai/jina-embeddings-v5-text-nano | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.5676 | | | False |
| Average | 0.5084 | 0.5676 | nan | - | |

Results for minishlab/potion-multilingual-128M

| task_name | intfloat/multilingual-e5-large | minishlab/potion-multilingual-128M | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.2949 | | | False |
| Average | 0.5084 | 0.2949 | nan | - | |

Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MLQARetrieval, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL


Results for mteb/baseline-bm25s

| task_name | intfloat/multilingual-e5-large | mteb/baseline-bm25s | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.5324 | | | False |
| Average | 0.5084 | 0.5324 | nan | - | |

Results for mteb/baseline-random-encoder

| task_name | intfloat/multilingual-e5-large | mteb/baseline-random-encoder | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.0084 | | | False |
| Average | 0.5084 | 0.0084 | nan | - | |

Results for sentence-transformers/LaBSE

| task_name | intfloat/multilingual-e5-large | sentence-transformers/LaBSE | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.255 | | | False |
| Average | 0.5084 | 0.255 | nan | - | |

Results for sentence-transformers/all-MiniLM-L12-v2

| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L12-v2 | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.4598 | | | False |
| Average | 0.5084 | 0.4598 | nan | - | |

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL


Results for sentence-transformers/all-MiniLM-L6-v2

| task_name | intfloat/multilingual-e5-large | sentence-transformers/all-MiniLM-L6-v2 | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.4684 | | | False |
| Average | 0.5084 | 0.4684 | nan | - | |

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL


Results for sentence-transformers/paraphrase-multilingual-mpnet-base-v2

| task_name | intfloat/multilingual-e5-large | sentence-transformers/paraphrase-multilingual-mpnet-base-v2 | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.3356 | | | False |
| Average | 0.5084 | 0.3356 | nan | - | |

Training datasets: MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, mMARCO-NL


Results for sentence-transformers/static-similarity-mrl-multilingual-v1

| task_name | intfloat/multilingual-e5-large | sentence-transformers/static-similarity-mrl-multilingual-v1 | Max result | Model with max result | In Training Data |
| --- | --- | --- | --- | --- | --- |
| ElasticKBRetrieval | 0.5084 | 0.2787 | | | False |
| Average | 0.5084 | 0.2787 | nan | - | |

Training datasets: NanoQuoraRetrieval, Quora-NL, Quora-PL, Quora-PLHardNegatives, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, QuoraRetrievalHardNegatives, QuoraRetrievalHardNegatives.v2, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, TatoebaBitextMining



Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
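
The per-model tables above can be condensed into a single ranking. The sketch below copies the ElasticKBRetrieval scores from this report and sorts them, also showing each model's difference to the reference model intfloat/multilingual-e5-large. The helper names are hypothetical, and since the report is truncated, only the models visible here are included.

```python
REFERENCE = "intfloat/multilingual-e5-large"

# ElasticKBRetrieval scores as listed in the report above.
SCORES = {
    "BAAI/bge-m3": 0.5337,
    "google/embeddinggemma-300m": 0.1286,
    "intfloat/multilingual-e5-base": 0.4674,
    "intfloat/multilingual-e5-large-instruct": 0.5372,
    "intfloat/multilingual-e5-large": 0.5084,
    "intfloat/multilingual-e5-small": 0.4202,
    "jinaai/jina-colbert-v2": 0.4946,
    "jinaai/jina-embeddings-v5-text-nano": 0.5676,
    "minishlab/potion-multilingual-128M": 0.2949,
    "mteb/baseline-bm25s": 0.5324,
    "mteb/baseline-random-encoder": 0.0084,
    "sentence-transformers/LaBSE": 0.255,
    "sentence-transformers/all-MiniLM-L12-v2": 0.4598,
    "sentence-transformers/all-MiniLM-L6-v2": 0.4684,
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2": 0.3356,
    "sentence-transformers/static-similarity-mrl-multilingual-v1": 0.2787,
}


def rank_models(scores):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def delta_to_reference(scores, reference):
    """Return each model's score minus the reference model's score."""
    base = scores[reference]
    return {model: round(score - base, 4) for model, score in scores.items()}


if __name__ == "__main__":
    deltas = delta_to_reference(SCORES, REFERENCE)
    for position, (model, score) in enumerate(rank_models(SCORES), start=1):
        print(f"{position:2d}  {score:.4f}  {deltas[model]:+.4f}  {model}")
```

Sorted this way, google/embeddinggemma-300m lands second to last, above only the random-encoder baseline and well below the BM25 baseline, which is the gap discussed in the review comments.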
