Update Querit results #501

Closed
moshesbeta wants to merge 0 commits into embeddings-benchmark:main from moshesbeta:main

@moshesbeta moshesbeta commented Apr 30, 2026

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Querit/Querit
Tasks: AlloprofReranking, RuBQReranking, T2Reranking, VoyageMMarcoReranking, WebLINXCandidatesReranking, WikipediaRerankingMultilingual

Results for Querit/Querit

| task_name | Querit/Querit | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AlloprofReranking | 0.7919 | 0.8177 | 0.6944 | 0.8540 | Octen/Octen-Embedding-8B | False |
| RuBQReranking | 0.7535 | 0.7384 | 0.756 | 0.8051 | ai-sage/Giga-Embeddings-instruct | False |
| T2Reranking | 0.6895 | 0.6795 | 0.6632 | 0.7315 | tencent/Youtu-Embedding | True |
| VoyageMMarcoReranking | 0.6788 | 0.6673 | 0.6821 | 0.8366 | codefuse-ai/F2LLM-v2-14B | False |
| WebLINXCandidatesReranking | 0.1184 | 0.1097 | 0.0778 | 0.2246 | codefuse-ai/F2LLM-v2-8B | False |
| WikipediaRerankingMultilingual | 0.9092 | 0.9224 | 0.8981 | 0.9308 | jinaai/jina-reranker-v3 | False |
| Average | 0.6569 | 0.6558 | 0.6286 | 0.7304 | nan | - |
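
The Average row can be reproduced as a quick sanity check. This is a minimal sketch, assuming the average is the plain unweighted mean of the six task scores listed above; the variable names are illustrative and not part of mteb.

```python
# Per-task scores copied from the results table, in table order:
# Alloprof, RuBQ, T2, VoyageMMarco, WebLINXCandidates, WikipediaMultilingual.
scores = {
    "Querit/Querit": [0.7919, 0.7535, 0.6895, 0.6788, 0.1184, 0.9092],
    "google/gemini-embedding-001": [0.8177, 0.7384, 0.6795, 0.6673, 0.1097, 0.9224],
    "intfloat/multilingual-e5-large": [0.6944, 0.756, 0.6632, 0.6821, 0.0778, 0.8981],
}

# Assumption: "Average" is an unweighted mean over the six tasks,
# rounded to four decimal places as in the table.
averages = {
    model: round(sum(values) / len(values), 4)
    for model, values in scores.items()
}

for model, avg in averages.items():
    print(f"{model}: {avg:.4f}")
```

Under that assumption the computed means agree with the Average row, which places Querit/Querit about 0.001 above google/gemini-embedding-001 on this task set.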

Training datasets: AskUbuntuDupQuestions, AskUbuntuDupQuestions-VN, CQADupStack, MIRACLRanking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MindSmallReranking, MrTidyRetrieval, MrTyDiJaRetrievalLite, MultiLongDocReranking, MultiLongDocRetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, T2Reranking, ruri-v3-dataset-reranker
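
The "In Training Data" column appears consistent with a simple name match between the evaluated tasks and the training datasets listed above. The sketch below assumes that exact-name matching is the rule; how the report actually derives the flag is not stated here, and the variable names are hypothetical.

```python
# Tasks evaluated in this report (from the "Tasks:" line).
evaluated_tasks = [
    "AlloprofReranking", "RuBQReranking", "T2Reranking",
    "VoyageMMarcoReranking", "WebLINXCandidatesReranking",
    "WikipediaRerankingMultilingual",
]

# Training datasets copied from the "Training datasets:" line.
training_datasets = {
    "AskUbuntuDupQuestions", "AskUbuntuDupQuestions-VN", "CQADupStack",
    "MIRACLRanking", "MSMARCO", "MSMARCO-Fa", "MSMARCO-FaHardNegatives",
    "MSMARCO-PL", "MSMARCO-PLHardNegatives", "MSMARCO-VN",
    "MSMARCOHardNegatives", "MSMARCOv2", "MindSmallReranking",
    "MrTidyRetrieval", "MrTyDiJaRetrievalLite", "MultiLongDocReranking",
    "MultiLongDocRetrieval", "NanoMSMARCO-VN", "NanoMSMARCORetrieval",
    "T2Reranking", "ruri-v3-dataset-reranker",
}

# Assumption: a task is flagged when its name appears verbatim in the list.
in_training_data = {task: task in training_datasets for task in evaluated_tasks}

flagged = [task for task, hit in in_training_data.items() if hit]
print(flagged)
```

With exact matching, T2Reranking is the only evaluated task that overlaps with the training datasets, which matches the single True entry in the table.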



Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.

Contributor

this is deleted but never added

Contributor

this is still missing

Contributor Author

Our model primarily focuses on multilingual reranking tasks, and it's not tested on this dataset. Is it necessary to provide results from this test set?

@KennethEnevoldsen added and then removed the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) Apr 30, 2026
@moshesbeta moshesbeta closed this May 14, 2026
@Samoed Samoed mentioned this pull request May 14, 2026

3 participants