Add NanoBEIR evaluation utilities with model-based configuration #1145
Overview
This PR introduces the `vespa.nanobeir` module, a set of utilities that simplifies running NanoBEIR evaluations with different embedding models. It provides a model-centric interface in which the Vespa schema configuration (embedding dimensions, field types, indexing statements, and distance metrics) adapts automatically to the model parameters.

Motivation
Previously, evaluating different embedding models required manually adjusting several interconnected configurations, down to details such as whether the indexing statement includes `pack_bits`. This was error-prone and made it difficult to compare models quickly. The new utilities derive all of these settings from a single `ModelConfig` object.

Key Features
1. Model Configuration Dataclass
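As a rough illustration of the idea, a model configuration of this kind could look like the sketch below. The class name `ModelConfigSketch`, its field names, and the validation rules are assumptions made for illustration; they are not taken from `vespa/nanobeir.py`, and the real `ModelConfig` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfigSketch:
    """Hypothetical stand-in for ModelConfig; all field names are assumptions."""

    model_id: str            # model identifier, e.g. "e5-small-v2"
    embedding_dim: int       # number of dimensions the model produces
    binarized: bool = False  # True when embeddings are packed into bits

    def __post_init__(self) -> None:
        if self.embedding_dim <= 0:
            raise ValueError("embedding_dim must be positive")
        # Packed bits are stored in int8 cells, so a whole number of bytes is needed.
        if self.binarized and self.embedding_dim % 8 != 0:
            raise ValueError("binarized embeddings need a dimension divisible by 8")


# Two configurations mirroring models named in this PR description.
small = ModelConfigSketch(model_id="e5-small-v2", embedding_dim=384)
binary = ModelConfigSketch(model_id="bge-m3-binary", embedding_dim=1024, binarized=True)
```

Keeping the object immutable (`frozen=True`) is a design choice of this sketch: every derived schema setting can then be computed from the configuration without worrying that it changes afterwards.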
2. Automatic Binary vs. Float Handling
For float embeddings:
- Field type: `tensor<float>(x[384])`
- Indexing: `["input text", "embed", "index", "attribute"]`
- Distance metric: `angular` (cosine similarity)

For binarized embeddings:
- Field type: `tensor<int8>(x[128])` (1024 bits → 128 bytes)
- Indexing: `["input text", "embed", "pack_bits", "index", "attribute"]`
- Distance metric: `hamming`

3. Helper Functions
- `create_embedder_component()`: Creates Vespa HuggingFace embedder components
- `create_embedding_field()`: Creates properly configured embedding fields
- `create_semantic_rank_profile()`: Creates semantic search profiles
- `create_hybrid_rank_profile()`: Creates hybrid (BM25 + semantic) profiles

4. Predefined Models
Six common models are pre-configured:
- `e5-small-v2` (384-dim float)
- `e5-base-v2` (768-dim float)
- `snowflake-arctic-embed-xs/s/m` (384/768-dim float)
- `bge-m3-binary` (1024-dim binary)

Example Usage
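The following standalone sketch shows the kind of derivation the utilities automate, using the float and binarized settings listed in this description. It deliberately does not call the `vespa.nanobeir` API: the function `embedding_field_settings` and the shape of its return value are hypothetical and exist only to illustrate the rule.

```python
def embedding_field_settings(embedding_dim: int, binarized: bool) -> dict:
    """Derive field type, indexing steps, and distance metric (illustrative only)."""
    if binarized:
        if embedding_dim % 8 != 0:
            raise ValueError("binarized embeddings need a dimension divisible by 8")
        return {
            # Each int8 cell holds 8 packed bits, so 1024 bits become 128 cells.
            "type": f"tensor<int8>(x[{embedding_dim // 8}])",
            "indexing": ["input text", "embed", "pack_bits", "index", "attribute"],
            "distance_metric": "hamming",
        }
    return {
        "type": f"tensor<float>(x[{embedding_dim}])",
        "indexing": ["input text", "embed", "index", "attribute"],
        "distance_metric": "angular",
    }


# Dimensions taken from the predefined models e5-small-v2 and bge-m3-binary.
float_settings = embedding_field_settings(384, binarized=False)
binary_settings = embedding_field_settings(1024, binarized=True)

print(float_settings["type"])   # tensor<float>(x[384])
print(binary_settings["type"])  # tensor<int8>(x[128])
```

Switching a model from float to binarized changes three settings at once (cell type and size, the extra `pack_bits` indexing step, and the distance metric), which is why deriving them from one flag is less error-prone than editing each by hand.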
Testing
Files Changed
- `vespa/nanobeir.py`: Core utilities module (373 lines)
- `tests/unit/test_nanobeir.py`: Comprehensive test suite (475 lines)
- `examples/nanobeir_evaluation_example.py`: Working example (208 lines)
- `examples/README.md`: Usage documentation

Benefits