Copilot AI commented Oct 22, 2025

Overview

This PR introduces the vespa.nanobeir module, a robust utility system that simplifies running NanoBEIR evaluations with different embedding models. The implementation provides a model-centric interface where all Vespa schema configurations (embedding dimensions, field types, indexing statements, and distance metrics) automatically adapt based on model parameters.

Motivation

Previously, evaluating different embedding models required manually adjusting multiple interconnected configurations:

  • Embedding field dimensions and types
  • Indexing statements (with/without pack_bits)
  • Distance metrics (hamming vs. cosine)
  • Rank profile expressions
  • Component configurations

This was error-prone and made it difficult to quickly compare models. The new utilities automate all these adjustments based on a single ModelConfig object.

Key Features

1. Model Configuration Dataclass

config = ModelConfig(
    model_id="e5-small-v2",
    embedding_dim=384,
    tokenizer_id="e5-base-v2-vocab",
    binarized=False,
)
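The PR shows only how ModelConfig is constructed, not its definition. A minimal sketch consistent with the usage above might look like the following; the `packed_dim` property is a hypothetical addition illustrating the bit-packing arithmetic, not necessarily part of the real class.

```python
# Hypothetical sketch of the ModelConfig dataclass; field names follow
# the PR description, the default and the packed_dim helper are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    model_id: str
    embedding_dim: int
    tokenizer_id: str
    binarized: bool = False

    @property
    def packed_dim(self) -> int:
        # Binarized embeddings pack 8 bits into each int8 value,
        # e.g. 1024 bits -> 128 bytes.
        return self.embedding_dim // 8 if self.binarized else self.embedding_dim

config = ModelConfig(
    model_id="e5-small-v2",
    embedding_dim=384,
    tokenizer_id="e5-base-v2-vocab",
)
print(config.packed_dim)  # 384 (float model, no packing)
```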

2. Automatic Binary vs. Float Handling

For float embeddings:

  • Field type: tensor<float>(x[384])
  • Indexing: ["input text", "embed", "index", "attribute"]
  • Distance metric: angular (cosine similarity)

For binarized embeddings:

  • Field type: tensor<int8>(x[128]) (1024 bits → 128 bytes)
  • Indexing: ["input text", "embed", "pack_bits", "index", "attribute"]
  • Distance metric: hamming
  • Ranking: Adjusted expressions for hamming distance
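The two configurations above differ only in how they branch on the binarized flag. As an illustrative sketch (not the module's actual code), deriving all three schema settings from that single flag could look like this:

```python
# Illustrative sketch of deriving field type, indexing statement, and
# distance metric from one `binarized` flag. `dim` is the model's raw
# embedding dimension in bits/floats; the dict layout is an assumption.
def embedding_schema(dim: int, binarized: bool) -> dict:
    if binarized:
        return {
            "type": f"tensor<int8>(x[{dim // 8}])",  # 8 bits per int8 value
            "indexing": ["input text", "embed", "pack_bits", "index", "attribute"],
            "distance_metric": "hamming",
        }
    return {
        "type": f"tensor<float>(x[{dim}])",
        "indexing": ["input text", "embed", "index", "attribute"],
        "distance_metric": "angular",
    }

print(embedding_schema(1024, binarized=True)["type"])  # tensor<int8>(x[128])
```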

3. Helper Functions

  • create_embedder_component(): Creates Vespa HuggingFace embedder components
  • create_embedding_field(): Creates properly configured embedding fields
  • create_semantic_rank_profile(): Creates semantic search profiles
  • create_hybrid_rank_profile(): Creates hybrid (BM25 + semantic) profiles
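The real helpers return pyvespa objects. To keep an example self-contained, the sketch below uses plain-string stand-ins and an assumed signature (a single config dict), so it illustrates the division of labor between the helpers rather than the actual API.

```python
# Hedged sketch of how the four helpers might divide the work. Only the
# function names come from the PR; signatures and return values here are
# simplified stand-ins, not the real pyvespa objects.
def create_embedder_component(config: dict) -> str:
    return f"hf-embedder:{config['model_id']}"

def create_embedding_field(config: dict) -> str:
    dtype = "int8" if config["binarized"] else "float"
    dim = config["embedding_dim"] // 8 if config["binarized"] else config["embedding_dim"]
    return f"field embedding type tensor<{dtype}>(x[{dim}])"

def create_semantic_rank_profile(config: dict) -> str:
    metric = "hamming" if config["binarized"] else "angular"
    return f"rank-profile semantic (distance-metric {metric})"

def create_hybrid_rank_profile(config: dict) -> str:
    metric = "hamming" if config["binarized"] else "angular"
    return f"rank-profile hybrid (bm25 + closeness, distance-metric {metric})"

cfg = {"model_id": "e5-small-v2", "embedding_dim": 384, "binarized": False}
print(create_embedding_field(cfg))  # field embedding type tensor<float>(x[384])
```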

4. Predefined Models

Six common models are pre-configured:

  • e5-small-v2 (384-dim float)
  • e5-base-v2 (768-dim float)
  • snowflake-arctic-embed-xs/s (384-dim float) and snowflake-arctic-embed-m (768-dim float)
  • bge-m3-binary (1024-dim binary)
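A registry of predefined models can be as simple as a dict keyed by model id. The sketch below is an assumption about the module's internals; the ids and dimensions are taken from the list above, but the dict layout and error handling are hypothetical.

```python
# Hypothetical registry sketch for the predefined models. Ids and
# dimensions follow the PR; everything else is illustrative.
PREDEFINED_MODELS = {
    "e5-small-v2": {"embedding_dim": 384, "binarized": False},
    "e5-base-v2": {"embedding_dim": 768, "binarized": False},
    "snowflake-arctic-embed-xs": {"embedding_dim": 384, "binarized": False},
    "snowflake-arctic-embed-s": {"embedding_dim": 384, "binarized": False},
    "snowflake-arctic-embed-m": {"embedding_dim": 768, "binarized": False},
    "bge-m3-binary": {"embedding_dim": 1024, "binarized": True},
}

def get_model_config(model_id: str) -> dict:
    try:
        return PREDEFINED_MODELS[model_id]
    except KeyError:
        raise ValueError(f"Unknown model: {model_id!r}") from None
```

Keeping the registry data-only makes adding a new model a one-line change, which is the extensibility point listed under Benefits.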

Example Usage

from vespa.nanobeir import get_model_config, create_evaluation_package

# Switch models with a single line change
config = get_model_config("e5-small-v2")  # or "bge-m3-binary"
package = create_evaluation_package(config, app_name="myeval")

# All schema configurations automatically adapt:
# - Field types, dimensions, and indexing
# - Distance metrics and ranking expressions
# - Component configurations

Testing

  • 31 new unit tests covering all functionality
  • Integration tests verifying complete float and binary setups
  • Example script demonstrating real-world usage
  • All 444 existing tests continue to pass

Files Changed

  • vespa/nanobeir.py: Core utilities module (373 lines)
  • tests/unit/test_nanobeir.py: Comprehensive test suite (475 lines)
  • examples/nanobeir_evaluation_example.py: Working example (208 lines)
  • examples/README.md: Usage documentation

Benefits

  1. Simplicity: Evaluate new models with minimal code changes
  2. Correctness: Automatically handles all configuration interdependencies
  3. Extensibility: Easy to add new model configurations
  4. Robustness: Comprehensive test coverage ensures reliability
Original prompt

Based on this script, I need you to do the following:
Overall goal: Create a script that can easily run NanoBeirEvaluation for different models. Possible we want to create some sort of leaderboard per model.
We want to make script be robust, and create a Component-object- based on parameters such as transformer-model, tokenizer-model etc.
The interface of running an evaluation should all be related to the model. All else should stay the same.
Things that will change with model:

  • Embedding dimension x[768] etc.
  • If model is configured with binarized, indexing statement (pack_bits) and embedding field type (int8) must be updated accordingly, as well as ranking expression (hamming distance instead of cosine).

Created from VS Code via the GitHub Pull Request extension.



Copilot AI changed the title [WIP] Add script to run NanoBeirEvaluation for different models Add NanoBEIR evaluation utilities with model-based configuration Oct 22, 2025