Copilot AI commented Oct 22, 2025

Overview

This PR introduces the vespa.nanobeir module, a robust utility system that simplifies running NanoBEIR evaluations with different embedding models. The implementation provides a model-centric interface where all Vespa schema configurations (embedding dimensions, field types, indexing statements, and distance metrics) automatically adapt based on model parameters.

Motivation

Previously, evaluating different embedding models required manually adjusting multiple interconnected configurations:

  • Embedding field dimensions and types
  • Indexing statements (with/without pack_bits)
  • Distance metrics (hamming vs. cosine)
  • Rank profile expressions
  • Component configurations

This was error-prone and made it difficult to quickly compare models. The new utilities automate all these adjustments based on a single ModelConfig object.

Key Features

1. Model Configuration Dataclass

config = ModelConfig(
    model_id="e5-small-v2",
    embedding_dim=384,
    tokenizer_id="e5-base-v2-vocab",
    binarized=False,
)
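The PR shows only how ModelConfig is constructed, not its definition. A minimal sketch consistent with the usage above might look like the following; the `packed_dim` property is a hypothetical addition illustrating the bit-packing arithmetic, not necessarily part of the real class.

```python
# Hypothetical sketch of the ModelConfig dataclass; field names follow
# the PR description, the default and the packed_dim helper are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    model_id: str
    embedding_dim: int
    tokenizer_id: str
    binarized: bool = False

    @property
    def packed_dim(self) -> int:
        # Binarized embeddings pack 8 bits into each int8 value,
        # e.g. 1024 bits -> 128 bytes.
        return self.embedding_dim // 8 if self.binarized else self.embedding_dim

config = ModelConfig(
    model_id="e5-small-v2",
    embedding_dim=384,
    tokenizer_id="e5-base-v2-vocab",
)
print(config.packed_dim)  # 384 (float model, no packing)
```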

2. Automatic Binary vs. Float Handling

For float embeddings:

  • Field type: tensor<float>(x[384])
  • Indexing: ["input text", "embed", "index", "attribute"]
  • Distance metric: angular (cosine similarity)

For binarized embeddings:

  • Field type: tensor<int8>(x[128]) (1024 bits → 128 bytes)
  • Indexing: ["input text", "embed", "pack_bits", "index", "attribute"]
  • Distance metric: hamming
  • Ranking: Adjusted expressions for hamming distance
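The two configurations above differ only in how they branch on the binarized flag. As an illustrative sketch (not the module's actual code), deriving all three schema settings from that single flag could look like this:

```python
# Illustrative sketch of deriving field type, indexing statement, and
# distance metric from one `binarized` flag. `dim` is the model's raw
# embedding dimension in bits/floats; the dict layout is an assumption.
def embedding_schema(dim: int, binarized: bool) -> dict:
    if binarized:
        return {
            "type": f"tensor<int8>(x[{dim // 8}])",  # 8 bits per int8 value
            "indexing": ["input text", "embed", "pack_bits", "index", "attribute"],
            "distance_metric": "hamming",
        }
    return {
        "type": f"tensor<float>(x[{dim}])",
        "indexing": ["input text", "embed", "index", "attribute"],
        "distance_metric": "angular",
    }

print(embedding_schema(1024, binarized=True)["type"])  # tensor<int8>(x[128])
```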

3. Helper Functions

  • create_embedder_component(): Creates Vespa HuggingFace embedder components
  • create_embedding_field(): Creates properly configured embedding fields
  • create_semantic_rank_profile(): Creates semantic search profiles
  • create_hybrid_rank_profile(): Creates hybrid (BM25 + semantic) profiles
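The real helpers return pyvespa objects. To keep an example self-contained, the sketch below uses plain-string stand-ins and an assumed signature (a single config dict), so it illustrates the division of labor between the helpers rather than the actual API.

```python
# Hedged sketch of how the four helpers might divide the work. Only the
# function names come from the PR; signatures and return values here are
# simplified stand-ins, not the real pyvespa objects.
def create_embedder_component(config: dict) -> str:
    return f"hf-embedder:{config['model_id']}"

def create_embedding_field(config: dict) -> str:
    dtype = "int8" if config["binarized"] else "float"
    dim = config["embedding_dim"] // 8 if config["binarized"] else config["embedding_dim"]
    return f"field embedding type tensor<{dtype}>(x[{dim}])"

def create_semantic_rank_profile(config: dict) -> str:
    metric = "hamming" if config["binarized"] else "angular"
    return f"rank-profile semantic (distance-metric {metric})"

def create_hybrid_rank_profile(config: dict) -> str:
    metric = "hamming" if config["binarized"] else "angular"
    return f"rank-profile hybrid (bm25 + closeness, distance-metric {metric})"

cfg = {"model_id": "e5-small-v2", "embedding_dim": 384, "binarized": False}
print(create_embedding_field(cfg))  # field embedding type tensor<float>(x[384])
```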

4. Predefined Models

Six common models are pre-configured:

  • e5-small-v2 (384-dim float)
  • e5-base-v2 (768-dim float)
  • snowflake-arctic-embed-xs/s (384-dim float) and snowflake-arctic-embed-m (768-dim float)
  • bge-m3-binary (1024-dim binary)
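A registry of predefined models can be as simple as a dict keyed by model id. The sketch below is an assumption about the module's internals; the ids and dimensions are taken from the list above, but the dict layout and error handling are hypothetical.

```python
# Hypothetical registry sketch for the predefined models. Ids and
# dimensions follow the PR; everything else is illustrative.
PREDEFINED_MODELS = {
    "e5-small-v2": {"embedding_dim": 384, "binarized": False},
    "e5-base-v2": {"embedding_dim": 768, "binarized": False},
    "snowflake-arctic-embed-xs": {"embedding_dim": 384, "binarized": False},
    "snowflake-arctic-embed-s": {"embedding_dim": 384, "binarized": False},
    "snowflake-arctic-embed-m": {"embedding_dim": 768, "binarized": False},
    "bge-m3-binary": {"embedding_dim": 1024, "binarized": True},
}

def get_model_config(model_id: str) -> dict:
    try:
        return PREDEFINED_MODELS[model_id]
    except KeyError:
        raise ValueError(f"Unknown model: {model_id!r}") from None
```

Keeping the registry data-only makes adding a new model a one-line change, which is the extensibility point listed under Benefits.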

Example Usage

from vespa.nanobeir import get_model_config, create_evaluation_package

# Switch models with a single line change
config = get_model_config("e5-small-v2")  # or "bge-m3-binary"
package = create_evaluation_package(config, app_name="myeval")

# All schema configurations automatically adapt:
# - Field types, dimensions, and indexing
# - Distance metrics and ranking expressions
# - Component configurations

Testing

  • 31 new unit tests covering all functionality
  • Integration tests verifying complete float and binary setups
  • Example script demonstrating real-world usage
  • All 444 existing tests continue to pass

Files Changed

  • vespa/nanobeir.py: Core utilities module (373 lines)
  • tests/unit/test_nanobeir.py: Comprehensive test suite (475 lines)
  • examples/nanobeir_evaluation_example.py: Working example (208 lines)
  • examples/README.md: Usage documentation

Benefits

  1. Simplicity: Evaluate new models with minimal code changes
  2. Correctness: Automatically handles all configuration interdependencies
  3. Extensibility: Easy to add new model configurations
  4. Robustness: Comprehensive test coverage ensures reliability
Original prompt

Based on this script, I need you to do the following:
Overall goal: Create a script that can easily run NanoBeirEvaluation for different models. Possible we want to create some sort of leaderboard per model.
We want to make script be robust, and create a Component-object- based on parameters such as transformer-model, tokenizer-model etc.
The interface of running an evaluation should all be related to the model. All else should stay the same.
Things that will change with model:

  • Embedding dimension x[768] etc.
  • If model is configured with binarized, indexing statement (pack_bits) and embedding field type (int8) must be updated accordingly, as well as ranking expression (hamming distance instead of cosine).

Created from VS Code via the GitHub Pull Request extension.



Copilot AI changed the title [WIP] Add script to run NanoBeirEvaluation for different models Add NanoBEIR evaluation utilities with model-based configuration Oct 22, 2025