@margaretjgu margaretjgu commented Nov 12, 2025

Elasticsearch Hybrid Search Implementation

Summary

This PR adds hybrid search support to the Elasticsearch vector store integration in LangChain.js. Hybrid search runs a semantic (vector) search and a lexical (BM25 full-text) search for the same query, then merges the two ranked result lists with Reciprocal Rank Fusion (RRF) to improve relevance.

Related documentation PR: langchain-ai/docs#1466

Features

1. Hybrid Search Strategy

  • New HybridRetrievalStrategy class for configuring hybrid search
  • Combines kNN vector search with BM25 full-text search
  • Uses Elasticsearch's built-in RRF (Reciprocal Rank Fusion) for result merging
  • Fusion runs server-side in Elasticsearch, so the two result lists are not merged in the client
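
For readers unfamiliar with RRF, the fusion step can be sketched client-side as below. This is only an illustration of the scoring rule that Elasticsearch applies on the server, not code from this PR; the function name `reciprocalRankFusion` is hypothetical, and the parameter names simply mirror `rankConstant` and `rankWindowSize` from the strategy config.

```typescript
// Reciprocal Rank Fusion over several ranked lists of document ids.
// Each list is ordered best-first; ranks are 1-based.
function reciprocalRankFusion(
  rankedLists: string[][],
  rankConstant = 60,
  rankWindowSize = 100
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    // Only the top rankWindowSize hits of each list take part in the fusion.
    list.slice(0, rankWindowSize).forEach((id, index) => {
      const rank = index + 1;
      const previous = scores.get(id) ?? 0;
      scores.set(id, previous + 1 / (rankConstant + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Example: one lexical ranking and one vector ranking for the same query.
const fusedExample = reciprocalRankFusion([
  ["a", "b", "c"], // BM25 order
  ["b", "d", "a"], // kNN order
]);
console.log(fusedExample.map(([id]) => id)); // b, a, d, c
```

A document that appears in both lists (here `a` and `b`) accumulates two terms and so outranks documents found by only one retriever. A larger `rankConstant` flattens the difference between adjacent ranks.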

2. Backward Compatible

  • Existing pure vector search behavior unchanged
  • Hybrid search is opt-in via strategy parameter
  • No breaking changes to existing APIs

Implementation Details

New Classes

HybridRetrievalStrategyConfig

Configuration interface for hybrid search:

interface HybridRetrievalStrategyConfig {
  rankWindowSize?: number;        // Default: 100
  rankConstant?: number;           // Default: 60
  textField?: string;              // Default: "text"
  includeSourceVectors?: boolean;  // For ES 9.2+
}
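
As a rough sketch of how the documented defaults might be applied, the helper below fills in any omitted option. `resolveHybridConfig` is a hypothetical name used for illustration and is not taken from the PR. The interface is repeated so the snippet is self-contained.

```typescript
interface HybridRetrievalStrategyConfig {
  rankWindowSize?: number;
  rankConstant?: number;
  textField?: string;
  includeSourceVectors?: boolean;
}

// Hypothetical helper: fills in the defaults listed in the interface comments.
function resolveHybridConfig(config: HybridRetrievalStrategyConfig = {}) {
  return {
    rankWindowSize: config.rankWindowSize ?? 100,
    rankConstant: config.rankConstant ?? 60,
    textField: config.textField ?? "text",
    // No default is documented for this flag, so it is passed through untouched.
    includeSourceVectors: config.includeSourceVectors,
  };
}

console.log(resolveHybridConfig({ rankConstant: 20 }));
```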

HybridRetrievalStrategy

Strategy class implementing hybrid search:

const strategy = new HybridRetrievalStrategy({
  rankWindowSize: 100,
  rankConstant: 60,
  textField: "text",
  includeSourceVectors: true  // For ES 9.2+
});

Modified Methods

similaritySearch()

  • Captures query text for hybrid search
  • Routes to hybrid or vector search based on strategy

similaritySearchVectorWithScore()

  • Routes to hybridSearchVectorWithScore() when strategy is present
  • Falls back to pure kNN search otherwise
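
The routing described above could look roughly like the following. This is a simplified stand-in, not the PR's code: `RoutingSketch`, `lastQueryText`, `hybridSearch` and `knnSearch` are hypothetical names, the embedding call is replaced by a placeholder, and the extra check that query text was captured is an assumption about how a direct vector-only call would be handled.

```typescript
type Scored = [string, number];

// Hypothetical stand-in for the vector store; only the dispatch is modelled.
class RoutingSketch {
  private lastQueryText: string | undefined;

  constructor(private readonly strategy?: { textField: string }) {}

  async similaritySearch(query: string, k = 4): Promise<string[]> {
    // Keep the raw text: the BM25 half of a hybrid query needs it.
    this.lastQueryText = query;
    const vector = [query.length]; // placeholder for embeddings.embedQuery(query)
    const hits = await this.similaritySearchVectorWithScore(vector, k);
    return hits.map(([id]) => id);
  }

  async similaritySearchVectorWithScore(vector: number[], k: number): Promise<Scored[]> {
    // Assumption: hybrid needs both a strategy and captured text; else pure kNN.
    if (this.strategy && this.lastQueryText !== undefined) {
      return this.hybridSearch(this.lastQueryText, vector, k);
    }
    return this.knnSearch(vector, k);
  }

  private async hybridSearch(text: string, _vector: number[], _k: number): Promise<Scored[]> {
    return [[`hybrid:${text}`, 1]]; // stub result marking the hybrid path
  }

  private async knnSearch(_vector: number[], _k: number): Promise<Scored[]> {
    return [["knn", 1]]; // stub result marking the kNN path
  }
}
```

Because the public method signatures stay the same, callers that never pass a strategy keep the existing kNN path.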

New: hybridSearchVectorWithScore()

  • Private method implementing hybrid search
  • Uses Elasticsearch retriever API with RRF
  • Combines two retrievers:
    1. Standard retriever: BM25 full-text search
    2. kNN retriever: Vector similarity search
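
To make the two-retriever structure concrete, here is a sketch of the kind of request body the Elasticsearch retriever API accepts for RRF. It is not copied from the PR: `buildHybridSearchBody` is a hypothetical helper, the vector field name `embedding` is an assumption, and the `num_candidates` choice is an assumption as well.

```typescript
interface HybridQueryOptions {
  queryText: string;
  queryVector: number[];
  k: number;
  textField?: string;
  vectorField?: string; // assumption: name of the dense_vector field
  rankWindowSize?: number;
  rankConstant?: number;
}

// Hypothetical helper: builds an RRF retriever request body.
function buildHybridSearchBody(opts: HybridQueryOptions) {
  const {
    queryText,
    queryVector,
    k,
    textField = "text",
    vectorField = "embedding",
    rankWindowSize = 100,
    rankConstant = 60,
  } = opts;

  return {
    size: k,
    retriever: {
      rrf: {
        retrievers: [
          // 1. Standard retriever: BM25 full-text match on the text field.
          { standard: { query: { match: { [textField]: queryText } } } },
          // 2. kNN retriever: nearest-neighbour search on the vector field.
          {
            knn: {
              field: vectorField,
              query_vector: queryVector,
              k,
              // Assumption: consider at least as many candidates as the rank window.
              num_candidates: Math.max(k, rankWindowSize),
            },
          },
        ],
        rank_window_size: rankWindowSize,
        rank_constant: rankConstant,
      },
    },
  };
}

const exampleBody = buildHybridSearchBody({
  queryText: "how to prevent muscle soreness",
  queryVector: [0.12, -0.03, 0.88],
  k: 5,
});
console.log(JSON.stringify(exampleBody, null, 2));
```

The resulting object would be sent as the body of a search request against the index. Elasticsearch then runs both retrievers and fuses their rankings before returning the top `size` hits.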

Usage

Basic Vector Search (No Change)

import { Client } from "@elastic/elasticsearch";
import { OpenAIEmbeddings } from "@langchain/openai";
import { ElasticVectorSearch } from "@langchain/community/vectorstores/elasticsearch";

const vectorStore = new ElasticVectorSearch(
  new OpenAIEmbeddings(),
  {
    client: new Client({ node: "http://localhost:9200" }),
    indexName: "my-index"
  }
);

const results = await vectorStore.similaritySearch("query", 5);

Hybrid Search (New)

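import { Client } from "@elastic/elasticsearch";
import { OpenAIEmbeddings } from "@langchain/openai";
import {
  ElasticVectorSearch,
  HybridRetrievalStrategy
} from "@langchain/community/vectorstores/elasticsearch";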

const vectorStore = new ElasticVectorSearch(
  new OpenAIEmbeddings(),
  {
    client: new Client({ node: "http://localhost:9200" }),
    indexName: "my-index",
    strategy: new HybridRetrievalStrategy({
      rankWindowSize: 100,
      rankConstant: 60,
      textField: "text"
    })
  }
);

// Same API, but now uses hybrid search internally
const results = await vectorStore.similaritySearch(
  "how to prevent muscle soreness",
  5
);

Core Implementation

  • libs/langchain-community/src/vectorstores/elasticsearch.ts (+145 lines)
    • Added HybridRetrievalStrategyConfig interface
    • Added HybridRetrievalStrategy class
    • Updated ElasticClientArgs interface
    • Added hybridSearchVectorWithScore() method
    • Updated similaritySearch() and similaritySearchVectorWithScore()
    • Enhanced JSDoc documentation

@margaretjgu margaretjgu marked this pull request as draft November 12, 2025 21:58

changeset-bot bot commented Nov 12, 2025

⚠️ No Changeset found

Latest commit: 9e26a6b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions github-actions bot added community Issues related to `@langchain/community` pkg:@langchain/community labels Nov 12, 2025
@margaretjgu margaretjgu changed the title Enable Hybrid Search for Langchain.js Enable Elasticsearch Hybrid Search for Langchain.js Nov 12, 2025
return documentIds;
}

async similaritySearch(
@margaretjgu margaretjgu Nov 13, 2025

The default value for k (4) is inferred from the VectorStore base class that we extend:

async similaritySearch(
  query: string,
  k = 4,
  filter: this["FilterType"] | undefined = undefined,
  _callbacks: Callbacks | undefined = undefined // implement passing to embedQuery later
): Promise<DocumentInterface[]> {
  const results = await this.similaritySearchVectorWithScore(
    await this.embeddings.embedQuery(query),
    k,
    filter
  );

@margaretjgu margaretjgu changed the title Enable Elasticsearch Hybrid Search for Langchain.js feat(): add elasticsearch hybrid search Nov 13, 2025
@margaretjgu margaretjgu marked this pull request as ready for review November 19, 2025 15:19

Labels

community Issues related to `@langchain/community` examples pkg:@langchain/community
