Make Elasticsearch use a linear retriever for hybrid search #4469

@jpountz

I believe that the goal of the hybrid search benchmark is to compute top-10 hits for both the lexical and semantic search, before combining hits by summing up scores.
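To make that goal concrete, here is a minimal sketch of "top-k per side, then sum". The function name `fuse_top_k` and the toy score maps are made up for illustration and are not taken from the benchmark; a document that misses one side's top-k is assumed to contribute a score of 0 for that side.

```python
import heapq


def fuse_top_k(lexical, semantic, k=10):
    """Select the top-k hits of each score map, then rank their union by summed score.

    `lexical` and `semantic` map a document id to its score (hypothetical inputs).
    """
    # Step 1: each search independently keeps only its own top-k hits.
    top_lex = dict(heapq.nlargest(k, lexical.items(), key=lambda kv: kv[1]))
    top_sem = dict(heapq.nlargest(k, semantic.items(), key=lambda kv: kv[1]))

    # Step 2: sum the scores over the union of the two candidate sets.
    # A document absent from one side's top-k contributes 0 for that side.
    fused = {
        doc: top_lex.get(doc, 0.0) + top_sem.get(doc, 0.0)
        for doc in top_lex.keys() | top_sem.keys()
    }

    # Step 3: keep the k best fused hits.
    return heapq.nlargest(k, fused.items(), key=lambda kv: kv[1])
```

Because each side is cut to its top-k before any summing happens, each search can prune on its own scores alone.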

This is not exactly what the Elasticsearch _search call does, as it puts the vector query as a SHOULD clause of the bool query. Vector scores are therefore computed first and summed with the lexical scores, and only then are the top-10 hits selected based on the summed scores, which is harder on dynamic pruning. Switching to a linear retriever should hopefully fix this and make the comparison with Vespa a bit fairer.
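The two request shapes can be sketched roughly as follows. This is an illustration under assumptions, not the benchmark's actual queries: the field names `text` and `embedding`, the helper names, and the `num_candidates` value are hypothetical, and the `linear` retriever body follows my reading of the Elasticsearch retriever documentation, so it should be checked against the Elasticsearch version in use.

```python
def bool_should_body(query_text, query_vector, k=10):
    """Approach described above: the vector query is a SHOULD clause of a bool query.

    Scores of both clauses are summed per document before the top-k is selected.
    Field names are assumptions for illustration.
    """
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    {"match": {"text": query_text}},
                    {
                        "knn": {
                            "field": "embedding",
                            "query_vector": query_vector,
                            "num_candidates": 100,  # assumed value
                        }
                    },
                ]
            }
        },
    }


def linear_retriever_body(query_text, query_vector, k=10):
    """Proposed alternative: each child retriever produces its own top hits,
    and the linear retriever combines them as a weighted sum of scores."""
    return {
        "size": k,
        "retriever": {
            "linear": {
                "rank_window_size": k,
                "retrievers": [
                    {
                        "retriever": {
                            "standard": {"query": {"match": {"text": query_text}}}
                        },
                        "weight": 1.0,
                    },
                    {
                        "retriever": {
                            "knn": {
                                "field": "embedding",
                                "query_vector": query_vector,
                                "k": k,
                                "num_candidates": 100,  # assumed value
                            }
                        },
                        "weight": 1.0,
                    },
                ],
            }
        },
    }
```

Either dictionary would be sent as the JSON body of a `_search` request. With both weights at 1.0 and no normalizer set, the linear variant is intended to amount to a plain sum of the two scores, matching the benchmark goal stated above.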
