feat(memory): ADR-147 entity arm + signal provenance in hybridSearch #2327
Merged
…2317)

Adds the third signal that the multi-signal retrieval ADR (#2317) called out as the actual gap, plus per-result provenance.

The ADR's stated P1 ("wire FTS5 + RRF fusion") turned out to be already shipped: `controller-registry.ts:713` already runs `semanticSearch()` + `searchKeyword()` in parallel, fuses via `applyRRF(k=60)`, and diversifies via `applyMMR(λ=0.7)`. What was actually missing is the entity arm (ADR P2) and the `signals` field on each fused result.

This change:

1. `entity-tagger.ts`: regex-based extractor for emails, URLs, file paths (POSIX + Windows), quoted phrases, and proper-noun 2-grams. Deliberately conservative: false negatives are fine (dense + sparse cover the rest), false positives would dilute the RRF score. `(?<!\w)..(?!\w)` lookarounds on the quote patterns stop the regex from pairing a closing quote of one phrase with the opening of the next (the classic `"a" over "b"` bug). 12 unit tests.
2. `controller-registry.ts` `hybridSearch`: extracts entities from the query; if any, runs `searchKeyword(entity, fanOut/n)` per entity in parallel, flattens, and adds the result as a third RRF arm. An empty entity set bypasses the arm entirely so it doesn't dilute fusion.
3. `signals: ('vector'|'bm25'|'entity')[]` on every returned result. Computed by candidate-id set membership in each arm's pre-fusion result. Lets callers debug which arms surfaced an entry without re-running the search.
4. `graceful-retrieval.test.ts`: extends the existing hybridSearch test with a provenance assertion + a needle-in-haystack test (30 generic "authentication" entries + 1 "Alice Smith"; query "Alice Smith authentication" surfaces the Alice entry with `signals.includes('entity')`).

Memory test suite: 416/420 pass. The 4 failures are pre-existing Windows-environment issues in unrelated files (agent-memory-scope path separator + benchmark.test.ts perf budget).

Co-Authored-By: RuFlo <ruv@ruv.net>
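To make the tagger description in the commit message above concrete, here is a minimal sketch of a regex-based extractor. It is not the code from `src/entity-tagger.ts`: the `Entity` shape, the exact patterns, and the return type of `extractEntities` are assumptions for illustration, and file-path extraction is left out for brevity. It does show the `(?<!\w)` / `(?!\w)` guards on the quote pattern, which require that an opening quote not follow a word character and a closing quote not precede one.

```typescript
// Hypothetical sketch; patterns and types are assumptions, not the PR's code.
type EntityKind = 'email' | 'url' | 'quoted' | 'proper-noun';

interface Entity {
  kind: EntityKind;
  text: string;
}

const EMAIL = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
const URL_RE = /https?:\/\/[^\s"'<>]+/g;
// Lookbehind/lookahead keep quote characters that touch a word character
// (inch marks such as 5") from being read as phrase delimiters.
const DOUBLE_QUOTED = /(?<!\w)"([^"]+)"(?!\w)/g;
// Two adjacent capitalised words, e.g. "Alice Smith".
const PROPER_2GRAM = /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g;

function extractEntities(query: string): Entity[] {
  const seen = new Set<string>();
  const out: Entity[] = [];

  const collect = (kind: EntityKind, re: RegExp, group = 0): void => {
    for (const m of query.matchAll(re)) {
      const text = m[group];
      if (!text) continue;
      // De-duplicate case-insensitively within each kind.
      const key = `${kind}:${text.toLowerCase()}`;
      if (seen.has(key)) continue;
      seen.add(key);
      out.push({ kind, text });
    }
  };

  collect('email', EMAIL);
  collect('url', URL_RE);
  collect('quoted', DOUBLE_QUOTED, 1);
  collect('proper-noun', PROPER_2GRAM);
  return out;
}
```

With these patterns, `extractEntities('5" disk and 3" disk')` returns no entities, because both quote characters follow a digit and are rejected by the lookbehind. The proper-noun pattern here is looser than a production tagger would want (it would accept two capitalised words at the start of a sentence), so treat it as a starting point only.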
ruvnet added a commit that referenced this pull request Jun 8, 2026
#2327)

@claude-flow/memory 3.0.0-alpha.19 → 3.0.0-alpha.20. Adds the entity arm to hybridSearch alongside the existing dense + sparse RRF fusion, plus per-result `signals: ('vector'|'bm25'|'entity')[]` provenance. End-to-end capability smoke against built dist confirmed: Alice needle in 31-doc corpus ranks #1 with all three signals; runner-up has only vector+bm25, an RRF score gap of ~47%.

@claude-flow/cli, claude-flow, ruflo 3.10.38 → 3.10.39. CLI also pins @claude-flow/memory to ^3.0.0-alpha.20 so the wrapper users pick up the entity arm automatically.

All four packages published with latest+alpha+v3alpha aligned. Lockfile regen included (lesson from #2311: bumping a workspace dep without regenerating v3/pnpm-lock.yaml breaks frozen-lockfile CI).

Co-Authored-By: RuFlo <ruv@ruv.net>
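The three-arm fusion and per-result `signals` described above can be sketched as follows. This is a simplified, self-contained illustration, not the code in `controller-registry.ts`: `rrf` is a plain reciprocal-rank-fusion stand-in for the real `applyRRF`, `fuseWithSignals` and `FusedResult` are hypothetical names, candidates are reduced to bare ids, and the MMR diversification step is omitted.

```typescript
// Hypothetical sketch; `rrf` stands in for applyRRF and is not its real signature.
type Signal = 'vector' | 'bm25' | 'entity';

interface FusedResult {
  id: string;
  score: number;
  signals: Signal[];
}

// Plain reciprocal rank fusion: score(id) = sum over arms of 1 / (k + rank),
// with rank starting at 1 and duplicate ids inside one arm counted once.
function rrf(arms: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const arm of arms) {
    const seen = new Set<string>();
    let rank = 0;
    for (const id of arm) {
      if (seen.has(id)) continue;
      seen.add(id);
      rank += 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    }
  }
  return scores;
}

function fuseWithSignals(dense: string[], sparse: string[], entity: string[]): FusedResult[] {
  // Membership sets are built from the pre-fusion candidates,
  // so the fusion function itself needs no changes.
  const denseIds = new Set(dense);
  const sparseIds = new Set(sparse);
  const entityIds = new Set(entity);

  // Mirror the bypass: an empty entity arm is not passed to fusion at all.
  // (In this simplified rrf an empty arm would contribute nothing either way.)
  const arms = [dense, sparse];
  if (entity.length > 0) arms.push(entity);

  const fused: FusedResult[] = [];
  for (const [id, score] of rrf(arms)) {
    const signals: Signal[] = [];
    if (denseIds.has(id)) signals.push('vector');
    if (sparseIds.has(id)) signals.push('bm25');
    if (entityIds.has(id)) signals.push('entity');
    fused.push({ id, score, signals });
  }
  return fused.sort((a, b) => b.score - a.score);
}
```

For example, `fuseWithSignals(['a', 'b', 'c'], ['b', 'a'], ['a'])` ranks `a` first with all three signals, since it collects a reciprocal-rank term from every arm, while `b` carries only `vector` and `bm25`.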
Closes part of #2317 and references #2324.
What this lands
Implements the actual gap from the ADR-147 multi-signal retrieval proposal: the third RRF arm (entity matching) + per-result signal provenance.
The ADR's stated P1 ("wire FTS5 + RRF fusion") turned out to be already shipped: `controller-registry.ts:713` already runs `semanticSearch()` + `searchKeyword()` in parallel, fuses via `applyRRF(k=60)`, and diversifies via `applyMMR(λ=0.7)`. The dream-cycle author missed this. The actual gap was the entity arm (ADR P2) and the missing `signals` field.

Changes
- `src/entity-tagger.ts` (new)
- `src/entity-tagger.test.ts` (new)
- `src/controller-registry.ts`: `hybridSearch` controller gains the entity arm (per-entity `searchKeyword` in parallel) + builds `signals` set membership before RRF
- `src/graceful-retrieval.test.ts`

Design notes
- … `and/or` → empty. Key bug found and fixed: the original quote regex paired the closing `"` of one phrase with the opening `"` of the next (the `"a" over "b"` capturing `over` problem). Fixed with `(?<!\w)..(?!\w)` lookarounds.
- When `extractEntities(query)` returns nothing, the entity arm is dropped from the RRF input entirely rather than passed as `[]`. This avoids diluting the fusion when there are no entities.
- Build `denseIds`, `sparseIds`, `entityIds` from the candidates BEFORE RRF, then stamp `signals[]` on each fused output by checking which sets contain the candidate id. This doesn't require modifying `applyRRF`.

Validation
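- `npx vitest run src/entity-tagger.test.ts src/graceful-retrieval.test.ts`: 16/16 pass
- Full memory suite: 416/420 pass. The 4 failures are pre-existing Windows-environment issues in unrelated files (`agent-memory-scope.test.ts` path separators, `benchmark.test.ts` perf budget). My branch doesn't touch any of those files.

What this defers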
- The entity arm issues per-entity `searchKeyword` calls. Fine for typical query entity counts (1–3) but unbounded if a query mentions 20 entities. A dedicated entity index would cap this; deferred to a follow-up.

Test plan
- `entity-tagger.test.ts` 12/12 pass
- `graceful-retrieval.test.ts` 4/4 pass (2 existing + 2 new ADR-147 ones)
- `npm run build` clean (no TS errors)

🤖 Generated with RuFlo
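As an illustration of the per-entity fan-out that "What this defers" refers to, the sketch below runs one keyword search per entity in parallel with a budget of roughly `fanOut / n` each and flattens the results. The `KeywordSearch` type, the `runEntityArm` name, the flooring of the budget, and the optional `maxEntities` cap are all assumptions made for this example. The cap is a simple stopgap, not the dedicated entity index the PR defers, and with the default of `Infinity` the fan-out stays unbounded as described.

```typescript
// Hypothetical sketch; names and the cap parameter are not from the PR.
type KeywordSearch = (term: string, limit: number) => Promise<string[]>;

async function runEntityArm(
  entities: string[],
  searchKeyword: KeywordSearch,
  fanOut: number,
  maxEntities: number = Infinity, // assumed knob; Infinity means no cap
): Promise<string[]> {
  const picked = entities.slice(0, maxEntities);
  if (picked.length === 0) return []; // caller drops the arm when this is empty

  // Split the candidate budget across entities, at least 1 per entity.
  const perEntity = Math.max(1, Math.floor(fanOut / picked.length));

  // One keyword search per entity, all in flight at once.
  const perEntityHits = await Promise.all(
    picked.map((entity) => searchKeyword(entity, perEntity)),
  );

  // Flatten in entity order; duplicates are left for the fusion step to handle.
  return perEntityHits.flat();
}
```

With `fanOut = 30` and three entities, each call gets a limit of 10. With 20 entities the arm still issues 20 calls unless `maxEntities` is set, which is the cost the PR flags.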