feat(memory): ADR-147 entity arm + signal provenance in hybridSearch by ruvnet · Pull Request #2327 · ruvnet/ruflo

ruvnet · 2026-06-08T17:10:27Z

Partially closes #2317; references #2324.

What this lands

Fills the actual gap in the ADR-147 multi-signal retrieval proposal: a third RRF arm (entity matching) plus per-result signal provenance.

The ADR's stated P1 ("wire FTS5 + RRF fusion") had already shipped: `controller-registry.ts:713` already runs `semanticSearch()` and `searchKeyword()` in parallel, fuses the results via `applyRRF(k=60)`, and diversifies them via `applyMMR(λ=0.7)`. The dream-cycle author missed this. The actual gap was the entity arm (ADR P2) and the missing `signals` field.
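For context on the fusion step that already exists, here is a minimal sketch of standard Reciprocal Rank Fusion. It assumes `applyRRF(k=60)` follows the usual formula, in which every arm adds 1/(k + rank) for each id it returns; `rrfFuse` and its signature are hypothetical, not the project's API, and MMR diversification is left out.

```typescript
// Hypothetical sketch of standard Reciprocal Rank Fusion (RRF); `rrfFuse` is not
// the project's applyRRF, whose exact signature is not shown in this PR.
type RankedIds = string[]; // candidate ids from one arm, best match first

interface FusedResult {
  id: string;
  score: number;
}

function rrfFuse(arms: RankedIds[], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const arm of arms) {
    const seen = new Set<string>();
    for (const id of arm) {
      if (seen.has(id)) continue; // count each id once per arm, at its best rank
      seen.add(id);
      const rank = seen.size; // 1-based rank among this arm's unique ids
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    }
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// doc-b appears in both arms, so it outranks doc-a even though doc-a tops the dense arm.
const fused = rrfFuse([
  ["doc-a", "doc-b", "doc-c"], // dense (vector) arm
  ["doc-b", "doc-d"], // sparse (bm25) arm
]);
console.log(fused.map((r) => r.id)); // order: doc-b, doc-a, doc-d, doc-c
```

Because RRF only looks at ranks, arms with incomparable raw scores (cosine similarity vs. BM25) can be fused without normalization.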

Changes

| File | Lines | What |
|---|---|---|
| `src/entity-tagger.ts` (new) | +71 | Regex extractor for emails, URLs, file paths (POSIX + Windows), quoted phrases, and proper-noun 2-grams |
| `src/entity-tagger.test.ts` (new) | +94 | 12 unit tests pinning conservatism: false negatives OK, false positives bad |
| `src/controller-registry.ts` | +63 / −21 | The `hybridSearch` controller gains the entity arm (per-entity `searchKeyword` calls in parallel) and records per-arm set membership for `signals` before RRF |
| `src/graceful-retrieval.test.ts` | +55 / −0 | Provenance assertion on the existing test, plus a new needle-in-haystack test (Alice Smith among 30 generic auth entries) |

Design notes

Conservative tagger. False negatives are fine (the dense and sparse arms cover them); false positives would dilute RRF. Tests pin generic prose → empty, single capitalized words → empty, and `and/or` → empty. Key bug found and fixed: the original quote regex paired the closing `"` of one phrase with the opening `"` of the next (in `"a" over "b"` it captured ` over `), fixed with `(?<!\w)..(?!\w)` lookarounds.
Empty arm bypass. If extractEntities(query) returns nothing, the entity arm is dropped from the RRF input entirely rather than passed as []. Avoids diluting the fusion when there are no entities.
Provenance via pre-fusion set membership. Build `denseIds`, `sparseIds`, and `entityIds` from the candidates BEFORE RRF, then stamp `signals[]` on each fused output by checking which sets contain the candidate id. This doesn't require modifying `applyRRF`.
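A minimal sketch of a tagger in the spirit of the first design note. The name `extractEntities` comes from the notes above; the patterns, their order, and the blank-out-after-match strategy (so the path inside a URL is not re-tagged as a file path) are assumptions for illustration, not the contents of `src/entity-tagger.ts`. It is not fully conservative either: a sentence-initial pair such as "The Service" would still be tagged.

```typescript
// Hypothetical sketch of a conservative entity tagger. The patterns below are
// illustrative assumptions, not the ones in src/entity-tagger.ts.
const ENTITY_PATTERNS: RegExp[] = [
  // Emails
  /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  // URLs, excluding trailing sentence punctuation
  /https?:\/\/[^\s"'<>]*[^\s"'<>.,;:!?)]/g,
  // Absolute POSIX or Windows paths with at least two segments; the lookbehind
  // keeps "and/or" and "a/b/c" from matching mid-word.
  /(?<![\w.-])(?:[A-Za-z]:\\|\/)(?:[\w.-]+[\\/])+[\w.-]+/g,
  // Double-quoted phrases; the lookarounds reject quotes glued to word characters,
  // so a stray or closing quote cannot pair with the next phrase's opening quote.
  /(?<!\w)"([^"\n]+)"(?!\w)/g,
  // Proper-noun 2-grams such as "Alice Smith"; single capitalized words are ignored.
  /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g,
];

function extractEntities(query: string): string[] {
  const found = new Set<string>();
  let rest = query;
  for (const pattern of ENTITY_PATTERNS) {
    // Blank out each match so later patterns cannot re-tag part of it.
    rest = rest.replace(pattern, (match: string, group: unknown) => {
      // For patterns without a capture group, the second argument is the offset.
      const value = (typeof group === "string" ? group : match).trim();
      if (value) found.add(value);
      return " ".repeat(match.length);
    });
  }
  return Array.from(found);
}

console.log(extractEntities('Ask alice@example.com about "rate limits" in /etc/app/config.yaml'));
```

Returning an empty array for ordinary prose is what lets the caller skip the entity arm entirely, as the second design note describes.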

Validation

npx vitest run src/entity-tagger.test.ts src/graceful-retrieval.test.ts — 16/16 pass
Full memory suite — 416/420 pass. The 4 failures are pre-existing Windows-environment issues in unrelated files (agent-memory-scope.test.ts path separators, benchmark.test.ts perf budget). My branch doesn't touch any of those files.

What this defers

Entity index in SQLite (ADR P2 stretch goal) — current implementation runs per-entity searchKeyword calls. Fine for typical query entity counts (1–3) but unbounded if a query mentions 20 entities. A dedicated entity index would cap this; deferred to a follow-up.
Async writes by default (ADR P3) — orthogonal concern; the existing consolidator already handles HNSW background rebuild.
LoCoMo benchmark publication (ADR P4) — requires harness wiring + dataset access; punted to a separate workstream.
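The first deferral concerns the per-entity fan-out. The sketch below shows one way the arm could look with a bound applied, combined with the empty-arm bypass from the design notes. `KeywordSearch` stands in for `searchKeyword`; `MAX_ENTITIES`, `runEntityArm`, and the integer split of the budget are hypothetical choices, and the merged code has no such cap.

```typescript
// Hypothetical sketch of the entity arm's fan-out. `KeywordSearch` stands in for
// searchKeyword, and MAX_ENTITIES is an assumed cap that this PR does not have.
type KeywordSearch = (query: string, limit: number) => Promise<string[]>;

const MAX_ENTITIES = 5;

async function runEntityArm(
  entities: string[],
  fanOut: number,
  searchKeyword: KeywordSearch,
): Promise<string[] | null> {
  // No entities: return null so the caller drops the arm from RRF instead of passing [].
  if (entities.length === 0) return null;
  const capped = entities.slice(0, MAX_ENTITIES);
  // Split the candidate budget across entities, similar to searchKeyword(entity, fanOut / n).
  const perEntity = Math.max(1, Math.floor(fanOut / capped.length));
  // One keyword search per entity, all in parallel.
  const lists = await Promise.all(capped.map((entity) => searchKeyword(entity, perEntity)));
  // Flatten and de-duplicate, keeping each id's first position.
  return Array.from(new Set(([] as string[]).concat(...lists)));
}
```

With a cap like this, the number of `searchKeyword` calls per query stays constant; a dedicated entity index would remove the per-entity calls altogether.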

Test plan

entity-tagger.test.ts 12/12 pass
graceful-retrieval.test.ts 4/4 pass (2 existing + 2 new ADR-147 ones)
npm run build clean (no TS errors)
Full memory suite has no regression from this branch

🤖 Generated with RuFlo

…2317)

Adds the third signal that the multi-signal retrieval ADR (#2317) called out as the actual gap, plus per-result provenance.

The ADR's stated P1, "wire FTS5 + RRF fusion", turned out to be already shipped: `controller-registry.ts:713` already runs `semanticSearch()` + `searchKeyword()` in parallel, fuses via `applyRRF(k=60)`, and diversifies via `applyMMR(λ=0.7)`. What was actually missing is the entity arm (ADR P2) and the `signals` field on each fused result.

This change:

1. `entity-tagger.ts`: regex-based extractor for emails, URLs, file paths (POSIX + Windows), quoted phrases, and proper-noun 2-grams. Deliberately conservative: false negatives are fine (dense + sparse cover the rest), false positives would dilute the RRF score. `(?<!\w)..(?!\w)` lookarounds on the quote patterns stop the regex from pairing the closing quote of one phrase with the opening quote of the next (the classic `"a" over "b"` bug). 12 unit tests.
2. `controller-registry.ts` `hybridSearch`: extracts entities from the query; if there are any, runs `searchKeyword(entity, fanOut/n)` per entity in parallel, flattens the results, and adds them as a third RRF arm. An empty entity set bypasses the arm entirely so it doesn't dilute fusion.
3. `signals: ('vector'|'bm25'|'entity')[]` on every returned result, computed by candidate-id set membership in each arm's pre-fusion result. Lets callers debug which arms surfaced an entry without re-running the search.
4. `graceful-retrieval.test.ts`: extends the existing hybridSearch test with a provenance assertion plus a needle-in-haystack test (30 generic "authentication" entries + 1 "Alice Smith"; the query "Alice Smith authentication" surfaces the Alice entry with `signals.includes('entity')`).

Memory test suite: 416/420 pass. The 4 failures are pre-existing Windows-environment issues in unrelated files (agent-memory-scope path separator + benchmark.test.ts perf budget).

Co-Authored-By: RuFlo <ruv@ruv.net>
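Item 3 of the commit message computes `signals` from set membership taken before fusion. A minimal sketch of that stamping step follows; the `Signal` union matches the PR's `signals` type, while `stampSignals`, `FusedHit`, and the toy ids and scores are hypothetical.

```typescript
// Hypothetical sketch of pre-fusion provenance stamping. The Signal union matches
// the PR's signals type; the function and its parameters are illustrative.
type Signal = "vector" | "bm25" | "entity";

interface FusedHit {
  id: string;
  score: number;
}

function stampSignals(
  fused: FusedHit[],
  denseIds: Set<string>,
  sparseIds: Set<string>,
  entityIds: Set<string> | null, // null when the entity arm was bypassed
): Array<FusedHit & { signals: Signal[] }> {
  return fused.map((hit) => {
    const signals: Signal[] = [];
    if (denseIds.has(hit.id)) signals.push("vector");
    if (sparseIds.has(hit.id)) signals.push("bm25");
    if (entityIds?.has(hit.id)) signals.push("entity");
    return { ...hit, signals };
  });
}

// Toy values: both entries surface in the dense and sparse arms, only one in the entity arm.
const stamped = stampSignals(
  [{ id: "alice-entry", score: 0.05 }, { id: "generic-7", score: 0.03 }],
  new Set(["alice-entry", "generic-7"]),
  new Set(["alice-entry", "generic-7"]),
  new Set(["alice-entry"]),
);
console.log(stamped.map((h) => `${h.id}: ${h.signals.join("+")}`));
```

Since the id sets come from each arm's raw output, the fusion function itself never needs to know about provenance.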

#2327)

`@claude-flow/memory` 3.0.0-alpha.19 → 3.0.0-alpha.20. Adds the entity arm to hybridSearch alongside the existing dense + sparse RRF fusion, plus per-result `signals: ('vector'|'bm25'|'entity')[]` provenance. An end-to-end capability smoke test against the built dist confirmed that the Alice needle in a 31-doc corpus ranks #1 with all three signals, while the runner-up has only vector + bm25: an RRF score gap of ~47%.

`@claude-flow/cli`, `claude-flow`, `ruflo` 3.10.38 → 3.10.39. The CLI also pins `@claude-flow/memory` to `^3.0.0-alpha.20` so wrapper users pick up the entity arm automatically.

All four packages published with latest+alpha+v3alpha aligned. Lockfile regen included (lesson from #2311: bumping a workspace dep without regenerating `v3/pnpm-lock.yaml` breaks frozen-lockfile CI).

Co-Authored-By: RuFlo <ruv@ruv.net>
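Assuming `applyRRF` uses the standard, unweighted RRF formula with k = 60, a toy calculation shows why an entry surfaced by all three arms separates from one surfaced by two: it collects one extra reciprocal-rank term. Here A(d) is the set of arms that returned entry d and r_a(d) is its 1-based rank in arm a; the ranks are illustrative, not the ones measured in the smoke test.

```latex
\begin{aligned}
\mathrm{RRF}(d) &= \sum_{a \in A(d)} \frac{1}{k + r_a(d)}, \qquad k = 60 \\
\mathrm{RRF}(\text{needle}) &= 3 \cdot \tfrac{1}{61} \approx 0.0492 \quad (r = 1 \text{ in vector, bm25, entity}) \\
\mathrm{RRF}(\text{runner-up}) &= 2 \cdot \tfrac{1}{61} \approx 0.0328 \quad (r = 1 \text{ in vector, bm25 only})
\end{aligned}
```

With equal ranks the three-arm entry scores 1.5× the two-arm entry; real ranks differ per arm, so a measured gap will generally differ from this idealized 50%.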

ruvnet merged commit b099b70 into main Jun 8, 2026
97 of 98 checks passed

ruvnet merged 1 commit into main from feat/adr-147-entity-signal-2317
