
fix: align gbrain extract --dry-run and doctor guidance with actual commands #397

Closed

vinsew wants to merge 1 commit into garrytan:master from vinsew:fix/extract-dry-run-doctor-guidance


vinsew (Contributor) commented Apr 24, 2026

Summary

Two small alignment fixes discovered while upgrading an existing brain from v0.14.2 to v0.20.4:

  1. gbrain doctor was still telling users to run the retired command pair gbrain link-extract && gbrain timeline-extract. The graph_coverage warning message now points at the current canonical command, gbrain extract all --source db.

  2. gbrain extract --dry-run over-reported by counting every extracted candidate as a net-new row, even when a real run's ON CONFLICT DO NOTHING insert would skip it. The dry-run now caches the existing outgoing links / timeline rows per source slug and filters candidates against that cache, so dry-run row counts match what a real run would actually insert.
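A minimal sketch of the dry-run fix in item 2, assuming a simplified link shape: existing outgoing links are loaded once per source slug into a cache, and a candidate is counted only if its key is absent from that cache. The LinkCandidate type, the linkKey format, and the fetchExisting callback are hypothetical stand-ins, not gbrain's actual Link type or engine API; timeline rows would follow the same pattern with their own key.

```typescript
// Hypothetical link shape; field names are illustrative, not gbrain's Link type.
interface LinkCandidate {
  fromSlug: string;
  toSlug: string;
  linkType: string;
  linkSource: string;
}

// Hypothetical dedup key; gbrain's real key format may differ.
function linkKey(l: LinkCandidate): string {
  return [l.fromSlug, l.toSlug, l.linkType, l.linkSource].join("::");
}

// Counts the candidates a real run would actually insert, assuming the insert
// uses ON CONFLICT DO NOTHING on the same columns that linkKey encodes.
// fetchExisting stands in for whatever query loads one page's outgoing links.
function countNetNewLinks(
  candidates: LinkCandidate[],
  fetchExisting: (fromSlug: string) => LinkCandidate[],
): number {
  // One cache entry per source slug, so each page's links are loaded at most once.
  const cache = new Map<string, Set<string>>();
  let netNew = 0;
  for (const c of candidates) {
    let existing = cache.get(c.fromSlug);
    if (existing === undefined) {
      existing = new Set(fetchExisting(c.fromSlug).map(linkKey));
      cache.set(c.fromSlug, existing);
    }
    const key = linkKey(c);
    if (existing.has(key)) continue; // a real run would skip this row
    existing.add(key); // later duplicates in the same batch are also skipped
    netNew++;
  }
  return netNew;
}
```

With one existing link from alice to bob and candidates alice to bob, alice to carol, and alice to carol again, a naive count reports 3, while countNetNewLinks reports 1 and loads alice's links only once. Adding each counted key back into the cache models ON CONFLICT DO NOTHING skipping duplicates within the same batch.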

The link/timeline dedup keys used in dry-run now also carry origin_page_id / origin_slug, so frontmatter-derived edges from different origins don't collapse into a single key.
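A small sketch of why the origin belongs in the key, using a hypothetical OriginLink shape in which originSlug stands in for origin_page_id / origin_slug. Two frontmatter-derived edges that differ only in origin collapse to one key without the origin segment and stay distinct with it. The `::` separator and the empty-string encoding of a missing origin are assumptions, not gbrain's confirmed key format.

```typescript
// Hypothetical frontmatter-derived edge; field names are illustrative.
interface OriginLink {
  fromSlug: string;
  toSlug: string;
  linkType: string;
  linkSource: string;
  originSlug: string | null; // page whose frontmatter produced the edge, if any
}

// Key without origin: edges differing only in origin produce the same string.
function keyWithoutOrigin(l: OriginLink): string {
  return [l.fromSlug, l.toSlug, l.linkType, l.linkSource].join("::");
}

// Key with origin appended; a missing origin becomes an empty final segment.
function keyWithOrigin(l: OriginLink): string {
  return [keyWithoutOrigin(l), l.originSlug ?? ""].join("::");
}

// Number of distinct dedup keys a batch produces under a given key function.
function distinctKeys(links: OriginLink[], key: (l: OriginLink) => string): number {
  return new Set(links.map(key)).size;
}
```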

Test plan

  • New test/extract-db.test.ts case: a dry-run after a prior real run reports zero net-new links (before this change, it reported one net-new link per candidate).
  • test/doctor.test.ts regression guard asserts doctor.ts source contains gbrain extract all --source db and does NOT contain gbrain link-extract && gbrain timeline-extract.
  • bun test passes the new cases on my setup.

🤖 Generated with Claude Code

vinsew force-pushed the fix/extract-dry-run-doctor-guidance branch from 7d1389a to 33c02bf on April 27, 2026 09:17
vinsew (Contributor, Author) commented May 12, 2026

Closing — superseded by #914.

This PR can't be cherry-picked onto current master. The multi-source threading in v0.32.8 (#860) reshaped both extractLinksFromDB and extractTimelineFromDB:

  • Candidate dedup keys now carry source ids (6 segments: from_source_id::from_slug::to_source_id::to_slug::link_type::link_source).
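  • engine.getLinks() does not return f.source_id / t.source_id, and the Link type has no source-id fields, so my original 5-segment key comparison would produce false positives on cross-source rows (a candidate in one source would match an existing row with the same slugs in another source).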
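A sketch of that false positive, using the 6-segment key order quoted in the first bullet. The SourcedLink object shape is hypothetical, and sourcelessKey is a stand-in for any key that omits source ids, not the exact 5-segment key from this PR: with it, an existing link in one source makes a same-slug candidate in another source look already present.

```typescript
// Hypothetical row shape; field names mirror the key segments quoted above.
interface SourcedLink {
  from_source_id: string;
  from_slug: string;
  to_source_id: string;
  to_slug: string;
  link_type: string;
  link_source: string;
}

// 6 segments: from_source_id::from_slug::to_source_id::to_slug::link_type::link_source
function sixSegmentKey(l: SourcedLink): string {
  return [l.from_source_id, l.from_slug, l.to_source_id, l.to_slug, l.link_type, l.link_source].join("::");
}

// Hypothetical key that ignores source ids, as a key built from getLinks() rows would.
function sourcelessKey(l: SourcedLink): string {
  return [l.from_slug, l.to_slug, l.link_type, l.link_source].join("::");
}

// True if the candidate would be treated as already present under the given key.
function isExisting(
  candidate: SourcedLink,
  existing: SourcedLink[],
  key: (l: SourcedLink) => string,
): boolean {
  return new Set(existing.map(key)).has(key(candidate));
}
```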

#914 implements the same intent on the multi-source shape: each extractor uses inline SQL that returns source ids alongside the link / timeline rows, the results are cached per from-page, and the keys built from them match the candidate-side keys byte for byte. Four new tests in test/extract-db.test.ts pin both halves of the contract: zero net-new rows after a real run, AND newly added candidates still surfacing.
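The two-halves contract can be sketched with an in-memory table. FakeLinkTable, insertAll, outgoing, and dryRunCount are hypothetical: insertAll only mimics INSERT ... ON CONFLICT DO NOTHING, and outgoing stands in for the per-extractor inline SQL that returns source ids. Because the dry run and the insert share one rowKey function, their keys match exactly, which is what lets the dry-run count equal the real insert count.

```typescript
// Hypothetical row shape carrying the source ids the inline SQL is described as returning.
interface LinkRow {
  from_source_id: string;
  from_slug: string;
  to_source_id: string;
  to_slug: string;
  link_type: string;
  link_source: string;
}

// One key function shared by the candidate side and the table side.
const rowKey = (r: LinkRow): string =>
  [r.from_source_id, r.from_slug, r.to_source_id, r.to_slug, r.link_type, r.link_source].join("::");

// Cache key for a from-page: the slug alone is ambiguous across sources.
const pageKey = (r: LinkRow): string => `${r.from_source_id}::${r.from_slug}`;

// In-memory stand-in for the links table; not gbrain's engine or schema.
class FakeLinkTable {
  private rows: LinkRow[] = [];

  // Mimics INSERT ... ON CONFLICT DO NOTHING and returns the number actually inserted.
  insertAll(candidates: LinkRow[]): number {
    const keys = new Set(this.rows.map(rowKey));
    let inserted = 0;
    for (const c of candidates) {
      const k = rowKey(c);
      if (keys.has(k)) continue;
      keys.add(k);
      this.rows.push(c);
      inserted++;
    }
    return inserted;
  }

  // Outgoing links of one from-page, including their source ids.
  outgoing(fromSourceId: string, fromSlug: string): LinkRow[] {
    return this.rows.filter((r) => r.from_source_id === fromSourceId && r.from_slug === fromSlug);
  }
}

// Dry run: count what insertAll would insert, without writing anything.
function dryRunCount(table: FakeLinkTable, candidates: LinkRow[]): number {
  const cache = new Map<string, Set<string>>(); // one entry per from-page
  let netNew = 0;
  for (const c of candidates) {
    const page = pageKey(c);
    let existing = cache.get(page);
    if (!existing) {
      existing = new Set(table.outgoing(c.from_source_id, c.from_slug).map(rowKey));
      cache.set(page, existing);
    }
    const k = rowKey(c);
    if (existing.has(k)) continue;
    existing.add(k); // duplicates later in the batch are not re-counted
    netNew++;
  }
  return netNew;
}
```

Running dryRunCount on a batch, then insertAll on the same batch, then dryRunCount again should give 0 on the second dry run; appending a genuinely new candidate should make it report 1.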

Thanks for the patience on this one.

vinsew closed this May 12, 2026