feat: add semantic search over EIPs #32

Open
qu0b wants to merge 1 commit into qu0b/cli-and-skills from qu0b/eip-search
qu0b commented Mar 7, 2026

Summary

  • Adds semantic search over Ethereum Improvement Proposals (EIPs) via a new search_eips MCP tool and ep search eips CLI command
  • Fetches all ~900 EIPs from the ethereum/EIPs GitHub repo via tarball API (single request), parses YAML frontmatter + full markdown body
  • Builds a chunked embedding index using the existing MiniLM-L6-v2 model with per-chunk vector caching (only re-embeds changed content)
  • Uses hybrid scoring: semantic similarity + exact text match boost for queries > 4 chars, so domain-specific terms like "63/64 rule" surface relevant EIPs even when the embedding model alone misses them
  • Adds BuildSearchOnly app mode so CLI search commands don't require proxy/sandbox
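
The hybrid scoring described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `hybridScore` and `cosine`, the boost value of 0.3, and the case-insensitive substring match are all assumptions; only the "queries > 4 chars" threshold comes from the summary.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// exactMatchBoost is a hypothetical value; the PR does not state the boost size.
const exactMatchBoost = 0.3

// minBoostQueryLen mirrors the "> 4 chars" threshold from the summary.
const minBoostQueryLen = 4

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero length.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// hybridScore starts from the semantic similarity and adds a fixed boost when
// the query is longer than minBoostQueryLen bytes and appears verbatim
// (ignoring case) in the chunk text.
func hybridScore(query, chunk string, queryVec, chunkVec []float32) float64 {
	score := cosine(queryVec, chunkVec)
	if len(query) > minBoostQueryLen &&
		strings.Contains(strings.ToLower(chunk), strings.ToLower(query)) {
		score += exactMatchBoost
	}
	return score
}

func main() {
	// Toy vectors stand in for real MiniLM embeddings.
	q := []float32{1, 0}
	c := []float32{1, 0}
	fmt.Println(hybridScore("63/64 rule", "This EIP introduces the 63/64 rule.", q, c))
	fmt.Println(hybridScore("gas", "gas costs", q, c)) // too short for the boost
}
```

An additive boost keeps the semantic ranking intact for queries with no literal hit, while letting a verbatim match lift a chunk that the embedding alone ranks low.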

New packages

  • pkg/types/eip.go: EIP struct and EIPVector cache type
  • pkg/eips/ — fetcher (GitHub tarball), parser (YAML frontmatter), registry (cache lifecycle + commit SHA tracking)
  • pkg/resource/eip_index.go — chunking, stripping, embedding, hybrid search
  • pkg/tool/search_eips.go — MCP tool with status/category/type filters
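
For readers unfamiliar with the EIP file layout, the parser step can be pictured with the sketch below. It is a simplification under stated assumptions: `splitFrontmatter` is a hypothetical name, and it handles only flat `key: value` lines with the standard library, whereas the parser in this PR reads YAML frontmatter and presumably uses a real YAML decoder. The sample document is illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitFrontmatter separates the "---"-delimited preamble from the markdown
// body and returns the preamble as a flat key/value map.
// It does not handle nested or multi-line YAML values.
func splitFrontmatter(doc string) (map[string]string, string, error) {
	const delim = "---\n"
	if !strings.HasPrefix(doc, delim) {
		return nil, "", errors.New("missing opening --- delimiter")
	}
	raw, body, ok := strings.Cut(doc[len(delim):], "\n---\n")
	if !ok {
		return nil, "", errors.New("missing closing --- delimiter")
	}
	fields := make(map[string]string)
	for _, line := range strings.Split(raw, "\n") {
		// Split on the first colon only, so values may contain colons.
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return fields, strings.TrimSpace(body), nil
}

func main() {
	doc := "---\neip: 150\ntitle: Gas cost changes for IO-heavy operations\n" +
		"status: Final\ntype: Standards Track\ncategory: Core\n---\n\n" +
		"## Abstract\n\nBody text.\n"
	fields, body, err := splitFrontmatter(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(fields["eip"], fields["status"], fields["category"])
	fmt.Println(body)
}
```

The status, type, and category fields extracted this way are the ones the search tool filters on.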

Test plan

  • pkg/eips/parser_test.go — unit tests for YAML frontmatter parsing (5 cases)
  • pkg/eips/integration_test.go — fetch from GitHub, cache round-trip, vector persistence
  • pkg/resource/eip_index_test.go — full search integration test, vector reuse test
  • make lint — 0 issues
  • make test — all passing
  • Manual CLI testing: ep search eips "63/64 rule", "account abstraction", "blob transactions", "proof of stake"

Fetch all EIPs from the ethereum/EIPs GitHub repo via tarball download,
parse YAML frontmatter + full markdown body, and build a semantic search
index using the existing MiniLM-L6-v2 embedding model.
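
As a rough picture of the tarball step, the sketch below pulls markdown files under an `EIPS/` directory out of a gzipped tar stream using only the standard library. The names `readEIPs` and `buildTarball`, the path filter, and the in-memory archive (used so the example runs offline) are assumptions for illustration; the fetcher in this PR downloads the archive from GitHub and may filter differently.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"path"
	"strings"
)

// readEIPs returns the contents of regular .md files found under an EIPS/
// directory in a gzipped tarball, keyed by base file name.
func readEIPs(r io.Reader) (map[string]string, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	files := make(map[string]string)
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return files, nil
		}
		if err != nil {
			return nil, err
		}
		// Archive entries are assumed to sit under a top-level directory,
		// so match "/EIPS/" anywhere in the path.
		if hdr.Typeflag != tar.TypeReg ||
			!strings.Contains(hdr.Name, "/EIPS/") ||
			path.Ext(hdr.Name) != ".md" {
			continue
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		files[path.Base(hdr.Name)] = string(data)
	}
}

// buildTarball creates a small in-memory archive for the demo.
func buildTarball(entries map[string]string) *bytes.Buffer {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, body := range entries {
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(body)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	if err := gz.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	archive := buildTarball(map[string]string{
		"repo-abc123/EIPS/eip-150.md": "---\neip: 150\n---\n",
		"repo-abc123/README.md":       "not an EIP",
	})
	files, err := readEIPs(archive)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(len(files), "EIP file(s) found")
}
```
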

Key features:
- Bulk fetch via GitHub tarball API (single request for all ~900 EIPs)
- Local cache with commit SHA tracking for incremental updates
- Per-chunk vector caching with SHA256 text hashes (skip re-embedding unchanged content)
- Content chunking at 600-char paragraph boundaries for the 512-token model
- Strips code blocks, URLs, tables, and dense hex before embedding
- Hybrid scoring: semantic similarity + exact text match boost (queries > 4 chars)
- Multi-chunk deduplication in search results (best score per EIP)
- MCP tool (search_eips) and CLI command (ep search eips)
- Filter by status, category, and type
- Non-fatal initialization (gracefully disabled if GitHub unreachable)
- BuildSearchOnly app mode for CLI search (no proxy/sandbox needed)

qu0b changed the base branch from master to qu0b/cli-and-skills on March 7, 2026, 22:31