You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Heads up, fixes already in flight. PRs #161 (shell-only) and #162 (server-side, stacked on #161) are open and together drop the injection from ~9.8 KB to ~1.1 KB (−88%) with byte-compatible defaults, full tests, and benchmark numbers. Filing this issue as the standalone "why" writeup so there's a durable place to point at — didn't want to just walk in, bitch about token costs, and leave without bringing the patch.
TL;DR
On any project with a non-trivial observation history, the engram plugin's SessionStart hook at plugin/claude-code/scripts/session-start.sh injects roughly 9–10 KB of markdown into Claude Code's additionalContext on every new session — about 2,500 tokens burned before the user has typed a single character. ~80% of that is duplication or verbose formatting that the agent doesn't actually need at session start. This scales linearly with how often users open sessions (many times per day for power users), and it competes with the user's own work for context window space.
Measurement
Taken on a real project with ~200 observations across several months, personal scope, mix of session_summary / pattern / discovery / decision / bug / manual types. Hook stdout measured with wc -c on the actual bytes fed to Claude Code.
total SessionStart hook stdout : ~9800 B (~2500 tokens)
├── PROTOCOL heredoc : ~1800 B hardcoded in session-start.sh
└── /context payload : ~8000 B fetched from GET /context
├── Recent Sessions section : ~500 B
├── Recent Prompts section : ~300 B
└── Recent Observations : ~7200 B ← dominant cost
This is the injection cost per session start. Power users routinely keep 5–7 concurrent Claude Code runtimes alive (one per VSCode window, one per terminal claude invocation). Every /clear also re-triggers SessionStart. The compounded daily cost is substantial.
Where it comes from
1. The PROTOCOL heredoc (~1.8 KB) is a verbatim duplicate of skills/memory/SKILL.md
plugin/claude-code/scripts/session-start.sh contains a 35-line heredoc that exhaustively re-teaches the save/search/close protocol on every session — "PROACTIVE SAVE — do NOT wait for user to ask", the self-check checklist, the "SEARCH MEMORY when" list, the session-close reminder, etc.
The exact same content ships as a Claude Code skill in this same plugin at skills/memory/SKILL.md. Claude Code's skill system auto-loads skills on demand. So the agent learns the protocol twice: eagerly at session start (paid every time), and lazily via the skill when it actually needs the rules (paid only when needed).
Eager duplication pays the cost 100% of the time even though the lazy path covers all actual uses.
2. The /context endpoint inlines full observation bodies
- [type] **title**: <first 300 chars of obs.Content>
Because obs.Content is often multi-line markdown (session summaries are entire documents with headers, lists, and paragraphs), a single observation can easily spill across 5+ lines in the rendered output. Combined with MaxContextResults = 20, a busy project's /context response routinely hits 7–8 KB.
On a real session, the rendered output looks like this (one observation, abridged):
-[session_summary]**Session summary: ctodie**: ## Goal
Check for dangling sessions and runtime/process leaks across engram, Claude Code, docker, and dev servers — then clean them up.
## Discoveries-**7 concurrent Claude Code runtimes** were alive at session start (one per VSCode window / terminal claude). Each spawns its own `engram mcp --too...
Multiply that by 20 observations and the section is massive. And for the agent's actual needs at session start — "do I have prior context on this project, and what was I working on?" — titles alone are almost always sufficient. The body is only needed when the agent decides to drill into a specific memory, which is exactly what mem_search / mem_get_observation are for.
Why titles are enough at session start
The agent uses the injected context for exactly one thing: deciding whether to pull in more context before answering the user's first message. For that decision, it needs to know:
Is there prior work on this project? → counts of recent sessions (headers)
What topics/decisions exist? → observation titles
Anything from the last few hours? → timestamps + session summaries
Of those, only #2 and #3 touch the observations section, and both are answered by titles + timestamps. If the agent decides something in the list is relevant, it already has mem_search and mem_get_observation to pull the full body. Eagerly inlining 300 chars of body preview for all 20 observations is paying upfront for a fetch the agent usually doesn't need.
Why this matters
Direct token cost. ~2,500 tokens × (sessions opened per day) × (users) is a meaningful fraction of the total context window cost, and unlike work the agent actually does, this is pure overhead.
Context window scarcity. Every byte the hook injects is a byte not available for the user's actual code, docs, or conversation. On long sessions this bites.
Cold-start perception. The injection lands before the first user turn. On a slow pipe (or a session recovered from compaction) it's visible latency.
Scaling pressure. The cost grows with observation count. Power users with mature engram databases are punished proportionally to how much they've invested in the tool.
Proposed fixes (two PRs, stacked)
I have two PRs open that address the two sources independently:
perf(claude-code): shrink SessionStart hook injection from ~10 KB to ~2 KB #161 — perf(claude-code): shrink SessionStart hook injection from ~10 KB to ~2 KB
Shell-only, single file. Shrinks the PROTOCOL heredoc to a 9-line pointer that lists the tool names and directs the agent to skills/memory/SKILL.md for the rules. Adds an awk post-processor to the hook that flattens multi-line observation bodies onto single lines, caps per-bullet length at ENGRAM_CONTEXT_MAXLEN (default 140), and caps bullet count at ENGRAM_CONTEXT_LIMIT (default 8). Both env-tunable. Zero Go changes. Lets users capture most of the win via a plugin update alone, without rebuilding the binary.
Drop the PROTOCOL heredoc entirely and rely solely on the skill. Rejected — the heredoc's tool-list section (announcing which mem_* tools exist) is genuinely useful for the agent's first turn because it avoids a ToolSearch round-trip. The fix is to keep the tool list and drop the 30 lines of prescriptive "when to save" rules that the skill already covers.
Lower MaxContextResults globally. Rejected — that would affect every consumer of RecentObservations, not just the hook. The hook is the only caller that demands short output; other callers (CLI engram context, TUI, MCP mem_context tool) may legitimately want the full 20. The fix belongs at the render layer, not the data layer.
Cache the /context response. Rejected — doesn't address the underlying "we're rendering too much" problem, and cache invalidation on mem_save is non-trivial. Fixing the render is simpler and strictly better.
Move the PROTOCOL to a one-time UserPromptSubmit injection on the first turn instead of SessionStart. Considered but rejected as a separate concern — the UserPromptSubmit hook in plugin/claude-code/scripts/user-prompt-submit.sh already does a one-time ToolSearch injection for the mem_* tools, and conflating the two would couple the tool-loading concern with the protocol-teaching concern. Separate refactor for another day.
feat(context): add limit & compact query params to GET /context (byte-compatible) #162 preserves byte-compatible output when called with zero-value options (empirically verified). No database migration, no config format change. Rollback is git revert on the merge commit. Breakage would be caught by TestFormatContextWithOptions's exact-string-equality assertion against FormatContext, which fails loudly on any semantic drift.
Full test suite passes on both branches (go test ./... → 748 → 749 passed).
Environment
OS: Linux (WSL2 on Windows 11)
Go: 1.24.1
engram: 1.12.0-beta.1.0.20260407055054-49dc372fe2e4 (built from this PR branch)
Claude Code: desktop app, latest
Happy to split the issue or the PRs further if the scope feels too big to land as one change.
TL;DR
On any project with a non-trivial observation history, the engram plugin's
SessionStarthook atplugin/claude-code/scripts/session-start.shinjects roughly 9–10 KB of markdown into Claude Code'sadditionalContexton every new session — about 2,500 tokens burned before the user has typed a single character. ~80% of that is duplication or verbose formatting that the agent doesn't actually need at session start. This scales linearly with how often users open sessions (many times per day for power users), and it competes with the user's own work for context window space.Measurement
Taken on a real project with ~200 observations across several months,
personalscope, mix ofsession_summary/pattern/discovery/decision/bug/manualtypes. Hook stdout measured withwc -con the actual bytes fed to Claude Code.This is the injection cost per session start. Power users routinely keep 5–7 concurrent Claude Code runtimes alive (one per VSCode window, one per terminal
claudeinvocation). Every/clearalso re-triggersSessionStart. The compounded daily cost is substantial.Where it comes from
1. The PROTOCOL heredoc (~1.8 KB) is a verbatim duplicate of
skills/memory/SKILL.mdplugin/claude-code/scripts/session-start.shcontains a 35-line heredoc that exhaustively re-teaches the save/search/close protocol on every session — "PROACTIVE SAVE — do NOT wait for user to ask", the self-check checklist, the "SEARCH MEMORY when" list, the session-close reminder, etc.The exact same content ships as a Claude Code skill in this same plugin at
skills/memory/SKILL.md. Claude Code's skill system auto-loads skills on demand. So the agent learns the protocol twice: eagerly at session start (paid every time), and lazily via the skill when it actually needs the rules (paid only when needed).Eager duplication pays the cost 100% of the time even though the lazy path covers all actual uses.
2. The
/contextendpoint inlines full observation bodiesinternal/store/store.go'sFormatContextrenders each observation bullet as:Because
obs.Contentis often multi-line markdown (session summaries are entire documents with headers, lists, and paragraphs), a single observation can easily spill across 5+ lines in the rendered output. Combined withMaxContextResults = 20, a busy project's/contextresponse routinely hits 7–8 KB.On a real session, the rendered output looks like this (one observation, abridged):
Multiply that by 20 observations and the section is massive. And for the agent's actual needs at session start — "do I have prior context on this project, and what was I working on?" — titles alone are almost always sufficient. The body is only needed when the agent decides to drill into a specific memory, which is exactly what
mem_search/mem_get_observationare for.Why titles are enough at session start
The agent uses the injected context for exactly one thing: deciding whether to pull in more context before answering the user's first message. For that decision, it needs to know:
Of those, only #2 and #3 touch the observations section, and both are answered by titles + timestamps. If the agent decides something in the list is relevant, it already has
mem_searchandmem_get_observationto pull the full body. Eagerly inlining 300 chars of body preview for all 20 observations is paying upfront for a fetch the agent usually doesn't need.Why this matters
Proposed fixes (two PRs, stacked)
I have two PRs open that address the two sources independently:
perf(claude-code): shrink SessionStart hook injection from ~10 KB to ~2 KB #161 —
perf(claude-code): shrink SessionStart hook injection from ~10 KB to ~2 KBShell-only, single file. Shrinks the PROTOCOL heredoc to a 9-line pointer that lists the tool names and directs the agent to
skills/memory/SKILL.mdfor the rules. Adds anawkpost-processor to the hook that flattens multi-line observation bodies onto single lines, caps per-bullet length atENGRAM_CONTEXT_MAXLEN(default 140), and caps bullet count atENGRAM_CONTEXT_LIMIT(default 8). Both env-tunable. Zero Go changes. Lets users capture most of the win via a plugin update alone, without rebuilding the binary.feat(context): add limit & compact query params to GET /context (byte-compatible) #162 —
feat(context): add limit & compact query params to GET /context(stacked on perf(claude-code): shrink SessionStart hook injection from ~10 KB to ~2 KB #161)Server-side. Adds
limit=Nandcompact=1query params toGET /context, implemented via a newContextOptionsstruct andFormatContextWithOptionsmethod. Byte-compatible with the old behavior:FormatContext(project, scope)stays as a thin wrapper overFormatContextWithOptions(project, scope, ContextOptions{}), so all ~15 existing callers and test fixtures are untouched. Empirically verified:GET /contextwith no new params returns byte-for-byte identical output to the pre-change binary. The hook is updated to pass?limit=8&compact=1on matched-version deployments; the awk fallback from perf(claude-code): shrink SessionStart hook injection from ~10 KB to ~2 KB #161 stays in place for mixed-version deployments.Measured impact
/contextendpoint cost (server-side):/context?project=X(no new params)/context?project=X&limit=8&compact=1Alternatives considered
Drop the PROTOCOL heredoc entirely and rely solely on the skill. Rejected — the heredoc's tool-list section (announcing which
mem_*tools exist) is genuinely useful for the agent's first turn because it avoids a ToolSearch round-trip. The fix is to keep the tool list and drop the 30 lines of prescriptive "when to save" rules that the skill already covers.Lower
MaxContextResultsglobally. Rejected — that would affect every consumer ofRecentObservations, not just the hook. The hook is the only caller that demands short output; other callers (CLIengram context, TUI, MCPmem_contexttool) may legitimately want the full 20. The fix belongs at the render layer, not the data layer.Cache the
/contextresponse. Rejected — doesn't address the underlying "we're rendering too much" problem, and cache invalidation onmem_saveis non-trivial. Fixing the render is simpler and strictly better.Move the PROTOCOL to a one-time
UserPromptSubmitinjection on the first turn instead ofSessionStart. Considered but rejected as a separate concern — theUserPromptSubmithook inplugin/claude-code/scripts/user-prompt-submit.shalready does a one-time ToolSearch injection for themem_*tools, and conflating the two would couple the tool-loading concern with the protocol-teaching concern. Separate refactor for another day.Risk & rollback
Both PRs are low-risk:
git reverton a single shell file.git reverton the merge commit. Breakage would be caught byTestFormatContextWithOptions's exact-string-equality assertion againstFormatContext, which fails loudly on any semantic drift.Full test suite passes on both branches (
go test ./...→ 748 → 749 passed).Environment
1.12.0-beta.1.0.20260407055054-49dc372fe2e4(built from this PR branch)Happy to split the issue or the PRs further if the scope feels too big to land as one change.
🤖 Generated with Claude Code