perf: SessionStart hook injects ~10 KB (~2.5k tokens) into every Claude Code session

> **Heads up, fixes already in flight.** PRs #161 (shell-only) and #162 (server-side, stacked on #161) are open and together drop the injection from ~9.8 KB to ~1.1 KB (−88%) with byte-compatible defaults, full tests, and benchmark numbers. Filing this issue as the standalone \"why\" writeup so there's a durable place to point at — didn't want to just walk in, bitch about token costs, and leave without bringing the patch.

## TL;DR

On any project with a non-trivial observation history, the engram plugin's `SessionStart` hook at [`plugin/claude-code/scripts/session-start.sh`](../blob/main/plugin/claude-code/scripts/session-start.sh) injects roughly **9–10 KB** of markdown into Claude Code's `additionalContext` on every new session — about **2,500 tokens burned before the user has typed a single character**. ~80% of that is duplication or verbose formatting that the agent doesn't actually need at session start. This scales linearly with how often users open sessions (many times per day for power users), and it competes with the user's own work for context window space.

## Measurement

Taken on a real project with ~200 observations across several months, `personal` scope, mix of `session_summary` / `pattern` / `discovery` / `decision` / `bug` / `manual` types. Hook stdout measured with `wc -c` on the actual bytes fed to Claude Code.

```
total SessionStart hook stdout : ~9800 B    (~2500 tokens)
├── PROTOCOL heredoc            : ~1800 B   hardcoded in session-start.sh
└── /context payload            : ~8000 B   fetched from GET /context
    ├── Recent Sessions section : ~500 B
    ├── Recent Prompts section  : ~300 B
    └── Recent Observations     : ~7200 B   ← dominant cost
```

This is the injection cost **per session start**. Power users routinely keep 5–7 concurrent Claude Code runtimes alive (one per VSCode window, one per terminal `claude` invocation). Every `/clear` also re-triggers `SessionStart`. The compounded daily cost is substantial.

## Where it comes from

### 1. The PROTOCOL heredoc (~1.8 KB) is a verbatim duplicate of `skills/memory/SKILL.md`

[`plugin/claude-code/scripts/session-start.sh`](../blob/main/plugin/claude-code/scripts/session-start.sh) contains a 35-line heredoc that exhaustively re-teaches the save/search/close protocol on every session — \"PROACTIVE SAVE — do NOT wait for user to ask\", the self-check checklist, the \"SEARCH MEMORY when\" list, the session-close reminder, etc.

The **exact same content** ships as a Claude Code skill in this same plugin at [`skills/memory/SKILL.md`](../blob/main/plugin/claude-code/skills/memory/SKILL.md). Claude Code's skill system auto-loads skills on demand. So the agent learns the protocol twice: eagerly at session start (paid every time), and lazily via the skill when it actually needs the rules (paid only when needed).

Eager duplication pays the cost 100% of the time even though the lazy path covers all actual uses.

### 2. The `/context` endpoint inlines full observation bodies

[`internal/store/store.go`](../blob/main/internal/store/store.go)'s `FormatContext` renders each observation bullet as:

```
- [type] **title**: <first 300 chars of obs.Content>
```

Because `obs.Content` is often multi-line markdown (session summaries are entire documents with headers, lists, and paragraphs), a single observation can easily spill across 5+ lines in the rendered output. Combined with `MaxContextResults = 20`, a busy project's `/context` response routinely hits 7–8 KB.

On a real session, the rendered output looks like this (one observation, abridged):

```markdown
- [session_summary] **Session summary: ctodie**: ## Goal
Check for dangling sessions and runtime/process leaks across engram, Claude Code, docker, and dev servers — then clean them up.

## Discoveries
- **7 concurrent Claude Code runtimes** were alive at session start (one per VSCode window / terminal claude). Each spawns its own `engram mcp --too...
```

Multiply that by 20 observations and the section is massive. And for the agent's actual needs at session start — \"do I have prior context on this project, and what was I working on?\" — **titles alone are almost always sufficient**. The body is only needed when the agent decides to drill into a specific memory, which is exactly what `mem_search` / `mem_get_observation` are for.

## Why titles are enough at session start

The agent uses the injected context for exactly one thing: deciding whether to pull in more context before answering the user's first message. For that decision, it needs to know:

1. **Is there prior work on this project?** → counts of recent sessions (headers)
2. **What topics/decisions exist?** → observation titles
3. **Anything from the last few hours?** → timestamps + session summaries

Of those, only #2 and #3 touch the observations section, and both are answered by titles + timestamps. If the agent decides something in the list is relevant, it already has `mem_search` and `mem_get_observation` to pull the full body. Eagerly inlining 300 chars of body preview for all 20 observations is paying upfront for a fetch the agent usually doesn't need.

## Why this matters

- **Direct token cost.** ~2,500 tokens × (sessions opened per day) × (users) is a meaningful fraction of the total context window cost, and unlike work the agent actually does, this is pure overhead.
- **Context window scarcity.** Every byte the hook injects is a byte not available for the user's actual code, docs, or conversation. On long sessions this bites.
- **Cold-start perception.** The injection lands before the first user turn. On a slow pipe (or a session recovered from compaction) it's visible latency.
- **Scaling pressure.** The cost grows with observation count. Power users with mature engram databases are punished proportionally to how much they've invested in the tool.

## Proposed fixes (two PRs, stacked)

I have two PRs open that address the two sources independently:

- **#161** — `perf(claude-code): shrink SessionStart hook injection from ~10 KB to ~2 KB`
  Shell-only, single file. Shrinks the PROTOCOL heredoc to a 9-line pointer that lists the tool names and directs the agent to `skills/memory/SKILL.md` for the rules. Adds an `awk` post-processor to the hook that flattens multi-line observation bodies onto single lines, caps per-bullet length at `ENGRAM_CONTEXT_MAXLEN` (default 140), and caps bullet count at `ENGRAM_CONTEXT_LIMIT` (default 8). Both env-tunable. **Zero Go changes.** Lets users capture most of the win via a plugin update alone, without rebuilding the binary.

- **#162** — `feat(context): add limit & compact query params to GET /context` *(stacked on #161)*
  Server-side. Adds `limit=N` and `compact=1` query params to `GET /context`, implemented via a new `ContextOptions` struct and `FormatContextWithOptions` method. **Byte-compatible** with the old behavior: `FormatContext(project, scope)` stays as a thin wrapper over `FormatContextWithOptions(project, scope, ContextOptions{})`, so all ~15 existing callers and test fixtures are untouched. Empirically verified: `GET /context` with no new params returns byte-for-byte identical output to the pre-change binary. The hook is updated to pass `?limit=8&compact=1` on matched-version deployments; the awk fallback from #161 stays in place for mixed-version deployments.

## Measured impact

| stage | SessionStart injection |
|---|---:|
| today | ~9800 B |
| after #161 | ~1900 B (−81%) |
| after #161 + #162 | **~1151 B (−88%)** |

`/context` endpoint cost (server-side):

| request | old binary | new binary |
|---|---:|---:|
| `/context?project=X` (no new params) | 7949 B | **7949 B** (byte-identical) |
| `/context?project=X&limit=8&compact=1` | n/a | **728 B** (−91%) |

## Alternatives considered

1. **Drop the PROTOCOL heredoc entirely and rely solely on the skill.** Rejected — the heredoc's tool-list section (announcing which `mem_*` tools exist) is genuinely useful for the agent's first turn because it avoids a ToolSearch round-trip. The fix is to keep the tool list and drop the 30 lines of prescriptive \"when to save\" rules that the skill already covers.

2. **Lower `MaxContextResults` globally.** Rejected — that would affect every consumer of `RecentObservations`, not just the hook. The hook is the only caller that demands short output; other callers (CLI `engram context`, TUI, MCP `mem_context` tool) may legitimately want the full 20. The fix belongs at the render layer, not the data layer.

3. **Cache the `/context` response.** Rejected — doesn't address the underlying \"we're rendering too much\" problem, and cache invalidation on `mem_save` is non-trivial. Fixing the render is simpler and strictly better.

4. **Move the PROTOCOL to a one-time `UserPromptSubmit` injection on the first turn instead of `SessionStart`.** Considered but rejected as a separate concern — the `UserPromptSubmit` hook in [`plugin/claude-code/scripts/user-prompt-submit.sh`](../blob/main/plugin/claude-code/scripts/user-prompt-submit.sh) already does a one-time ToolSearch injection for the `mem_*` tools, and conflating the two would couple the tool-loading concern with the protocol-teaching concern. Separate refactor for another day.

## Risk & rollback

Both PRs are low-risk:

- **#161** has zero Go changes. Server contract unchanged. Rollback is `git revert` on a single shell file.
- **#162** preserves byte-compatible output when called with zero-value options (empirically verified). No database migration, no config format change. Rollback is `git revert` on the merge commit. Breakage would be caught by `TestFormatContextWithOptions`'s exact-string-equality assertion against `FormatContext`, which fails loudly on any semantic drift.

Full test suite passes on both branches (`go test ./...` → 748 → 749 passed).

## Environment

- OS: Linux (WSL2 on Windows 11)
- Go: 1.24.1
- engram: `1.12.0-beta.1.0.20260407055054-49dc372fe2e4` (built from this PR branch)
- Claude Code: desktop app, latest

Happy to split the issue or the PRs further if the scope feels too big to land as one change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: SessionStart hook injects ~10 KB (~2.5k tokens) into every Claude Code session #163

TL;DR

Measurement

Where it comes from

1. The PROTOCOL heredoc (~1.8 KB) is a verbatim duplicate of `skills/memory/SKILL.md`

2. The `/context` endpoint inlines full observation bodies

Why titles are enough at session start

Why this matters

Proposed fixes (two PRs, stacked)

Measured impact

Alternatives considered

Risk & rollback

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

stage	SessionStart injection
today	~9800 B
after #161	~1900 B (−81%)
after #161 + #162	~1151 B (−88%)

request	old binary	new binary
`/context?project=X` (no new params)	7949 B	7949 B (byte-identical)
`/context?project=X&limit=8&compact=1`	n/a	728 B (−91%)

perf: SessionStart hook injects ~10 KB (~2.5k tokens) into every Claude Code session #163

Description

TL;DR

Measurement

Where it comes from

1. The PROTOCOL heredoc (~1.8 KB) is a verbatim duplicate of skills/memory/SKILL.md

2. The /context endpoint inlines full observation bodies

Why titles are enough at session start

Why this matters

Proposed fixes (two PRs, stacked)

Measured impact

Alternatives considered

Risk & rollback

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. The PROTOCOL heredoc (~1.8 KB) is a verbatim duplicate of `skills/memory/SKILL.md`

2. The `/context` endpoint inlines full observation bodies