
Compaction and Memory

Z-M-Huang edited this page May 1, 2026 · 4 revisions


contractVersion: 1.1.0

How a session handles conversation history that exceeds the model's context window: summarization, token budgeting, the non-determinism implication, and why deterministic workflows need to design around it.


The problem

Every LLM has a finite context window. A long session — many turns, many tool calls — accumulates history that eventually exceeds it. Compaction is the process of summarizing old history into a smaller representation so newer content fits.

Compaction is a core subsystem. Extensions cannot replace it, but they can influence it: the SM may force compaction earlier, and Context Providers may contribute to the compaction summary.


When compaction fires

```mermaid
flowchart TD
    Assembly[context assembly] --> Check{fits in<br/>model window?}
    Check -->|yes| Send[send request]
    Check -->|no, near limit| Warn[emit threshold event]
    Warn --> Compact[compact history]
    Compact --> Retry[rebuild request]
    Retry --> Send
```

Triggers:

  • Threshold: the request token count crosses a configured fraction of the model window (default 80%).
  • Explicit: the user runs /compact.
  • Post-failure: a ContextOverflow from the provider forces compaction as a recovery path.
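As a sketch, the three triggers reduce to a small precedence check. All names here are illustrative, not the actual API; the 0.8 default is from this page, and the precedence ordering is an assumption:

```typescript
// Sketch of trigger selection (illustrative names; not the actual API).
// Precedence ordering here is an assumption: overflow recovery first,
// then an explicit /compact, then the token-budget threshold.
type CompactionTrigger = "threshold" | "explicit" | "post-failure" | null;

function compactionTrigger(opts: {
  requestTokens: number;      // tokens in the assembled request
  modelWindow: number;        // model context window, in tokens
  thresholdFraction?: number; // default 0.8, per this page
  explicitCompact?: boolean;  // user ran /compact
  contextOverflow?: boolean;  // provider returned ContextOverflow
}): CompactionTrigger {
  const { requestTokens, modelWindow, thresholdFraction = 0.8 } = opts;
  if (opts.contextOverflow) return "post-failure"; // recovery path
  if (opts.explicitCompact) return "explicit";
  if (requestTokens >= modelWindow * thresholdFraction) return "threshold";
  return null; // fits; send the request as assembled
}
```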

What gets compacted

| Section | Compactable? |
| --- | --- |
| System prompt | No. It is considered load-bearing. |
| Recent N turns | No. The window of "fresh" history is preserved. |
| Older turns | Yes. Summarized into a "compacted history" block. |
| Tool outputs | Yes — often more compactable than messages. |
| Reasoning blocks (Anthropic thinking blocks present when sendReasoning: true; see Provider Params § Reasoning persistence) | Yes, with explicit retention rules — see Reasoning blocks as a compactable content type below. |
| Compacted history (prior) | Yes — further compacted as needed. |
| Context Provider contributions | No. Re-computed each turn. |

The boundary between "recent" and "older" is the session's compaction threshold — configurable; the default preserves roughly the last 4 turns.
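A minimal sketch of that boundary, assuming a turn-count cut (the `Turn` shape and `splitHistory` name are illustrative):

```typescript
// Sketch of the recent/older boundary (Turn and splitHistory are
// illustrative names). The last `protectedTurns` turns are preserved
// verbatim; everything before the cut is eligible for compaction.
interface Turn { role: "user" | "assistant" | "tool"; content: string }

function splitHistory(turns: Turn[], protectedTurns = 4): { older: Turn[]; recent: Turn[] } {
  const cut = Math.max(0, turns.length - protectedTurns);
  return { older: turns.slice(0, cut), recent: turns.slice(cut) };
}
```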


Reasoning blocks as a compactable content type {#reasoning-blocks-as-a-compactable-content-type}

When Anthropic's sendReasoning: true is in effect (the v1 default), assistant messages carry a content array that includes thinking blocks alongside text and tool-call structure. See Provider Params for the canonical policy. Reasoning blocks are durable conversation content, but they have different fidelity, safety, and provider-portability properties than assistant text. Compaction handles them with explicit retention rules:

| Retention rule | Default applicability | Behavior |
| --- | --- | --- |
| Verbatim | Most-recent compaction-pass turn(s) within the protected window | Reasoning blocks remain unchanged in the persisted message. |
| Summarized as ordinary text | Older turns, when the compaction provider produces a single summary block per turn | Reasoning text contributes to the summary's input prompt; the output replaces the original message as ordinary text — the summary is not repackaged as a provider-native thinking block. |
| Dropped | Tool-call reasoning context that the SM declares disposable; user-configured drop policy on long-running sessions | Reasoning is removed from the persisted message; assistant text and tool calls remain. |
| Wire-boundary clear (Anthropic contextManagement.clear_thinking) | When the user configures Anthropic's provider-native context management | Provider-native thinking-clear edits the request at the wire boundary before it leaves the adapter; the manifest is unaffected. Core compaction operates on the manifest and is the canonical persistence-shaping mechanism — see Core compaction vs Anthropic contextManagement below. |

Compaction never fabricates provider-native thinking blocks from summaries. A summarized older turn becomes ordinary text. This is a normative invariant — emitting a synthesized thinking block downstream would mislead the model about what it actually thought and could break re-roll determinism on Anthropic.

Default when sendReasoning: true: reasoning blocks within the recent-turn protected window are kept verbatim; older reasoning is summarized into ordinary text. Users may opt into a stricter "drop reasoning beyond the protected window" policy per the configurable compaction mode (see Designing around compaction (for deterministic workflows)).

When sendReasoning: false, assistant messages carry no thinking blocks and these rules do not apply.
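Rule selection per turn can be sketched as follows. The names and the precedence ordering are assumptions; the wire-boundary clear is omitted because it lives in the Anthropic adapter, not in core compaction:

```typescript
// Sketch of per-turn retention selection for reasoning blocks (names and
// precedence ordering are assumptions, not the actual API).
// Invariant from this page: "summarize-as-text" yields ordinary text,
// never a synthesized provider-native thinking block.
type Retention = "verbatim" | "summarize-as-text" | "drop";

function reasoningRetention(opts: {
  inProtectedWindow: boolean; // within the recent-turn window
  smDisposable?: boolean;     // SM declared this tool-call reasoning disposable
  dropBeyondWindow?: boolean; // user opted into the stricter drop policy
}): Retention {
  if (opts.inProtectedWindow && !opts.smDisposable) return "verbatim";
  if (opts.smDisposable || opts.dropBeyondWindow) return "drop";
  return "summarize-as-text"; // the default for older turns
}
```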


Core compaction vs Anthropic contextManagement

Anthropic's adapter exposes a contextManagement field on defaultParams for provider-native context editing — automatic compaction at the wire boundary, thinking clearing, and similar surfaces. The two surfaces operate on different artifacts and at different points in the pipeline:

  • Core compaction (this page) operates on the manifest. It rewrites persisted history; the rewrite is durable and feeds every subsequent turn's request. It is the canonical session-state-shaping mechanism in v1.
  • Anthropic contextManagement operates at the wire boundary. It edits the request body the adapter is about to send to Anthropic, on every turn. Its effects are visible only to that wire request; the manifest is unchanged.

Precedence: core compaction is canonical. When both surfaces are configured on the same provider entry, the manifest reflects core compaction's view. Anthropic contextManagement still runs at the wire boundary on every turn (the adapter does not suppress it), but it never substitutes for core compaction on the persistence side. Users should think of contextManagement as an in-flight request edit, not a manifest mutation.

DoubleCompactionConfigured validation warning. Validation Pipeline emits this warning at session start when core compaction (enabled by default) is paired with Anthropic contextManagement on the same provider entry. The session continues; the warning surfaces the duplication so users can decide whether to disable one of the surfaces or accept both running side by side.
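A sketch of that validation check, assuming the provider-entry shape this page implies (contextManagement under defaultParams; everything else illustrative):

```typescript
// Sketch of the DoubleCompactionConfigured check (shapes illustrative;
// the contextManagement-under-defaultParams location is from this page).
// Validation warns rather than errors: both surfaces may run side by side.
interface ProviderEntry {
  name: string;
  defaultParams?: { contextManagement?: unknown };
}

function validateCompactionConfig(
  entry: ProviderEntry,
  coreCompactionEnabled = true, // core compaction is on by default
): string[] {
  const warnings: string[] = [];
  if (coreCompactionEnabled && entry.defaultParams?.contextManagement !== undefined) {
    warnings.push("DoubleCompactionConfigured"); // session continues
  }
  return warnings;
}
```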


How compaction runs

```mermaid
sequenceDiagram
    autonumber
    participant Core
    participant Provider as "compaction provider<br/>(bundled: LLM-based)"
    participant LLM
    Core->>Provider: request summary<br/>(older history + prior compaction)
    Provider->>LLM: summarization call
    LLM-->>Provider: summary text
    Provider-->>Core: compacted block
    Core->>Core: replace older history<br/>with compacted block
    Core->>Core: re-assemble next request
```

The bundled compaction provider uses the session's active LLM provider with a compaction-specific prompt (from the Prompt Registry). Compaction itself is core — see Extensibility Boundary. Extensions influence it (SM may force or defer; Context Providers may contribute the compaction prompt or ambient context) but cannot replace the subsystem.
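The sequence above boils down to: summarize everything older than the protected window (including any prior compacted block) and splice the result in front of the preserved turns. A sketch with the summarizer injected, so it can stand in for the bundled LLM-based provider (all names illustrative; the real summarizer is an asynchronous LLM call):

```typescript
// Sketch of the core compaction step (names illustrative). The bundled
// summarizer is an LLM call; here it is injected as a plain function.
interface Msg { role: string; content: string; compacted?: boolean }

function compact(
  history: Msg[],
  protectedTurns: number,
  summarize: (older: Msg[]) => string, // bundled provider: LLM-based
): Msg[] {
  const cut = Math.max(0, history.length - protectedTurns);
  const older = history.slice(0, cut); // includes any prior compacted block
  if (older.length === 0) return history; // nothing to compact
  const block: Msg = { role: "system", content: summarize(older), compacted: true };
  return [block, ...history.slice(cut)];
}
```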


Non-determinism, loudly

Compaction is non-deterministic when it uses an LLM to produce summaries:

  • The same history compacted twice may yield different summaries.
  • Downstream decisions (SM transitions, hook guards, tool choices) reading compacted content inherit that non-determinism.
  • Replaying a session that crossed a compaction boundary is not guaranteed to produce the same future turns.

This is called out loudly because:

  • Deterministic workflows are a primary use case for stud-cli (see the user's state-machine motivation).
  • A silent LLM-summarization step that flips determinism on its head would be a rug-pull.

Designing around compaction (for deterministic workflows)

Two strategies:

  1. Size the context to avoid compaction. Keep turns small; use Context Providers that load pertinent info per turn rather than accumulating history.
  2. Configure a deterministic compaction mode on the core compaction subsystem (e.g., "drop tool outputs older than N turns; keep everything else verbatim until the window"). Core may ship alternative compaction modes under a configuration key; the subsystem remains core. This trades fidelity for predictability.

Either strategy is compatible with v1. The wiki does not pick one; workflows pick.
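Strategy 2 can be made concrete with a mode like the example this page gives. The sketch below is hypothetical (the page only says core "may ship alternative compaction modes"): drop tool outputs older than the protected tail, keep everything else verbatim. Because no LLM is involved, compacting the same history twice yields identical results:

```typescript
// Sketch of a hypothetical deterministic compaction mode: drop tool
// outputs older than the last `keepToolTurns` turns, keep everything
// else verbatim. No LLM is involved, so the result is repeatable.
interface HistMsg { role: "user" | "assistant" | "tool"; content: string }

function deterministicCompact(history: HistMsg[], keepToolTurns = 4): HistMsg[] {
  const cut = Math.max(0, history.length - keepToolTurns);
  // Keep the protected tail whole; outside it, keep only non-tool messages.
  return history.filter((msg, i) => i >= cut || msg.role !== "tool");
}
```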


Memory vs compaction

Memory in v1 is conversation history. stud-cli does not ship a long-term, cross-session "memory" store. Long-term memory (facts the assistant should remember across sessions) is a workflow choice:

  • A Context Provider can read project files (e.g., CLAUDE.md-style conventions) and contribute them at COMPOSE_REQUEST.
  • A Session Store that persists a "memory" slot can be composed with a Context Provider that reads from it.
  • A third-party extension can ship a memory category — but it is a Context Provider in the v1 taxonomy, not a separate kind.

The wiki does not promise a bundled memory system. See Extensibility Boundary.


Compaction and Context Providers

Context Provider contributions are re-assembled per turn and are not part of the history that gets compacted. A provider returning the same data each turn pays that token cost each turn. If a provider's content should be part of long-term session memory, the provider itself must handle persistence (via a state slot — see Extension State).


Compaction and SM

The SM does not force compaction directly. Each stage execution runs a fresh LLM transcript with its own turnCap and allowedTools, so compaction pressure is bounded per stage. A stage authored to keep transcripts short avoids compaction entirely; a long-running stage with a high turnCap will see compaction fire on threshold just like any other composition.

Orchestrator-level compaction (outside stage executions) follows the same thresholds and can be invoked by /compact per Commands.

See State Machines and Stage Definitions.


Compaction and audit

Compaction events are audited:

  • CompactionThresholdHit — threshold crossed.
  • CompactionStarted — compaction provider invoked.
  • CompactionCompleted — summary integrated. Audit payload includes a per-turn breakdown of which retention rule applied (verbatim / summarized / dropped / cleared by provider-native) when reasoning blocks were present.
  • CompactionFailed — compaction errored (e.g., provider unavailable).
  • DoubleCompactionConfigured — a session start where both core compaction and Anthropic provider-native contextManagement are configured on the same provider entry. Surfaces the precedence rule (core wins) without changing behavior.

The audit records reference the compaction provider used, the token counts before/after, and the correlation ID to the turn that triggered compaction. See Audit Trail.
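For concreteness, a CompactionCompleted payload might be shaped as below. Field names are illustrative; this page only pins down what the record must reference (provider, before/after token counts, correlation ID, and a per-turn retention breakdown when reasoning blocks were present):

```typescript
// Sketch of a CompactionCompleted audit record (field names illustrative).
type RetentionApplied = "verbatim" | "summarized" | "dropped" | "cleared-by-provider-native";

interface CompactionCompletedEvent {
  kind: "CompactionCompleted";
  compactionProvider: string; // the compaction provider used
  tokensBefore: number;       // token count before compaction
  tokensAfter: number;        // token count after compaction
  correlationId: string;      // links to the turn that triggered compaction
  retention?: Array<{ turn: number; rule: RetentionApplied }>; // present when reasoning blocks were
}
```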


Failure modes

| Failure | Handling |
| --- | --- |
| Compaction provider errors | Retry per the provider's retry policy; if still failing, the turn errors with ContextOverflow. |
| Summary exceeds reserved budget | Emit CompactionSummaryOversize diagnostic; re-attempt with a tighter prompt; eventually truncate. |
| Compaction loop (summary grows each turn) | Emit CompactionNoProgress diagnostic; user / SM intervenes. |
| Reasoning content compaction edge cases | Summary that mentions reasoning content but lands as ordinary text — emit CompactionReasoningDowngraded diagnostic to surface the fidelity loss to the user. Compaction is not allowed to repackage the summary as a provider-native thinking block (the invariant from Reasoning blocks as a compactable content type). |
| Provider-native context management ran twice | If Anthropic contextManagement ran at the wire AND core compaction ran on the same turn boundary, the manifest reflects core compaction's view. Emit CompactionDoubleRan observability event so users can investigate cost spikes. |

Security

Compaction summaries are derived from conversation history, which may contain sensitive user input. The compaction provider inherits the same constraints as any LLM request:

  • Env values are not inlined.
  • Tool outputs have already been redacted per the tool's contract and any post-tool hooks.
  • The summary text, once produced, is treated like any other LLM-generated content (untrusted, possibly prompt-injection-bearing).

See LLM Context Isolation, Secrets Hygiene.


Related pages


Changelog

1.0.0 — initial

  • Compaction is a core subsystem with three triggers: threshold (default 80% of model window), explicit /compact, post-failure recovery from ContextOverflow.
  • "What gets compacted" table: system prompt and recent N turns preserved; older turns and tool outputs summarized; Context Provider contributions re-computed each turn (not compacted).
  • Non-determinism is loud: LLM-based summarization makes downstream replay non-deterministic; deterministic workflows design around this either by sizing context to avoid compaction or by configuring a deterministic compaction mode.
  • Audit kinds: CompactionThresholdHit, CompactionStarted, CompactionCompleted, CompactionFailed.

1.1.0 — reasoning blocks as a compactable content type; contextManagement precedence

  • "What gets compacted" table gains a row for reasoning blocks (Anthropic thinking blocks present when Provider Params' sendReasoning: true is in effect).
  • New "Reasoning blocks as a compactable content type" section documents four retention rules: verbatim within the protected window, summarized as ordinary text for older turns, dropped (per SM declaration or user policy), and wire-boundary clear (Anthropic contextManagement.clear_thinking). Normative invariant: compaction never fabricates provider-native thinking blocks from summaries.
  • New "Core compaction vs Anthropic contextManagement" section pins precedence: core compaction is the canonical persistence-shaping mechanism (operates on the manifest); Anthropic contextManagement is a wire-boundary edit (operates on the request body). Both surfaces may run side by side; manifest reflects core compaction's view. DoubleCompactionConfigured validation warning when both are configured.
  • Audit kinds extended with CompactionReasoningDowngraded, CompactionDoubleRan, DoubleCompactionConfigured.
  • Failure-modes table extended with reasoning-content compaction edge cases and the double-run observability event.
  • No removal of pre-existing prose; all changes are additive on top of 1.0.0.
