Compaction and Memory
contractVersion: 1.1.0
How a session handles conversation history that exceeds the model's context window: summarization, token budgeting, the non-determinism implication, and why deterministic workflows need to design around it.
Every LLM has a finite context window. A long session — many turns, many tool calls — accumulates history that eventually exceeds it. Compaction is the process of summarizing old history into a smaller representation so newer content fits.
Compaction is a core subsystem. Extensions cannot replace it, but can influence it (SM may force compaction earlier; Context Providers may contribute to the compaction summary).
```mermaid
flowchart TD
    Assembly[context assembly] --> Check{fits in<br/>model window?}
    Check -->|yes| Send[send request]
    Check -->|no, near limit| Warn[emit threshold event]
    Warn --> Compact[compact history]
    Compact --> Retry[rebuild request]
    Retry --> Send
```
Triggers:
- Threshold: the request token count crosses a configured fraction of the model window (default 80%).
- Explicit: the user runs /compact.
- Post-failure: a ContextOverflow from the provider forces compaction as a recovery path.
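The three triggers can be sketched as a single predicate. The function and field names below are illustrative, not stud-cli's actual API; only the 0.8 default fraction comes from this page:

```typescript
// Sketch of the compaction triggers (names invented for illustration).
type Trigger = "threshold" | "explicit" | "post-failure" | null;

interface CompactionCheckInput {
  requestTokens: number;       // token count of the assembled request
  modelWindow: number;         // context window of the active model
  thresholdFraction?: number;  // configured fraction; default 0.8 per this page
  userRanCompact?: boolean;    // the user ran /compact
  providerOverflow?: boolean;  // provider returned ContextOverflow
}

function compactionTrigger(input: CompactionCheckInput): Trigger {
  // Post-failure recovery takes priority: the provider already rejected the request.
  if (input.providerOverflow) return "post-failure";
  if (input.userRanCompact) return "explicit";
  const fraction = input.thresholdFraction ?? 0.8;
  if (input.requestTokens >= fraction * input.modelWindow) return "threshold";
  return null;
}
```

The ordering is a design assumption: an explicit /compact or an overflow recovery should fire regardless of how full the window is.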
| Section | Compactable? |
|---|---|
| System prompt | No. It is considered load-bearing. |
| Recent N turns | No. The window of "fresh" history is preserved. |
| Older turns | Yes. Summarized into a "compacted history" block. |
| Tool outputs | Yes — often more compactable than messages. |
| Reasoning blocks (Anthropic thinking blocks present when sendReasoning: true; see Provider Params § Reasoning persistence) | Yes, with explicit retention rules — see Reasoning blocks as a compactable content type below. |
| Compacted history (prior) | Yes — further compacted as needed. |
| Context Provider contributions | No. Re-computed each turn. |
The boundary between "recent" and "older" is the session's compaction threshold — configurable, default is something like "last 4 turns."
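The recent/older split from the table can be sketched as a pure partition over the turn list, assuming a turn-count threshold (the function name and default are illustrative; "last 4 turns" is only this page's ballpark):

```typescript
// Partition history into a compactable prefix and a protected suffix.
// `protectedTurns` stands in for the session's compaction threshold.
function splitForCompaction<T>(
  turns: T[],
  protectedTurns = 4,
): { older: T[]; recent: T[] } {
  // The last `protectedTurns` turns are preserved verbatim; everything
  // before them is eligible for summarization.
  const cut = Math.max(turns.length - protectedTurns, 0);
  return { older: turns.slice(0, cut), recent: turns.slice(cut) };
}
```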
When Anthropic's sendReasoning: true is in effect (the v1 default), assistant messages carry a content array that includes thinking blocks alongside text and tool-call structure. See Provider Params for the canonical policy. Reasoning blocks are durable conversation content, but they have different fidelity, safety, and provider-portability properties than assistant text. Compaction handles them with explicit retention rules:
| Retention rule | Default applicability | Behavior |
|---|---|---|
| Verbatim | Most-recent compaction-pass turn(s) within the protected window | Reasoning blocks remain unchanged in the persisted message. |
| Summarized as ordinary text | Older turns, when the compaction provider produces a single summary block per turn | Reasoning text contributes to the summary's input prompt; the output replaces the original message as ordinary text — the summary is not repackaged as a provider-native thinking block. |
| Dropped | Tool-call reasoning context that the SM declares disposable; user-configured drop policy on long-running sessions | Reasoning is removed from the persisted message; assistant text and tool calls remain. |
| Wire-boundary clear (Anthropic contextManagement.clear_thinking) | When the user configures Anthropic's provider-native context management | Provider-native thinking-clear edits the request at the wire boundary before it leaves the adapter; the manifest is unaffected. Core compaction operates on the manifest and is the canonical persistence-shaping mechanism — see Core compaction vs Anthropic contextManagement below. |
Compaction never fabricates provider-native thinking blocks from summaries. A summarized older turn becomes ordinary text. This is a normative invariant — emitting a synthesized thinking block downstream would mislead the model about what it actually thought and could break re-roll determinism on Anthropic.
Default when sendReasoning: true: reasoning blocks within the recent-turn protected window are kept verbatim; older reasoning is summarized into ordinary text. Users may opt into a stricter "drop reasoning beyond the protected window" policy per the configurable compaction mode (see Designing around compaction (for deterministic workflows)).
When sendReasoning: false, assistant messages carry no thinking blocks and these rules do not apply.
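The first three retention rules can be sketched as a transform over a message's content blocks. All types and names below are invented for illustration, and the summarizer is a stand-in for the LLM-based compaction provider; the one load-bearing detail from this page is that a summary lands as ordinary text, never as a fabricated thinking block:

```typescript
// Illustrative content-block shapes for an assistant message.
type Block =
  | { type: "thinking"; text: string }
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string };

type Retention = "verbatim" | "summarize" | "drop";

// Stand-in for the LLM-based compaction provider.
function summarize(blocks: Block[]): string {
  return "[summary of " + blocks.length + " blocks]";
}

function applyRetention(blocks: Block[], rule: Retention): Block[] {
  switch (rule) {
    case "verbatim":
      // Protected window: the persisted message is unchanged.
      return blocks;
    case "summarize":
      // Reasoning text contributes to the summary's input; the output
      // replaces the message as ordinary text — never repackaged as a
      // provider-native thinking block (the normative invariant).
      return [{ type: "text", text: summarize(blocks) }];
    case "drop":
      // Remove reasoning only; assistant text and tool calls remain.
      return blocks.filter((b) => b.type !== "thinking");
  }
}
```

The fourth rule (wire-boundary clear) has no place in this sketch by design: it edits the outgoing request, not the persisted message.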
Anthropic's adapter exposes a contextManagement field on defaultParams for provider-native context editing — automatic compaction at the wire boundary, thinking clearing, and similar surfaces. The two surfaces operate on different artifacts and at different points in the pipeline:
- Core compaction (this page) operates on the manifest. It rewrites persisted history; the rewrite is durable and feeds every subsequent turn's request. It is the canonical session-state-shaping mechanism in v1.
- Anthropic contextManagement operates at the wire boundary. It edits the request body the adapter is about to send to Anthropic, on every turn. Its effects are visible only to that wire request; the manifest is unchanged.
Precedence: core compaction is canonical. When both surfaces are configured on the same provider entry, the manifest reflects core compaction's view. Anthropic contextManagement still runs at the wire boundary on every turn (the adapter does not suppress it), but it never substitutes for core compaction on the persistence side. Users should think of contextManagement as an in-flight request edit, not a manifest mutation.
DoubleCompactionConfigured validation warning. Validation Pipeline emits this warning at session start when core compaction (enabled by default) is paired with Anthropic contextManagement on the same provider entry. The session continues; the warning surfaces the duplication so users can decide whether to disable one of the surfaces or accept both running side by side.
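The validation check can be sketched against a simplified provider-entry shape. The shape below is an assumption for illustration, not the documented settings schema:

```typescript
// Simplified provider entry (illustrative, not the real settings shape).
interface ProviderEntry {
  coreCompaction?: { enabled: boolean };           // core compaction; enabled by default
  defaultParams?: { contextManagement?: unknown }; // Anthropic wire-boundary surface
}

// Session-start check that surfaces the duplication without changing behavior.
function validateCompactionConfig(entry: ProviderEntry): string[] {
  const warnings: string[] = [];
  const coreOn = entry.coreCompaction?.enabled ?? true; // default-on
  if (coreOn && entry.defaultParams?.contextManagement !== undefined) {
    warnings.push("DoubleCompactionConfigured");
  }
  return warnings;
}
```

Because the warning never blocks the session, it reduces to a pure function of the config: both surfaces configured yields the warning; disabling either yields none.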
```mermaid
sequenceDiagram
    autonumber
    participant Core
    participant Provider as "compaction provider<br/>(bundled: LLM-based)"
    participant LLM
    Core->>Provider: request summary<br/>(older history + prior compaction)
    Provider->>LLM: summarization call
    LLM-->>Provider: summary text
    Provider-->>Core: compacted block
    Core->>Core: replace older history<br/>with compacted block
    Core->>Core: re-assemble next request
```
The bundled compaction provider uses the session's active LLM provider with a compaction-specific prompt (from the Prompt Registry). Compaction itself is core — see Extensibility Boundary. Extensions influence it (SM may force or defer; Context Providers may contribute the compaction prompt or ambient context) but cannot replace the subsystem.
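The sequence above can be sketched as a single pass that replaces older history (including any prior compacted block) with one summary block. The types are invented for illustration, and the `summarizer` parameter stands in for the bundled LLM-based compaction provider:

```typescript
// Illustrative history item; `compacted` marks a prior summary block.
interface HistoryItem { role: string; text: string; compacted?: boolean }

async function compact(
  history: HistoryItem[],
  protectedTurns: number,
  summarizer: (items: HistoryItem[]) => Promise<string>, // LLM-based provider stand-in
): Promise<HistoryItem[]> {
  const cut = Math.max(history.length - protectedTurns, 0);
  const older = history.slice(0, cut);  // includes any prior compacted block
  const recent = history.slice(cut);
  if (older.length === 0) return history; // nothing eligible to compact
  const summary = await summarizer(older);
  // Prior compaction is further compacted because it sits in `older`.
  return [{ role: "system", text: summary, compacted: true }, ...recent];
}
```

Because `older` always includes the previous compacted block, repeated passes fold earlier summaries into the new one rather than accumulating them.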
Compaction is non-deterministic when it uses an LLM to produce summaries:
- The same history compacted twice may yield different summaries.
- Downstream decisions (SM transitions, hook guards, tool choices) reading compacted content inherit that non-determinism.
- Replaying a session that crossed a compaction boundary is not guaranteed to produce the same future turns.
This is called out loudly because:
- Deterministic workflows are a primary use case for stud-cli (see the user's state-machine motivation).
- A silent LLM-summarization step that flips determinism on its head would be a rug-pull.
Two strategies:
- Size the context to avoid compaction. Keep turns small; use Context Providers that load pertinent info per turn rather than accumulating history.
- Configure a deterministic compaction mode on the core compaction subsystem (e.g., "drop tool outputs older than N turns; keep everything else verbatim until the window"). Core may ship alternative compaction modes under a configuration key; the subsystem remains core. This trades fidelity for predictability.
Either strategy is compatible with v1. The wiki does not pick one; workflows pick.
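A deterministic mode of the second kind can be sketched without any LLM call, so compacting the same history twice yields identical results. The entry shape and function name are illustrative:

```typescript
// Illustrative history entry tagged with the turn that produced it.
interface Entry { turn: number; kind: "message" | "tool_output"; text: string }

// Deterministic mode: drop tool outputs older than N turns, keep
// everything else verbatim. A pure function of its inputs — no LLM,
// so replay across this boundary is reproducible.
function deterministicCompact(
  history: Entry[],
  currentTurn: number,
  keepToolOutputTurns: number,
): Entry[] {
  return history.filter(
    (e) => e.kind !== "tool_output" || currentTurn - e.turn <= keepToolOutputTurns,
  );
}
```

The fidelity trade mentioned above is visible here: old tool outputs vanish entirely instead of surviving in summarized form.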
Memory in v1 is conversation history. stud-cli does not ship a long-term, cross-session "memory" store. Long-term memory (facts the assistant should remember across sessions) is a workflow choice:
- A Context Provider can read project files (e.g., CLAUDE.md-style conventions) and contribute them at COMPOSE_REQUEST.
- A Session Store that persists a "memory" slot can be composed with a Context Provider that reads from it.
- A third-party extension can ship a memory category — but it is a Context Provider in the v1 taxonomy, not a separate kind.
The wiki does not promise a bundled memory system. See Extensibility Boundary.
Context Provider contributions are re-assembled per turn and are not part of the history that gets compacted. A provider returning the same data each turn pays that token cost each turn. If a provider's content should be part of long-term session memory, the provider itself must handle persistence (via a state slot — see Extension State).
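A Context Provider that owns its persistence through a state slot might look like the sketch below. The StateSlot interface and both functions are invented for illustration; the real Extension State API is specified on its own page:

```typescript
// Minimal state-slot shape (illustrative stand-in for Extension State).
interface StateSlot {
  read(): string | undefined;
  write(value: string): void;
}

// In-memory slot standing in for the session store's persistence.
function memorySlot(): StateSlot {
  let value: string | undefined;
  return { read: () => value, write: (v) => { value = v; } };
}

// Contributes remembered facts each turn. Contributions are re-assembled
// per turn and never compacted, so the provider pays the token cost every
// turn — but its content survives compaction untouched.
function contribute(slot: StateSlot, newFact?: string): string {
  if (newFact !== undefined) {
    const prior = slot.read();
    slot.write(prior ? prior + "\n" + newFact : newFact);
  }
  return slot.read() ?? "";
}
```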
The SM does not force compaction directly. Each stage execution runs a fresh LLM transcript with its own turnCap and allowedTools, so compaction pressure is bounded per stage. A stage authored to keep transcripts short avoids compaction entirely; a long-running stage with a high turnCap will see compaction fire on threshold just like any other composition.
Orchestrator-level compaction (outside stage executions) follows the same thresholds and can be invoked by /compact per Commands.
See State Machines and Stage Definitions.
Compaction events are audited:
- CompactionThresholdHit — threshold crossed.
- CompactionStarted — compaction provider invoked.
- CompactionCompleted — summary integrated. Audit payload includes a per-turn breakdown of which retention rule applied (verbatim / summarized / dropped / cleared by provider-native) when reasoning blocks were present.
- CompactionFailed — compaction errored (e.g., provider unavailable).
- DoubleCompactionConfigured — a session start where both core compaction and Anthropic provider-native contextManagement are configured on the same provider entry. Surfaces the precedence rule (core wins) without changing behavior.
The audit records reference the compaction provider used, the token counts before/after, and the correlation ID to the turn that triggered compaction. See Audit Trail.
| Failure | Handling |
|---|---|
| Compaction provider errors | Retry per the provider's retry policy; if still failing, the turn errors with ContextOverflow. |
| Summary exceeds reserved budget | Emit CompactionSummaryOversize diagnostic; re-attempt with a tighter prompt; eventually truncate. |
| Compaction loop (summary grows each turn) | Emit CompactionNoProgress diagnostic; user / SM intervenes. |
| Reasoning content compaction edge cases | Summary that mentions reasoning content but lands as ordinary text — emit CompactionReasoningDowngraded diagnostic to surface the fidelity loss to the user. Compaction is not allowed to repackage the summary as a provider-native thinking block (the invariant from Reasoning blocks as a compactable content type). |
| Provider-native context management ran twice | If Anthropic contextManagement ran at the wire AND core compaction ran on the same turn boundary, the manifest reflects core compaction's view. Emit CompactionDoubleRan observability event so users can investigate cost spikes. |
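The oversize-summary row can be sketched as a bounded retry loop: re-attempt with a tighter budget hint, then truncate. Token counting is approximated by string length and all names are illustrative:

```typescript
// Recovery path for a summary that exceeds its reserved budget.
// `produce` stands in for a summarization call that accepts a budget hint.
function fitSummary(
  produce: (budgetHint: number) => string,
  budget: number,
  maxAttempts = 3,
): { summary: string; diagnostics: string[] } {
  const diagnostics: string[] = [];
  let hint = budget;
  for (let i = 0; i < maxAttempts; i++) {
    const summary = produce(hint);
    if (summary.length <= budget) return { summary, diagnostics };
    diagnostics.push("CompactionSummaryOversize"); // surface each oversize attempt
    hint = Math.floor(hint / 2); // tighten the prompt's budget hint
  }
  // Eventually truncate to the reserved budget.
  return { summary: produce(hint).slice(0, budget), diagnostics };
}
```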
Compaction summaries are derived from conversation history, which may contain sensitive user input. The compaction provider inherits the same constraints as any LLM request:
- Env values are not inlined.
- Tool outputs have already been redacted per the tool's contract and any post-tool hooks.
- The summary text, once produced, is treated like any other LLM-generated content (untrusted, possibly prompt-injection-bearing).
See LLM Context Isolation, Secrets Hygiene.
- Context Assembly
- Context Providers (concept)
- Context Providers (contract)
- State Machines
- Prompt Registry
- Determinism and Ordering
- Audit Trail
- Provider Params — sendReasoning policy that produces durable reasoning content for compaction.
- Compaction is a core subsystem with three triggers: threshold (default 80% of model window), explicit /compact, post-failure recovery from ContextOverflow.
- "What gets compacted" table: system prompt and recent N turns preserved; older turns and tool outputs summarized; Context Provider contributions re-computed each turn (not compacted).
- Non-determinism is loud: LLM-based summarization makes downstream replay non-deterministic; deterministic workflows design around this either by sizing context to avoid compaction or by configuring a deterministic compaction mode.
- Audit kinds: CompactionThresholdHit, CompactionStarted, CompactionCompleted, CompactionFailed.
- "What gets compacted" table gains a row for reasoning blocks (Anthropic thinking blocks present when Provider Params' sendReasoning: true is in effect).
- New "Reasoning blocks as a compactable content type" section documents four retention rules: verbatim within the protected window, summarized as ordinary text for older turns, dropped (per SM declaration or user policy), and wire-boundary clear (Anthropic contextManagement.clear_thinking). Normative invariant: compaction never fabricates provider-native thinking blocks from summaries.
- New "Core compaction vs Anthropic contextManagement" section pins precedence: core compaction is the canonical persistence-shaping mechanism (operates on the manifest); Anthropic contextManagement is a wire-boundary edit (operates on the request body). Both surfaces may run side by side; the manifest reflects core compaction's view. DoubleCompactionConfigured validation warning when both are configured.
- Audit kinds extended with CompactionReasoningDowngraded, CompactionDoubleRan, DoubleCompactionConfigured.
- Failure-modes table extended with reasoning-content compaction edge cases and the double-run observability event.
- No removal of pre-existing prose; all changes are additive on top of 1.0.0.