
Compaction and Memory

Z-M-Huang edited this page May 1, 2026 · 4 revisions


contractVersion: 1.1.0

How a session handles conversation history that exceeds the model's context window: summarization, token budgeting, the non-determinism implication, and why deterministic workflows need to design around it.


The problem

Every LLM has a finite context window. A long session — many turns, many tool calls — accumulates history that eventually exceeds it. Compaction is the process of summarizing old history into a smaller representation so newer content fits.

Compaction is a core subsystem. Extensions cannot replace it, but they can influence it: the SM may force compaction earlier, and Context Providers may contribute to the compaction summary.


When compaction fires

```mermaid
flowchart TD
    Assembly[context assembly] --> Check{fits in<br/>model window?}
    Check -->|yes| Send[send request]
    Check -->|no, near limit| Warn[emit threshold event]
    Warn --> Compact[compact history]
    Compact --> Retry[rebuild request]
    Retry --> Send
```

Triggers:

  • Threshold: the request token count crosses a configured fraction of the model window (default 80%).
  • Explicit: the user runs /compact.
  • Post-failure: a ContextOverflow from the provider forces compaction as a recovery path.
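As a sketch, the three triggers reduce to a small precedence check. All names here are illustrative, not the actual API; the 0.8 default is from this page, and the precedence ordering is an assumption:

```typescript
// Sketch of trigger selection (illustrative names; not the actual API).
// Precedence ordering here is an assumption: overflow recovery first,
// then an explicit /compact, then the token-budget threshold.
type CompactionTrigger = "threshold" | "explicit" | "post-failure" | null;

function compactionTrigger(opts: {
  requestTokens: number;      // tokens in the assembled request
  modelWindow: number;        // model context window, in tokens
  thresholdFraction?: number; // default 0.8, per this page
  explicitCompact?: boolean;  // user ran /compact
  contextOverflow?: boolean;  // provider returned ContextOverflow
}): CompactionTrigger {
  const { requestTokens, modelWindow, thresholdFraction = 0.8 } = opts;
  if (opts.contextOverflow) return "post-failure"; // recovery path
  if (opts.explicitCompact) return "explicit";
  if (requestTokens >= modelWindow * thresholdFraction) return "threshold";
  return null; // fits; send the request as assembled
}
```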

What gets compacted

| Section | Compactable? |
| --- | --- |
| System prompt | No. It is considered load-bearing. |
| Recent N turns | No. The window of "fresh" history is preserved. |
| Older turns | Yes. Summarized into a "compacted history" block. |
| Tool outputs | Yes — often more compactable than messages. |
| Reasoning blocks (Anthropic thinking blocks present when sendReasoning: true; see Provider Params § Reasoning persistence) | Yes, with explicit retention rules — see Reasoning blocks as a compactable content type below. |
| Compacted history (prior) | Yes — further compacted as needed. |
| Context Provider contributions | No. Re-computed each turn. |

The boundary between "recent" and "older" is the session's compaction threshold — configurable; the default preserves roughly the last 4 turns.
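A minimal sketch of that boundary, assuming a turn-count cut (the `Turn` shape and `splitHistory` name are illustrative):

```typescript
// Sketch of the recent/older boundary (Turn and splitHistory are
// illustrative names). The last `protectedTurns` turns are preserved
// verbatim; everything before the cut is eligible for compaction.
interface Turn { role: "user" | "assistant" | "tool"; content: string }

function splitHistory(turns: Turn[], protectedTurns = 4): { older: Turn[]; recent: Turn[] } {
  const cut = Math.max(0, turns.length - protectedTurns);
  return { older: turns.slice(0, cut), recent: turns.slice(cut) };
}
```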


Reasoning blocks as a compactable content type {#reasoning-blocks-as-a-compactable-content-type}

When Anthropic's sendReasoning: true is in effect (the v1 default), assistant messages carry a content array that includes thinking blocks alongside text and tool-call structure. See Provider Params for the canonical policy. Reasoning blocks are durable conversation content, but they have different fidelity, safety, and provider-portability properties than assistant text. Compaction handles them with explicit retention rules:

| Retention rule | Default applicability | Behavior |
| --- | --- | --- |
| Verbatim | Most-recent compaction-pass turn(s) within the protected window | Reasoning blocks remain unchanged in the persisted message. |
| Summarized as ordinary text | Older turns, when the compaction provider produces a single summary block per turn | Reasoning text contributes to the summary's input prompt; the output replaces the original message as ordinary text — the summary is not repackaged as a provider-native thinking block. |
| Dropped | Tool-call reasoning context that the SM declares disposable; user-configured drop policy on long-running sessions | Reasoning is removed from the persisted message; assistant text and tool calls remain. |
| Wire-boundary clear (Anthropic contextManagement.clear_thinking) | When the user configures Anthropic's provider-native context management | Provider-native thinking-clear edits the request at the wire boundary before it leaves the adapter; the manifest is unaffected. Core compaction operates on the manifest and is the canonical persistence-shaping mechanism — see Core compaction vs Anthropic contextManagement below. |

Compaction never fabricates provider-native thinking blocks from summaries. A summarized older turn becomes ordinary text. This is a normative invariant — emitting a synthesized thinking block downstream would mislead the model about what it actually thought and could break re-roll determinism on Anthropic.

Default when sendReasoning: true: reasoning blocks within the recent-turn protected window are kept verbatim; older reasoning is summarized into ordinary text. Users may opt into a stricter "drop reasoning beyond the protected window" policy per the configurable compaction mode (see Designing around compaction (for deterministic workflows)).

When sendReasoning: false, assistant messages carry no thinking blocks and these rules do not apply.
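Rule selection per turn can be sketched as follows. The names and the precedence ordering are assumptions; the wire-boundary clear is omitted because it lives in the Anthropic adapter, not in core compaction:

```typescript
// Sketch of per-turn retention selection for reasoning blocks (names and
// precedence ordering are assumptions, not the actual API).
// Invariant from this page: "summarize-as-text" yields ordinary text,
// never a synthesized provider-native thinking block.
type Retention = "verbatim" | "summarize-as-text" | "drop";

function reasoningRetention(opts: {
  inProtectedWindow: boolean; // within the recent-turn window
  smDisposable?: boolean;     // SM declared this tool-call reasoning disposable
  dropBeyondWindow?: boolean; // user opted into the stricter drop policy
}): Retention {
  if (opts.inProtectedWindow && !opts.smDisposable) return "verbatim";
  if (opts.smDisposable || opts.dropBeyondWindow) return "drop";
  return "summarize-as-text"; // the default for older turns
}
```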


Core compaction vs Anthropic contextManagement

Anthropic's adapter exposes a contextManagement field on defaultParams for provider-native context editing — automatic compaction at the wire boundary, thinking clearing, and similar surfaces. The two surfaces operate on different artifacts and at different points in the pipeline:

  • Core compaction (this page) operates on the manifest. It rewrites persisted history; the rewrite is durable and feeds every subsequent turn's request. It is the canonical session-state-shaping mechanism in v1.
  • Anthropic contextManagement operates at the wire boundary. It edits the request body the adapter is about to send to Anthropic, on every turn. Its effects are visible only to that wire request; the manifest is unchanged.

Precedence: core compaction is canonical. When both surfaces are configured on the same provider entry, the manifest reflects core compaction's view. Anthropic contextManagement still runs at the wire boundary on every turn (the adapter does not suppress it), but it never substitutes for core compaction on the persistence side. Users should think of contextManagement as an in-flight request edit, not a manifest mutation.

DoubleCompactionConfigured validation warning. Validation Pipeline emits this warning at session start when core compaction (enabled by default) is paired with Anthropic contextManagement on the same provider entry. The session continues; the warning surfaces the duplication so users can decide whether to disable one of the surfaces or accept both running side by side.
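A sketch of that validation check, assuming the provider-entry shape this page implies (contextManagement under defaultParams; everything else illustrative):

```typescript
// Sketch of the DoubleCompactionConfigured check (shapes illustrative;
// the contextManagement-under-defaultParams location is from this page).
// Validation warns rather than errors: both surfaces may run side by side.
interface ProviderEntry {
  name: string;
  defaultParams?: { contextManagement?: unknown };
}

function validateCompactionConfig(
  entry: ProviderEntry,
  coreCompactionEnabled = true, // core compaction is on by default
): string[] {
  const warnings: string[] = [];
  if (coreCompactionEnabled && entry.defaultParams?.contextManagement !== undefined) {
    warnings.push("DoubleCompactionConfigured"); // session continues
  }
  return warnings;
}
```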


How compaction runs

```mermaid
sequenceDiagram
    autonumber
    participant Core
    participant Provider as "compaction provider<br/>(bundled: LLM-based)"
    participant LLM
    Core->>Provider: request summary<br/>(older history + prior compaction)
    Provider->>LLM: summarization call
    LLM-->>Provider: summary text
    Provider-->>Core: compacted block
    Core->>Core: replace older history<br/>with compacted block
    Core->>Core: re-assemble next request
```

The bundled compaction provider uses the session's active LLM provider with a compaction-specific prompt (from the Prompt Registry). Compaction itself is core — see Extensibility Boundary. Extensions influence it (SM may force or defer; Context Providers may contribute the compaction prompt or ambient context) but cannot replace the subsystem.
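The sequence above boils down to: summarize everything older than the protected window (including any prior compacted block) and splice the result in front of the preserved turns. A sketch with the summarizer injected, so it can stand in for the bundled LLM-based provider (all names illustrative; the real summarizer is an asynchronous LLM call):

```typescript
// Sketch of the core compaction step (names illustrative). The bundled
// summarizer is an LLM call; here it is injected as a plain function.
interface Msg { role: string; content: string; compacted?: boolean }

function compact(
  history: Msg[],
  protectedTurns: number,
  summarize: (older: Msg[]) => string, // bundled provider: LLM-based
): Msg[] {
  const cut = Math.max(0, history.length - protectedTurns);
  const older = history.slice(0, cut); // includes any prior compacted block
  if (older.length === 0) return history; // nothing to compact
  const block: Msg = { role: "system", content: summarize(older), compacted: true };
  return [block, ...history.slice(cut)];
}
```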


Non-determinism, loudly

Compaction is non-deterministic when it uses an LLM to produce summaries:

  • The same history compacted twice may yield different summaries.
  • Downstream decisions (SM transitions, hook guards, tool choices) reading compacted content inherit that non-determinism.
  • Replaying a session that crossed a compaction boundary is not guaranteed to produce the same future turns.

This is called out loudly because:

  • Deterministic workflows are a primary use case for stud-cli (see the user's state-machine motivation).
  • A silent LLM-summarization step that flips determinism on its head would be a rug-pull.

Designing around compaction (for deterministic workflows)

Two strategies:

  1. Size the context to avoid compaction. Keep turns small; use Context Providers that load pertinent info per turn rather than accumulating history.
  2. Configure a deterministic compaction mode on the core compaction subsystem (e.g., "drop tool outputs older than N turns; keep everything else verbatim until the window"). Core may ship alternative compaction modes under a configuration key; the subsystem remains core. This trades fidelity for predictability.

Either strategy is compatible with v1. The wiki does not pick one; workflows pick.
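Strategy 2 can be made concrete with a mode like the example this page gives. The sketch below is hypothetical (the page only says core "may ship alternative compaction modes"): drop tool outputs older than the protected tail, keep everything else verbatim. Because no LLM is involved, compacting the same history twice yields identical results:

```typescript
// Sketch of a hypothetical deterministic compaction mode: drop tool
// outputs older than the last `keepToolTurns` turns, keep everything
// else verbatim. No LLM is involved, so the result is repeatable.
interface HistMsg { role: "user" | "assistant" | "tool"; content: string }

function deterministicCompact(history: HistMsg[], keepToolTurns = 4): HistMsg[] {
  const cut = Math.max(0, history.length - keepToolTurns);
  // Keep the protected tail whole; outside it, keep only non-tool messages.
  return history.filter((msg, i) => i >= cut || msg.role !== "tool");
}
```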


Memory vs compaction

Memory in v1 is conversation history. stud-cli does not ship a long-term, cross-session "memory" store. Long-term memory (facts the assistant should remember across sessions) is a workflow choice:

  • A Context Provider can read project files (e.g., CLAUDE.md-style conventions) and contribute them at COMPOSE_REQUEST.
  • A Session Store that persists a "memory" slot can be composed with a Context Provider that reads from it.
  • A third-party extension can ship a memory category — but it is a Context Provider in the v1 taxonomy, not a separate kind.

The wiki does not promise a bundled memory system. See Extensibility Boundary.


Compaction and Context Providers

Context Provider contributions are re-assembled per turn and are not part of the history that gets compacted. A provider returning the same data each turn pays that token cost each turn. If a provider's content should be part of long-term session memory, the provider itself must handle persistence (via a state slot — see Extension State).


Compaction and SM

The SM does not force compaction directly. Each stage execution runs a fresh LLM transcript with its own turnCap and allowedTools, so compaction pressure is bounded per stage. A stage authored to keep transcripts short avoids compaction entirely; a long-running stage with a high turnCap will see compaction fire on threshold just like any other composition.

Orchestrator-level compaction (outside stage executions) follows the same thresholds and can be invoked by /compact per Commands.

See State Machines and Stage Definitions.


Compaction and audit

Compaction events are audited:

  • CompactionThresholdHit — threshold crossed.
  • CompactionStarted — compaction provider invoked.
  • CompactionCompleted — summary integrated. Audit payload includes a per-turn breakdown of which retention rule applied (verbatim / summarized / dropped / cleared by provider-native) when reasoning blocks were present.
  • CompactionFailed — compaction errored (e.g., provider unavailable).
  • DoubleCompactionConfigured — a session start where both core compaction and Anthropic provider-native contextManagement are configured on the same provider entry. Surfaces the precedence rule (core wins) without changing behavior.

The audit records reference the compaction provider used, the token counts before/after, and the correlation ID to the turn that triggered compaction. See Audit Trail.
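For concreteness, a CompactionCompleted payload might be shaped as below. Field names are illustrative; this page only pins down what the record must reference (provider, before/after token counts, correlation ID, and a per-turn retention breakdown when reasoning blocks were present):

```typescript
// Sketch of a CompactionCompleted audit record (field names illustrative).
type RetentionApplied = "verbatim" | "summarized" | "dropped" | "cleared-by-provider-native";

interface CompactionCompletedEvent {
  kind: "CompactionCompleted";
  compactionProvider: string; // the compaction provider used
  tokensBefore: number;       // token count before compaction
  tokensAfter: number;        // token count after compaction
  correlationId: string;      // links to the turn that triggered compaction
  retention?: Array<{ turn: number; rule: RetentionApplied }>; // present when reasoning blocks were
}
```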


Failure modes

| Failure | Handling |
| --- | --- |
| Compaction provider errors | Retry per the provider's retry policy; if still failing, the turn errors with ContextOverflow. |
| Summary exceeds reserved budget | Emit CompactionSummaryOversize diagnostic; re-attempt with a tighter prompt; eventually truncate. |
| Compaction loop (summary grows each turn) | Emit CompactionNoProgress diagnostic; user / SM intervenes. |
| Reasoning content compaction edge cases | Summary that mentions reasoning content but lands as ordinary text — emit CompactionReasoningDowngraded diagnostic to surface the fidelity loss to the user. Compaction is not allowed to repackage the summary as a provider-native thinking block (the invariant from Reasoning blocks as a compactable content type). |
| Provider-native context management ran twice | If Anthropic contextManagement ran at the wire AND core compaction ran on the same turn boundary, the manifest reflects core compaction's view. Emit CompactionDoubleRan observability event so users can investigate cost spikes. |

Security

Compaction summaries are derived from conversation history, which may contain sensitive user input. The compaction provider inherits the same constraints as any LLM request:

  • Env values are not inlined.
  • Tool outputs have already been redacted per the tool's contract and any post-tool hooks.
  • The summary text, once produced, is treated like any other LLM-generated content (untrusted, possibly prompt-injection-bearing).

See LLM Context Isolation, Secrets Hygiene.


Related pages


Changelog

1.0.0 — initial

  • Compaction is a core subsystem with three triggers: threshold (default 80% of model window), explicit /compact, post-failure recovery from ContextOverflow.
  • "What gets compacted" table: system prompt and recent N turns preserved; older turns and tool outputs summarized; Context Provider contributions re-computed each turn (not compacted).
  • Non-determinism is loud: LLM-based summarization makes downstream replay non-deterministic; deterministic workflows design around this either by sizing context to avoid compaction or by configuring a deterministic compaction mode.
  • Audit kinds: CompactionThresholdHit, CompactionStarted, CompactionCompleted, CompactionFailed.

1.1.0 — reasoning blocks as a compactable content type; contextManagement precedence

  • "What gets compacted" table gains a row for reasoning blocks (Anthropic thinking blocks present when Provider Params' sendReasoning: true is in effect).
  • New "Reasoning blocks as a compactable content type" section documents four retention rules: verbatim within the protected window, summarized as ordinary text for older turns, dropped (per SM declaration or user policy), and wire-boundary clear (Anthropic contextManagement.clear_thinking). Normative invariant: compaction never fabricates provider-native thinking blocks from summaries.
  • New "Core compaction vs Anthropic contextManagement" section pins precedence: core compaction is the canonical persistence-shaping mechanism (operates on the manifest); Anthropic contextManagement is a wire-boundary edit (operates on the request body). Both surfaces may run side by side; manifest reflects core compaction's view. DoubleCompactionConfigured validation warning when both are configured.
  • Audit kinds extended with CompactionReasoningDowngraded, CompactionDoubleRan, DoubleCompactionConfigured.
  • Failure-modes table extended with reasoning-content compaction edge cases and the double-run observability event.
  • No removal of pre-existing prose; all changes are additive on top of 1.0.0.
