Reduce duplicate transcript memory in heavy active local sessions #49

@1jehuang

Summary

Heavy active local client sessions still retain significantly more memory than expected because the same conversation can remain resident in multiple full in-memory representations at once:

  • Session.messages as persisted StoredMessage history
  • App.messages as provider-ready Message history
  • display_messages as UI-rendered history

We already landed meaningful memory wins for:

  • remote/startup transcript retention
  • idle restored local sessions via lazy provider hydration
  • reload propagation/reexec behavior

But heavy active local sessions remain expensive.

Evidence

Live profiling on heavy sessions like cat showed roughly:

  • PSS: ~126 to 131 MB
  • private heap: ~112 to 130 MB
  • persisted session file: only ~2.3 MB
  • tool result text: only ~1.7 MB

So the resident footprint is roughly 50x the persisted session file (~112 to 131 MB resident against ~2.3 MB on disk), far more than raw transcript payload size explains. It is likely dominated by duplicated transcript structures, copied content blocks/strings, and rendered display copies.

Current architecture problem

Today we intentionally keep different transcript forms for different jobs, but for large active sessions the overlap is too expensive:

  • Session.messages is the canonical persisted history and contains session-only metadata such as id, display_role, and token_usage
  • App.messages is the live provider/runtime transcript used for turn execution, compaction, and injected context
  • display_messages is the UI-facing rendered transcript

This separation is useful, but for large tool-heavy sessions it means old content can be retained once per representation rather than once overall.
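
To make the overlap concrete, here is a minimal sketch of how one tool result can end up owned three times. The struct shapes are simplified assumptions for illustration: only `StoredMessage`, `Message`, `id`, and `display_role` are names from this issue, while `DisplayMessage`, `ingest_tool_result`, `retained_payload_bytes`, and the remaining fields are hypothetical.

```rust
// Hypothetical, simplified stand-ins for the three transcript forms.
// The real types almost certainly carry more fields than shown here.

/// Persisted form: carries session-only metadata.
#[allow(dead_code)]
struct StoredMessage {
    id: String,
    display_role: Option<String>,
    content: String,
}

/// Provider-ready form used for turn execution.
#[allow(dead_code)]
struct Message {
    role: String,
    content: String,
}

/// UI-facing rendered form (name is an assumption).
#[allow(dead_code)]
struct DisplayMessage {
    rendered: String,
}

/// Build all three forms from one tool result.
/// Each form takes its own owned copy of the payload.
fn ingest_tool_result(id: &str, output: &str) -> (StoredMessage, Message, DisplayMessage) {
    let stored = StoredMessage {
        id: id.to_string(),
        display_role: None,
        content: output.to_string(),
    };
    let provider = Message {
        role: "tool".to_string(),
        content: stored.content.clone(), // second owned copy
    };
    let display = DisplayMessage {
        rendered: stored.content.clone(), // third owned copy
    };
    (stored, provider, display)
}

/// Payload bytes retained across the three forms.
/// Counts string lengths only, so allocator overhead is not included.
fn retained_payload_bytes(s: &StoredMessage, m: &Message, d: &DisplayMessage) -> usize {
    s.content.len() + m.content.len() + d.rendered.len()
}
```

Under these assumptions a 1 KiB tool result costs 3 KiB of payload alone, before any per-block or rendering overhead.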

Goal

Keep all three product requirements:

  • stable persisted session logs
  • low-latency provider/model interaction
  • low-latency UI rendering

while reducing duplicate in-memory transcript state for heavy active local sessions.

Proposed direction

  1. Keep Session.messages as the canonical transcript source of truth.
  2. Make App.messages more clearly a derived/cache view rather than a permanent second full transcript.
  3. Make display_messages lighter for large historical tool outputs:
    • prefer previews/truncation/lazy expansion for large old tool results
    • avoid eagerly copying full large payloads when not needed
  4. Add memory attribution for:
    • canonical transcript bytes
    • provider cache bytes
    • display cache bytes
    • retained large tool output bytes
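
The attribution in item 4 could look roughly like the sketch below. It is not the project's actual diagnostics code: `TranscriptMemoryReport`, `attribute`, and the 16 KiB threshold are hypothetical, and plain `String` values stand in for the real message types.

```rust
/// Assumed cutoff for what counts as a "large" tool output.
const LARGE_TOOL_OUTPUT_THRESHOLD: usize = 16 * 1024;

/// Hypothetical per-session counters, one per bucket in the list above.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct TranscriptMemoryReport {
    canonical_bytes: usize,
    provider_cache_bytes: usize,
    display_cache_bytes: usize,
    /// Subset of `canonical_bytes` held by entries at or above the threshold.
    large_tool_output_bytes: usize,
}

impl TranscriptMemoryReport {
    /// Sum of the three layers. The large-output bucket is a subset of the
    /// canonical bucket, so it is not added again.
    fn total(&self) -> usize {
        self.canonical_bytes + self.provider_cache_bytes + self.display_cache_bytes
    }
}

/// Heap bytes reserved by a list of strings (capacity, not just length).
fn heap_bytes(messages: &[String]) -> usize {
    messages.iter().map(String::capacity).sum()
}

/// Attribute retained bytes to each layer.
fn attribute(
    canonical: &[String],
    provider_cache: &[String],
    display_cache: &[String],
) -> TranscriptMemoryReport {
    TranscriptMemoryReport {
        canonical_bytes: heap_bytes(canonical),
        provider_cache_bytes: heap_bytes(provider_cache),
        display_cache_bytes: heap_bytes(display_cache),
        large_tool_output_bytes: canonical
            .iter()
            .map(String::capacity)
            .filter(|&bytes| bytes >= LARGE_TOOL_OUTPUT_THRESHOLD)
            .sum(),
    }
}
```

Counting `capacity()` rather than `len()` is a deliberate choice in this sketch, since over-allocated buffers are part of what the heap actually retains.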

Likely implementation stages

Stage 1

Target large historical tool outputs first.

  • Canonical full data stays in Session.messages
  • Provider/display layers avoid eagerly retaining another full owned copy unless needed
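
One way Stage 1 might work is for the display layer to keep a short owned preview plus a shared handle to the canonical payload, expanding only on demand. This is a sketch under assumptions: `DisplayEntry`, `make_display_entry`, and the 200-character preview length are hypothetical, and it presumes the canonical payload can be stored as `Arc<str>`.

```rust
use std::sync::Arc;

/// Assumed preview length, in characters.
const PREVIEW_CHARS: usize = 200;

/// Display entry holding a short owned preview and a shared handle
/// to the full payload for lazy expansion.
#[allow(dead_code)]
struct DisplayEntry {
    preview: String,
    full: Arc<str>,
    truncated: bool,
}

impl DisplayEntry {
    /// Borrow the full text without copying it.
    fn expanded(&self) -> &str {
        &self.full
    }
}

/// Build a display entry that shares the payload instead of cloning it.
fn make_display_entry(full: &Arc<str>) -> DisplayEntry {
    let text: &str = full;
    // Byte offset of the first character past the preview, if there is one.
    // Cutting at a char boundary keeps multi-byte text valid.
    let cut = text.char_indices().nth(PREVIEW_CHARS).map(|(i, _)| i);
    match cut {
        Some(i) => DisplayEntry {
            preview: text[..i].to_string(),
            full: Arc::clone(full), // bumps a refcount, no payload copy
            truncated: true,
        },
        None => DisplayEntry {
            preview: text.to_string(),
            full: Arc::clone(full),
            truncated: false,
        },
    }
}
```

Short outputs are still copied into `preview` in full, which keeps rendering simple; the saving comes from large outputs, where only the first `PREVIEW_CHARS` characters are owned by the display layer.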

Stage 2

Reduce active local duplication between Session.messages and App.messages.

  • Prefer incremental hydration / cache invalidation over maintaining two full authoritative copies
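
A rough sketch of what incremental hydration with invalidation might look like follows. The types here are stand-ins, not the real `Session` or `App`: `revision`, `ProviderCache`, `sync`, and `evict` are hypothetical names, and `String` stands in for both `StoredMessage` and `Message`.

```rust
/// Canonical transcript with a revision counter that is bumped
/// whenever history is rewritten rather than appended to.
struct Session {
    messages: Vec<String>, // stand-in for Vec<StoredMessage>
    revision: u64,
}

impl Session {
    fn new() -> Self {
        Session { messages: Vec::new(), revision: 0 }
    }

    /// Append-only edit: the derived cache can catch up incrementally.
    fn push(&mut self, msg: String) {
        self.messages.push(msg);
    }

    /// Non-append edit (for example compaction): invalidates derived views.
    fn replace_all(&mut self, msgs: Vec<String>) {
        self.messages = msgs;
        self.revision += 1;
    }
}

/// Derived provider view. It is rebuilt from `Session` and never authoritative.
struct ProviderCache {
    messages: Vec<String>, // stand-in for Vec<Message>
    seen_revision: u64,
}

impl ProviderCache {
    fn new() -> Self {
        ProviderCache { messages: Vec::new(), seen_revision: 0 }
    }

    /// Bring the cache up to date and return how many messages were converted.
    fn sync(&mut self, session: &Session) -> usize {
        if self.seen_revision != session.revision {
            // History was rewritten: drop everything and rebuild.
            self.messages.clear();
            self.seen_revision = session.revision;
        }
        let start = self.messages.len();
        for stored in session.messages.iter().skip(start) {
            self.messages.push(stored.clone()); // stand-in for real conversion
        }
        self.messages.len() - start
    }

    /// Release the cache's memory. The next `sync` rehydrates from `Session`.
    fn evict(&mut self) {
        self.messages.clear();
        self.messages.shrink_to_fit();
    }
}
```

Because the cache can always be rebuilt from `Session`, it can be evicted under memory pressure without data loss; the cost is a full rehydration on the next `sync`.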

Stage 3

Add diagnostics so we can measure wins and catch regressions.

Expected win

Best estimate from current profiling:

  • heavy local sessions: ~40 to 70 MB reduction each
  • medium-heavy sessions: ~20 to 40 MB reduction each

Not all of the current heavy heap is reclaimable because some hot provider/UI state is still necessary, but the remaining duplication looks large enough to be worth the refactor.

Notes

This issue is specifically about active local sessions. The remote/startup and idle restore cases have already improved and should not be conflated with the remaining heavy-session problem.
