Feature request: Persistent agent state and signed compressed KV cache capsules for long-running coding tasks #2904

@skiyo

Hi CodeWhale team,

I would like to propose a feature for improving the cost, latency, and continuity of long-running agentic coding sessions: persistent agent state, with a possible future extension to server-signed compressed KV cache capsules.

Problem

Many coding-agent workflows are not short interactive chats. They are long-running engineering tasks.

A common example from machine learning / algorithm engineering:

  1. I ask the coding agent to inspect code, design experiments, and submit multiple training jobs to a GPU cluster.
  2. The jobs run for several hours.
  3. I come back with logs, metrics, failed runs, TensorBoard summaries, or model comparison results.
  4. The agent needs to continue from the previous reasoning context and propose the next experimental plan.

The problem is that after a few hours, the previous prompt/cache state may no longer be reusable. The agent or the underlying LLM then has to reprocess a very long context from scratch: repo context, task history, tool outputs, previous plans, experiment configs, and user decisions.

This creates two issues:

  • High cost from repeatedly recreating long-context prompt caches.
  • High latency from repeatedly pre-filling the same large context.

This is especially painful for agentic workflows where users naturally pause for hours or days while external jobs, CI pipelines, training tasks, or data processing tasks complete.

Proposed feature

CodeWhale could support a persistent task/session state mechanism with two complementary layers:

1. Local structured agent state

CodeWhale writes a durable local checkpoint containing fields such as:

  • task objective
  • repo / branch / commit snapshot
  • experiment plan
  • submitted jobs
  • artifact paths
  • previous decisions
  • current known issues
  • next actions

This makes the task recoverable and auditable even if no model-side cache is available.
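
As a rough illustration of what such a checkpoint could look like on disk, here is a minimal sketch. The `TaskCheckpoint` schema, its field names, and the JSON file layout are my own assumptions for illustration, not an existing CodeWhale format.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class TaskCheckpoint:
    """Hypothetical schema mirroring the fields listed above."""
    objective: str
    repo: str
    branch: str
    commit: str
    experiment_plan: list[str] = field(default_factory=list)
    submitted_jobs: list[dict] = field(default_factory=list)
    artifact_paths: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    known_issues: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)
    schema_version: int = 1  # assumption: lets later versions migrate old files


def save_checkpoint(cp: TaskCheckpoint, path: Path) -> None:
    """Write to a temp file in the same directory, then rename over the target,
    so an interrupted write cannot leave a half-written checkpoint."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(asdict(cp), f, indent=2)
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> TaskCheckpoint:
    """Read a checkpoint back into the dataclass."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return TaskCheckpoint(**data)
```

Plain JSON keeps the checkpoint human-readable and diffable, which fits the auditability goal; a real implementation would probably also need schema migration and redaction of secrets.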

2. Optional server-signed compressed KV cache capsule

For deployments where CodeWhale controls or integrates with the model-serving layer, a future extension could be:

  • Before a session becomes inactive or a server-side cache expires, CodeWhale asks the model server to export a compressed inference-state capsule.
  • The server compresses the reusable prefix KV cache.
  • The server signs the capsule and binds it to metadata such as:
    • user / org scope
    • model ID and model/runtime version
    • tokenizer version
    • compression algorithm
    • prefix hash
    • position range
    • creation time and expiration time
    • permission scope
  • CodeWhale stores the opaque capsule locally.
  • When the user resumes the task hours or days later, CodeWhale uploads the capsule plus the new user input.
  • The server verifies the signature and metadata, restores the KV cache if compatible, and only pre-fills the incremental new context.

In other words, CodeWhale would not need to replay the entire historical conversation or rebuild the same cache from scratch every time a long-running task resumes.
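
To make the "restores the KV cache if compatible" step more concrete, here is a minimal sketch of the metadata record and a server-side compatibility check. All names (`CapsuleMetadata`, `ServerContext`, `rejection_reasons`) are hypothetical, and the exact set of checks is an assumption based on the metadata list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapsuleMetadata:
    """Hypothetical metadata bound to a capsule at export time."""
    org_scope: str
    model_id: str
    runtime_version: str
    tokenizer_version: str
    compression: str
    prefix_hash: str
    position_range: tuple[int, int]  # (start, end), end exclusive
    created_at: float                # Unix seconds
    expires_at: float                # Unix seconds
    permission_scope: str


@dataclass(frozen=True)
class ServerContext:
    """Hypothetical view of the serving stack handling the resume request."""
    org_scope: str
    model_id: str
    runtime_version: str
    tokenizer_version: str
    supported_compression: frozenset[str]


def rejection_reasons(meta: CapsuleMetadata, ctx: ServerContext, now: float) -> list[str]:
    """Return why the capsule cannot be restored; an empty list means compatible."""
    reasons = []
    if meta.org_scope != ctx.org_scope:
        reasons.append("org scope mismatch")
    if meta.model_id != ctx.model_id:
        reasons.append("model mismatch")
    if meta.runtime_version != ctx.runtime_version:
        reasons.append("runtime version mismatch")
    if meta.tokenizer_version != ctx.tokenizer_version:
        reasons.append("tokenizer version mismatch")
    if meta.compression not in ctx.supported_compression:
        reasons.append("unsupported compression algorithm")
    start, end = meta.position_range
    if not 0 <= start < end:
        reasons.append("invalid position range")
    if now >= meta.expires_at:
        reasons.append("capsule expired")
    return reasons
```

If the list is non-empty, the natural fallback is a full pre-fill from the local structured state, so an incompatible capsule costs time but never correctness.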

Why this may be feasible

Recent KV cache compression research suggests that KV state can be compressed to a fraction of its original size with limited quality loss, especially with quantization and low-rank methods. Relevant directions include:

  • KV cache quantization, for example KIVI and KVQuant.
  • Low-rank / SVD-style KV compression, for example Palu, SVDq, ReCalKV.
  • Cache streaming / transport formats, for example CacheGen.
  • Prefix and non-prefix cache reuse, for example CacheBlend.
  • Serving-system-level prefix caching and cache offloading.

The key point is that this does not have to expose raw model internals to the client. The exported cache could be an opaque, encrypted, server-signed blob. The client stores it but cannot read it, and any modification would cause signature verification to fail on upload.
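
For a sense of scale, the raw size of a KV cache follows from standard transformer arithmetic: one key and one value vector per layer, per KV head, per token. The sketch below uses a hypothetical model shape (32 layers, 8 KV heads, head dimension 128) and ignores quantization scales, zero-points, and any further entropy coding, so the numbers are illustrative rather than measurements of any real system.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bits_per_value: int) -> int:
    """Raw KV cache size in bytes, ignoring quantization metadata overhead."""
    # Factor of 2: each layer stores both a key tensor and a value tensor.
    values = 2 * num_layers * num_kv_heads * head_dim * num_tokens
    return values * bits_per_value // 8


# Hypothetical model shape with a 100k-token prefix.
FP16_BYTES = kv_cache_bytes(32, 8, 128, 100_000, bits_per_value=16)  # 13_107_200_000
INT2_BYTES = kv_cache_bytes(32, 8, 128, 100_000, bits_per_value=2)   # 1_638_400_000
```

Under these assumptions an uncompressed 100k-token capsule is about 13.1 GB, while a 2-bit quantized one is about 1.6 GB, which is the difference between an impractical and a plausible file to store and upload.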

Security / integrity model

A possible design:

server creates compressed cache capsule
server signs capsule + metadata with private key
client stores capsule locally
client later uploads capsule
server verifies signature with public key / key registry
server checks metadata, permissions, expiry, model/runtime compatibility
server restores cache only if all checks pass

This does not solve every compatibility or quality issue, but it should address tampering and integrity concerns. The server remains the only party that creates, validates, and interprets the cache capsule.
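
A minimal sketch of the sign/verify steps follows. Note the substitution: the flow above describes a private/public key signature, but Python's standard library has no asymmetric signature primitive, so this sketch uses HMAC-SHA256 with a server-held secret instead. Because the server is the only party that ever verifies a capsule, a symmetric MAC gives the same tamper detection; a real deployment might still prefer an asymmetric scheme with a key registry. The function names and the metadata layout are assumptions.

```python
import hashlib
import hmac
import json


def _signing_input(metadata: dict, payload: bytes) -> bytes:
    """Canonical bytes covering metadata and payload together."""
    meta_bytes = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Length-prefix the metadata so the metadata/payload boundary is unambiguous.
    return len(meta_bytes).to_bytes(8, "big") + meta_bytes + payload


def sign_capsule(server_key: bytes, metadata: dict, payload: bytes) -> bytes:
    """Server side: produce a tag binding the capsule payload to its metadata."""
    return hmac.new(server_key, _signing_input(metadata, payload), hashlib.sha256).digest()


def verify_capsule(server_key: bytes, metadata: dict, payload: bytes,
                   tag: bytes, now: float) -> bool:
    """Server side: accept only an untampered, unexpired capsule."""
    expected = hmac.new(server_key, _signing_input(metadata, payload), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):  # constant-time comparison
        return False
    return now < metadata["expires_at"]
```

Signing the metadata together with the payload matters: otherwise a client could pair a valid cache blob with edited scope or expiry fields.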

Suggested UX shape

Possible user-facing commands:

codewhale checkpoint
codewhale resume
codewhale status

Possible internal behavior:

on session pause:
  write local structured state
  optionally request server-side cache capsule export

on session resume:
  load local structured state
  optionally upload compatible cache capsule
  attach new user message / new artifacts
  continue from restored task state
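
The pause/resume behavior above could be sketched roughly as follows. The `CapsuleServer` interface, the file names `state.json` and `capsule.bin`, and the shape of the resume request are all hypothetical; the point is only that the capsule is optional at every step and the local state is always written.

```python
import json
from pathlib import Path
from typing import Optional, Protocol


class CapsuleServer(Protocol):
    """Hypothetical model-server interface; returns None if export is unsupported."""
    def export_capsule(self, session_id: str) -> Optional[bytes]: ...


def on_pause(state_dir: Path, session_id: str, state: dict,
             server: Optional[CapsuleServer] = None) -> None:
    """Always write local state; store a capsule only if the server provides one."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / "state.json").write_text(json.dumps(state, indent=2), encoding="utf-8")
    if server is not None:
        capsule = server.export_capsule(session_id)
        if capsule is not None:
            (state_dir / "capsule.bin").write_bytes(capsule)


def on_resume(state_dir: Path, user_message: str, artifacts: list[str]) -> dict:
    """Build a resume request from local state, attaching the capsule if present."""
    state = json.loads((state_dir / "state.json").read_text(encoding="utf-8"))
    capsule_path = state_dir / "capsule.bin"
    capsule = capsule_path.read_bytes() if capsule_path.exists() else None
    return {
        "state": state,
        "capsule": capsule,  # None means the server must pre-fill from scratch
        "message": user_message,
        "artifacts": artifacts,
    }
```
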

For users, this could simply feel like:

CodeWhale can resume long-running coding tasks cheaply and quickly, even after hours or days.

Why this matters

This would be especially useful for:

  • ML / algorithm engineering workflows
  • long-running GPU training jobs
  • CI / build / test pipelines
  • large repo refactoring
  • data processing tasks
  • research agents
  • multi-step debugging sessions
  • tasks where the user returns hours or days later

In these workflows, the cost is not just generation tokens. A large part of the cost comes from repeatedly reconstructing the same long context.

Request

Would the CodeWhale team consider adding a persistent agent state / cache-resume mechanism?

Even a first version with only local structured checkpoints would be valuable. A later version with server-signed compressed KV cache capsules could significantly reduce cost and latency for long-running agent tasks.

Thanks for considering this.

Metadata

Assignees: none
Labels: bug, enhancement
Status: Backlog