v0.9.0 HarnessPosture: model-specific context and subagent policy #2693

@Hmbown

Goal

Make CodeWhale's harness strategy explicit per provider/model route instead of assuming every model wants the same amount of up-front system context.

This came out of v0.8.53 testing: DeepSeek V4 and Xiaomi MiMo v2.5 appear likely to benefit from a cache-heavy/prefix-stable starting prompt, while many other routes may do better with a lean root context, immediate exploration, subagents, and compact handoff packets.

This is related to #2672, but this issue is specifically about runtime harness posture: how much context to load up front, when to encourage subagents, when to compact, and when to start fresh from a handoff.

Current Shape

  • Plan mode already has a read-only design-first prompt in crates/tui/src/prompts/modes/plan.md.
  • Plan mode currently resolves via update_plan and opens an approval prompt from crates/tui/src/tui/ui.rs / crates/tui/src/tui/plan_prompt.rs.
  • Tool exposure is already mode-sensitive in crates/tui/src/core/engine/turn_loop.rs.
  • Provider/model routing work is happening around the config/provider registry, but there is not yet a first-class policy object that says how a given route should use context, subagents, and handoffs.

Proposal

Add a central HarnessPosture policy chosen from provider + model + user override.

Possible initial values:

  • PrefixCached: stable, rich system/context prefix; keep the prefix byte-stable so cache hits are preserved; minimal churn. Good default candidates: DeepSeek V4, Xiaomi MiMo v2.5.
  • LeanRootExplore: minimal starting prompt; strongly prefer quick repo orientation, subagent exploration, and on-demand docs/skills.
  • PlanHandoffReset: plan in one context, then launch implementation from a compact approved handoff packet in a fresh context.
  • SmallContextLocal: aggressive summarization, narrow tool surface, small prompts, cheap parallel probes.
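
A minimal sketch of how the posture values and the provider + model + user override resolution could look. The variant names come from the list above; everything else is an assumption for illustration: the `Route` struct, the `DEFAULTS` table, the provider/model identifier strings, and the choice of `LeanRootExplore` as the fallback are hypothetical and not taken from the current CodeWhale code.

```rust
/// Posture variants as named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HarnessPosture {
    PrefixCached,
    LeanRootExplore,
    PlanHandoffReset,
    SmallContextLocal,
}

/// Hypothetical route key; real provider/model identifiers may differ.
pub struct Route<'a> {
    pub provider: &'a str,
    pub model: &'a str,
}

/// One central table of (provider, model prefix, default posture).
/// The identifier strings are placeholders, not real registry keys.
const DEFAULTS: &[(&str, &str, HarnessPosture)] = &[
    ("deepseek", "deepseek-v4", HarnessPosture::PrefixCached),
    ("xiaomi", "mimo-v2.5", HarnessPosture::PrefixCached),
];

/// Look up the default posture for a route.
/// Falling back to LeanRootExplore is an assumption of this sketch.
pub fn default_posture(route: &Route) -> HarnessPosture {
    DEFAULTS
        .iter()
        .find(|(p, m, _)| *p == route.provider && route.model.starts_with(*m))
        .map(|(_, _, posture)| *posture)
        .unwrap_or(HarnessPosture::LeanRootExplore)
}

/// A user override wins; otherwise the route default applies.
/// The route itself (provider/model identity) is never modified.
pub fn resolve_posture(route: &Route, user_override: Option<HarnessPosture>) -> HarnessPosture {
    user_override.unwrap_or_else(|| default_posture(route))
}
```

Keeping the mapping in one table is what would let the rest of the harness ask for a posture rather than branch on provider names.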

The posture should affect:

  • System prompt assembly and context injection volume.
  • Whether Plan mode recommends handoff reset by default.
  • Subagent encouragement and default delegation copy.
  • Context compaction thresholds.
  • Which docs/skills are eagerly injected vs available on demand.
  • Telemetry labels so we can evaluate posture choice rather than argue from vibes.
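
One way to keep these effects in a single place is to derive a settings struct from the posture. The sketch below is illustrative only: `PostureSettings`, its field names, the telemetry label strings, and especially the compaction percentages are hypothetical placeholders, not tuned or proposed values.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HarnessPosture {
    PrefixCached,
    LeanRootExplore,
    PlanHandoffReset,
    SmallContextLocal,
}

/// Hypothetical bundle of the knobs a posture is meant to influence.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PostureSettings {
    /// Inject docs/skills eagerly (true) or leave them on demand (false).
    pub eager_docs: bool,
    /// Whether Plan mode recommends a handoff reset by default.
    pub recommend_handoff_reset: bool,
    /// Whether delegation copy should encourage subagents.
    pub encourage_subagents: bool,
    /// Compact once context use passes this percentage (placeholder numbers).
    pub compact_at_percent: u8,
    /// Label attached to telemetry/log records.
    pub telemetry_label: &'static str,
}

impl HarnessPosture {
    /// Map a posture to its settings. All values are illustrative assumptions.
    pub fn settings(self) -> PostureSettings {
        match self {
            HarnessPosture::PrefixCached => PostureSettings {
                eager_docs: true,
                recommend_handoff_reset: false,
                encourage_subagents: false,
                compact_at_percent: 90,
                telemetry_label: "prefix_cached",
            },
            HarnessPosture::LeanRootExplore => PostureSettings {
                eager_docs: false,
                recommend_handoff_reset: false,
                encourage_subagents: true,
                compact_at_percent: 70,
                telemetry_label: "lean_root_explore",
            },
            HarnessPosture::PlanHandoffReset => PostureSettings {
                eager_docs: false,
                recommend_handoff_reset: true,
                encourage_subagents: true,
                compact_at_percent: 70,
                telemetry_label: "plan_handoff_reset",
            },
            HarnessPosture::SmallContextLocal => PostureSettings {
                eager_docs: false,
                recommend_handoff_reset: false,
                encourage_subagents: true,
                compact_at_percent: 50,
                telemetry_label: "small_context_local",
            },
        }
    }
}
```

Prompt assembly, Plan mode, and the compaction logic would then read `posture.settings()` instead of each deciding independently what a given route needs.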

Acceptance Criteria

  • A single registry/policy layer maps provider+model route to default HarnessPosture; no scattered provider-name conditionals.
  • User config can override posture without changing provider/model identity.
  • Tests assert at least:
    • DeepSeek V4 and MiMo v2.5 select a cache-heavy posture by default.
    • Generic OpenAI-compatible/OpenRouter/local routes can select a lean or handoff posture.
    • Prompt/tool catalog byte stability remains protected for prefix-cached routes.
  • Docs explain provider vs model vs harness posture as separate concepts.
  • Telemetry/logging records posture for later outcome analysis.
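
The byte-stability criterion could be checked with a test along these lines. This is a sketch under stated assumptions: `ToolCatalog`, `assemble_prefix`, `catalog`, and the tool names in the usage note are hypothetical and do not reflect the real prompt assembly in `crates/tui`. The idea shown is only that a deterministically ordered catalog yields identical prefix bytes regardless of registration order.

```rust
use std::collections::BTreeMap;

/// Hypothetical tool catalog: tool name -> description.
/// BTreeMap iterates in sorted key order, so output order is deterministic.
pub type ToolCatalog = BTreeMap<String, String>;

/// Build a catalog from (name, description) pairs in any order.
pub fn catalog(entries: &[(&str, &str)]) -> ToolCatalog {
    entries
        .iter()
        .map(|(name, desc)| ((*name).to_string(), (*desc).to_string()))
        .collect()
}

/// Assemble the cached prefix: preamble first, then one line per tool.
/// The line format here is an illustrative placeholder.
pub fn assemble_prefix(preamble: &str, tools: &ToolCatalog) -> Vec<u8> {
    let mut out = String::from(preamble);
    out.push('\n');
    for (name, desc) in tools {
        out.push_str(name);
        out.push_str(": ");
        out.push_str(desc);
        out.push('\n');
    }
    out.into_bytes()
}
```

A test would build the catalog twice with the entries in different orders, for example `read_file` before `grep` and then the reverse, and assert that `assemble_prefix` returns equal byte vectors for both.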

Non-goals

  • Do not hardcode benchmark-specific behavior.
  • Do not claim one posture is globally better until we have eval data.
  • Do not remove the existing global identity/constitution preamble as part of this work.

Metadata

Assignees

No one assigned

    Labels

    • cache-maximalism: DeepSeek V4 cache-maximal context and agent architecture
    • compaction: Context management / compaction
    • context: Context management / context
    • documentation: Improvements or additions to documentation
    • enhancement: New feature or request
    • v0.9.0: Targeting v0.9.0
    • whaleflow: WhaleFlow branch/leaf workflow runtime and workflow mode

    Projects

    Status
    In progress

    Relationships

    None yet

    Development

    No branches or pull requests
