🤖 fix: preserve subagent usage after agent_report #2351

Merged
ethanndickson merged 15 commits into main from cost-tracking-44ax
Feb 12, 2026

@ethanndickson ethanndickson commented Feb 11, 2026

Summary

Subagent costs were disappearing from the Costs tab because agent_report handling aborted the child stream before AI SDK usage was fully recorded. This PR replaces the old tool-call-end abort-based lifecycle with a stopWhen-driven model where agent_report ends the stream naturally at the step boundary, and stream-end is the single source of truth for report finalization and cleanup.

Background

Three related bugs drove this work:

  1. Disappearing subagent costs. handleAgentReport (triggered on tool-call-end) called stopStream to abort the child, cutting off AI SDK usage accounting for the final step.
  2. Out-of-order cleanup in task trees. cleanupReportedLeafTask used hasActiveDescendantAgentTasks to gate parent deletion, but that predicate ignores reported children. A reported parent with two reported children could delete itself when the first child cleaned up, orphaning the second.
  3. Post-report tool execution. After agent_report, the stream could continue into another step and execute more tool calls even though the task was already reported and the parent resumed.

Implementation

agent_report tool

  • Returns { success: true, message: "Report submitted successfully." } — no side-effects, no stream abort.

Stream manager

  • Added hasToolCall("agent_report") to the autonomous stopWhen condition array. The stream ends at the first step boundary containing agent_report — natural completion preserves usage accounting while preventing post-report tool execution.
  • deletePartial / updateHistory return values at stream-end are now checked and logged on failure.
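
A minimal sketch of how a tool-call stop condition behaves at a step boundary, as described in this PR description. The types are simplified local stand-ins rather than the AI SDK's own definitions, and `hasToolCallSketch`, `stepCountIsSketch`, and `shouldStop` are hypothetical helpers written for illustration; the step cap of 100 is an arbitrary assumption.

```typescript
// Simplified stand-ins for the step data a stop condition inspects.
// These are illustrative types, not the AI SDK's real definitions.
interface ToolCall {
  toolName: string;
}

interface Step {
  toolCalls: ToolCall[];
}

type StopCondition = (state: { steps: Step[] }) => boolean;

// Hypothetical equivalent of a "has tool call" condition: true when the
// most recent step contains a call to the named tool.
function hasToolCallSketch(toolName: string): StopCondition {
  return ({ steps }) => {
    const last = steps[steps.length - 1];
    return last !== undefined && last.toolCalls.some((call) => call.toolName === toolName);
  };
}

// Hypothetical step cap: true once the number of completed steps reaches max.
function stepCountIsSketch(max: number): StopCondition {
  return ({ steps }) => steps.length >= max;
}

// Conditions are additive: the loop stops when any one of them is met.
function shouldStop(conditions: StopCondition[], steps: Step[]): boolean {
  return conditions.some((condition) => condition({ steps }));
}

// Assumed shape of the autonomous condition array (the cap value is made up).
const autonomousStopWhen: StopCondition[] = [
  stepCountIsSketch(100),
  hasToolCallSketch("agent_report"),
];
```

Because the check only runs once a step has completed, the step that contains agent_report finishes normally, which is what lets its usage be recorded before the stream ends.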

Task report finalization (taskService.ts)

  • Removed handleAgentReport (the tool-call-end listener), readLatestAgentReportArgs, findAgentReportArgsInMessage, and the stopStream/commitPartial block inside finalizeAgentTaskReport. Report args are now read directly from event.parts in handleStreamEnd via findAgentReportArgsInParts.
  • Removed the stream-abort listener, handleStreamAbort, and isStreamAbortEvent; stream-end is the only terminal event TaskService listens to.
  • Added two-phase completion: finalizeAgentTaskReport is purely logical (status + report delivery + waiter resolution). Cleanup progression is driven by finalizeTerminationPhaseForReportedTask, called from handleStreamEnd after finalization.
  • Added canCleanupReportedTask predicate: checks taskStatus === "reported", !isStreaming, structural-leaf topology, and no pending patch artifact.
  • Patch generation runs before waiter resolution (maybeStartPatchGenerationForReportedTask), so an immediate task_await result after agent_report can include a pending artifact record.
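
The two checks in this list can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in taskService.ts: the part and task shapes, the field names `hasChildren` and `hasPendingPatchArtifact`, the status values other than "reported", and the `Sketch`-suffixed function names are all hypothetical.

```typescript
// Illustrative shapes only; the real stream parts and task records may
// carry different or additional fields.
type StreamPart =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string; args: Record<string, unknown> };

// Scan the final parts from the end and return the arguments of the most
// recent agent_report call, or undefined when there is none.
function findAgentReportArgsInPartsSketch(
  parts: StreamPart[],
): Record<string, unknown> | undefined {
  for (let i = parts.length - 1; i >= 0; i--) {
    const part = parts[i];
    if (part !== undefined && part.type === "tool-call" && part.toolName === "agent_report") {
      return part.args;
    }
  }
  return undefined;
}

// Assumed snapshot of the state the cleanup predicate looks at.
interface TaskSnapshot {
  taskStatus: "queued" | "running" | "reported";
  isStreaming: boolean;
  hasChildren: boolean; // false means the task is a structural leaf
  hasPendingPatchArtifact: boolean;
}

// All four conditions from the description must hold before cleanup.
function canCleanupReportedTaskSketch(task: TaskSnapshot): boolean {
  return (
    task.taskStatus === "reported" &&
    !task.isStreaming &&
    !task.hasChildren &&
    !task.hasPendingPatchArtifact
  );
}
```

Keeping the predicate free of side effects means it can be rechecked under the per-workspace lock as often as needed.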

Structural-leaf cleanup ordering

  • Added hasChildAgentTasks(index, workspaceId) — a topology predicate that checks whether a workspace has any child agent-task nodes in config, regardless of their status.
  • Replaced the hasActiveDescendantAgentTasks call in cleanup with the structural-leaf check. A reported task can only be deleted when it has zero children in config.
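
To make the difference between the two gates concrete, here is a small sketch. It assumes a flat array of task nodes as the index and only looks at direct children, which simplifies the descendant walk the original predicate performs; the node shape and both function names are hypothetical.

```typescript
type TaskStatus = "queued" | "running" | "reported";

// Hypothetical node shape; the real config entries may differ.
interface TaskNode {
  id: string;
  parentId: string | null;
  status: TaskStatus;
}

// Status-based gate (simplified to direct children): reported children are
// ignored, so a parent whose children have all reported looks deletable.
function hasActiveChildSketch(index: TaskNode[], workspaceId: string): boolean {
  return index.some((node) => node.parentId === workspaceId && node.status !== "reported");
}

// Topology-based gate: any child node counts, regardless of its status.
function hasChildAgentTasksSketch(index: TaskNode[], workspaceId: string): boolean {
  return index.some((node) => node.parentId === workspaceId);
}

// A reported parent with two reported children, as in the bug description.
const tree: TaskNode[] = [
  { id: "parent", parentId: null, status: "reported" },
  { id: "child-a", parentId: "parent", status: "reported" },
  { id: "child-b", parentId: "parent", status: "reported" },
];

// The old gate reports no active children, so the parent could be deleted
// while child-b still exists. The structural gate keeps the parent alive.
const oldGateBlocksDelete = hasActiveChildSketch(tree, "parent");
const newGateBlocksDelete = hasChildAgentTasksSketch(tree, "parent");
```

With the structural gate, the parent only becomes deletable after both children have been removed from the index.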

Cleanup

  • Extracted isTypedWorkspaceEvent helper to deduplicate type-guard functions.
  • Removed ToolCallEndEvent import from taskService (no longer used).

Validation

  • make static-check — all checks pass
  • bun test src/node/services/taskService.test.ts — 36 pass
  • bun test src/node/services/tools/task_await.test.ts — 8 pass
  • bun test src/node/services/streamManager.test.ts — stopWhen tests pass

Risks

  • Cleanup timing: mitigated by per-workspace event locking, idempotent cleanupReportedLeafTask, and tests covering both the agent_report → handleStreamEnd flow and structural-leaf ordering.
  • Structural-leaf gate is conservative: a reported parent with a stuck reported child will never auto-delete. Intentional — manual cleanup is safer than orphaning workspaces or losing artifacts.
  • stopWhen scope: hasToolCall("agent_report") only applies to autonomous mode (not toolChoice mode which already uses stepCountIs(1)). The condition is additive and cannot affect streams that don't invoke agent_report.

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $14.32

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c93a78f397

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ethanndickson
Member Author

@codex review

Addressed the P1 race by moving exec patch generation + cleanup into stream termination handling with lock-serialized cleanup rechecks.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3facf8a82

@ethanndickson
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c6b7e2118

@ethanndickson
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 700ef25d25

@ethanndickson
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c536587a48

@ethanndickson
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abd87128c3

@ethanndickson
Member Author

@codex review

Re: your previous finding about the timing window between agent_report executing and stream-end firing — we are intentionally not handling this case. Here is why:

stopWhen: hasToolCall("agent_report") is evaluated at step boundaries by the AI SDK. Once agent_report appears in a step, the SDK stops the stream and emits stream-end with all parts from that final step. The only way to hit the window you described is if the process is killed or the stream is forcibly aborted in the narrow gap between the SDK finishing the step and emitting the end event — this is an extremely unlikely race that would require external intervention at exactly the wrong millisecond.

The previous code that handled this (abort-path recovery via handleTaskTermination/loadTerminationParts with message-id history reconstruction) added ~100 LoC of complexity for a scenario that is near-impossible in practice. We explicitly decided to remove it in favor of a simpler model where stream-end is the single source of truth for report finalization.

If this edge case ever does occur, the child task remains in running status and is recoverable on next app restart — not a data loss scenario.

Please look for other issues in the PR instead of this intentional design decision.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abd87128c3

@ethanndickson
Member Author

@codex review

Addressed the deadlock scenario you pointed out:

  • Root cause: autonomous stopWhen used hasToolCall("agent_report"), which stops on any call (including failed agent_report when descendants are still queued/running).
  • Failure mode: stream could end early, handleStreamEnd() exits on active descendants, task stays running but idle, and low parallelism (e.g. maxParallelAgentTasks=1) can starve queued descendants.

Fix in this push (b411d686b):

  1. StreamManager.createStopWhenCondition() now stops only after a successful agent_report tool result is present in the last step (toolResults), not on call-only.
  2. Updated stopWhen tests to enforce:
    • true for successful toolResults: agent_report
    • false for toolCalls: agent_report without result
    • false for non-agent_report results / empty steps
  3. Updated stale comments in taskService.ts and agent_report.ts to match the new semantics.

Please re-review this commit and look for any remaining issues.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b411d686ba

@ethanndickson
Member Author

@codex review

Follow-up for the P1 you raised ("check agent_report success before ending autonomous stream"):

  • Previous fix still stopped on any agent_report entry in toolResults.
  • This commit tightens stopWhen to require toolName === "agent_report" and output.success === true.
  • Failed agent_report results (success: false) no longer terminate the stream, so the model can recover in the same run.

What changed in 4c25cb438:

  1. StreamManager.createStopWhenCondition() now gates stop on agent_report with output.success === true.
  2. streamManager stopWhen tests now cover:
    • success true => stop
    • success false => do not stop
    • call-only (no result) => do not stop
    • non-agent_report result => do not stop
  3. Updated related comments in taskService.ts and agent_report.ts to reflect success-gated behavior.
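
The success-gated condition described above might look roughly like this. It is a sketch with simplified local types, not the body of StreamManager.createStopWhenCondition(); the `StepWithResults` shape and both helper names are assumptions made for illustration.

```typescript
// Simplified stand-ins for a completed step; not the AI SDK's real types.
interface ToolResultSketch {
  toolName: string;
  output: unknown;
}

interface StepWithResults {
  toolCalls: { toolName: string }[];
  toolResults: ToolResultSketch[];
}

// True only when the output is an object whose success field is exactly true.
function isSuccessfulOutput(output: unknown): boolean {
  return (
    typeof output === "object" &&
    output !== null &&
    (output as { success?: unknown }).success === true
  );
}

// Stop only when the last step holds an agent_report result that succeeded.
// A call without a result, a failed result, or another tool's result keeps
// the stream running so the model can recover in the same run.
function stopAfterSuccessfulAgentReport(steps: StepWithResults[]): boolean {
  const last = steps[steps.length - 1];
  if (last === undefined) {
    return false;
  }
  return last.toolResults.some(
    (result) => result.toolName === "agent_report" && isSuccessfulOutput(result.output),
  );
}
```

The four cases listed in the test coverage above map directly onto this predicate: only a result with success true returns true.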

Please re-review for any remaining issues.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!

ethanndickson and others added 2 commits February 12, 2026 12:07
Prevent deadlock when agent_report is called before descendant tasks finish.

- change StreamManager autonomous stopWhen to check last-step toolResults for a successful agent_report result
- keep failed agent_report calls in-stream so the model can recover instead of ending in idle-running state
- update streamManager stopWhen tests to cover success-result vs call-only behavior
- refresh stale comments that referenced old hasToolCall semantics

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$34.85`_

<!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=34.85 -->
Tighten autonomous stopWhen semantics so failed agent_report results do not terminate the stream.

- require agent_report tool result output.success === true before stopWhen ends autonomous loops
- keep failed agent_report attempts in-stream so the model can recover within the same run
- extend stopWhen tests to cover success:true vs success:false result cases
- refresh related comments to reflect success-gated behavior

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$34.85`_

<!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=34.85 -->
@ethanndickson ethanndickson added this pull request to the merge queue Feb 12, 2026
Merged via the queue into main with commit 34745f3 Feb 12, 2026
23 checks passed
@ethanndickson ethanndickson deleted the cost-tracking-44ax branch February 12, 2026 02:40