🤖 fix: preserve subagent usage after agent_report#2351
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c93a78f397
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Addressed the P1 race by moving exec patch generation + cleanup into stream termination handling with lock-serialized cleanup rechecks. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3facf8a82
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c6b7e2118
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 700ef25d25
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c536587a48
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abd87128c3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Re: your previous finding about the timing window between
The previous code that handled this (abort-path recovery via If this edge case ever does occur, the child task remains in Please look for other issues in the PR instead of this intentional design decision. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abd87128c3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Addressed the deadlock scenario you pointed out:
Fix in this push (
Please re-review this commit and look for any remaining issues. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b411d686ba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Follow-up for the P1 you raised ("check agent_report success before ending autonomous stream"):
What changed in
Please re-review for any remaining issues. |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
… lifecycle Move all report finalization from tool-call-end to handleStreamEnd, solving two bugs: - P1: patch artifact now created before waiters resolve - P2: task not marked reported until stream completes (no intra-step race) Remove handleAgentReport, readLatestAgentReportArgs, findAgentReportArgsInMessage.
This reverts commit 11fc863.
Prevent deadlock when agent_report is called before descendant tasks finish. - change StreamManager autonomous stopWhen to check last-step toolResults for a successful agent_report result - keep failed agent_report calls in-stream so the model can recover instead of ending in idle-running state - update streamManager stopWhen tests to cover success-result vs call-only behavior - refresh stale comments that referenced old hasToolCall semantics --- _Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `4.85`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=34.85 -->
Tighten autonomous stopWhen semantics so failed agent_report results do not terminate the stream. - require agent_report tool result output.success === true before stopWhen ends autonomous loops - keep failed agent_report attempts in-stream so the model can recover within the same run - extend stopWhen tests to cover success:true vs success:false result cases - refresh related comments to reflect success-gated behavior --- _Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `4.85`_ <!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh costs=34.85 -->
4c25cb4 to
b3f8d17
Compare
Summary
Subagent costs were disappearing from the Costs tab because
agent_reporthandling aborted the child stream before AI SDK usage was fully recorded. This PR replaces the oldtool-call-endabort-based lifecycle with astopWhen-driven model whereagent_reportends the stream naturally at the step boundary, andstream-endis the single source of truth for report finalization and cleanup.Background
Three related bugs drove this work:
handleAgentReport(triggered ontool-call-end) calledstopStreamto abort the child, cutting off AI SDK usage accounting for the final step.cleanupReportedLeafTaskusedhasActiveDescendantAgentTasksto gate parent deletion, but that predicate ignoresreportedchildren. A reported parent with two reported children could delete itself when the first child cleaned up, orphaning the second.agent_report, the stream could continue into another step and execute more tool calls even though the task was already reported and the parent resumed.Implementation
agent_reporttool{ success: true, message: "Report submitted successfully." }— no side-effects, no stream abort.Stream manager
hasToolCall("agent_report")to the autonomousstopWhencondition array. The stream ends at the first step boundary containingagent_report— natural completion preserves usage accounting while preventing post-report tool execution.deletePartial/updateHistoryreturn values at stream-end are now checked and logged on failure.Task report finalization (
taskService.ts)handleAgentReport(thetool-call-endlistener),readLatestAgentReportArgs,findAgentReportArgsInMessage, and thestopStream/commitPartialblock insidefinalizeAgentTaskReport. Report args are now read directly fromevent.partsinhandleStreamEndviafindAgentReportArgsInParts.stream-abortlistener,handleStreamAbort,isStreamAbortEvent—stream-endis the only terminal event TaskService listens to.finalizeAgentTaskReportis purely logical (status + report delivery + waiter resolution). Cleanup progression is driven byfinalizeTerminationPhaseForReportedTask, called fromhandleStreamEndafter finalization.canCleanupReportedTaskpredicate: checkstaskStatus === "reported",!isStreaming, structural-leaf topology, and no pending patch artifact.maybeStartPatchGenerationForReportedTask), so an immediatetask_awaitresult afteragent_reportcan include a pending artifact record.Structural-leaf cleanup ordering
hasChildAgentTasks(index, workspaceId)— a topology predicate that checks whether a workspace has any child agent-task nodes in config, regardless of their status.hasActiveDescendantAgentTaskscall in cleanup with the structural-leaf check. A reported task can only be deleted when it has zero children in config.Cleanup
isTypedWorkspaceEventhelper to deduplicate type-guard functions.ToolCallEndEventimport from taskService (no longer used).Validation
make static-check— all checks passbun test src/node/services/taskService.test.ts— 36 passbun test src/node/services/tools/task_await.test.ts— 8 passbun test src/node/services/streamManager.test.ts— stopWhen tests passRisks
cleanupReportedLeafTask, and tests covering bothagent_report→handleStreamEndflow and structural-leaf ordering.stopWhenscope:hasToolCall("agent_report")only applies to autonomous mode (nottoolChoicemode which already usesstepCountIs(1)). The condition is additive and cannot affect streams that don't invokeagent_report.Generated with
mux• Model:anthropic:claude-opus-4-6• Thinking:xhigh• Cost:$14.32