Bug: Task app state reporting is inconsistent — self-reported failure/idle/complete states are silently overridden when AgentAPI is enabled #1350

@blinkagent

Summary

When the AI AgentAPI (CODER_MCP_AI_AGENTAPI_URL) is configured, the MCP server's WithTaskReporter callback unconditionally overrides all self-reported states from the AI agent to working. This means failure, idle, and complete states reported by the agent via the coder_report_task tool are silently discarded. The system then relies entirely on the screen watcher (SSE subscription to agentapi) to detect StatusStable and report idle. If the screen watcher fails or never fires, the task state gets stuck — either as working forever or as null (never reported).

This results in tasks where:

  • current_state is permanently null even though the task is active and waiting for input
  • current_state is stuck on working even though the agent has finished
  • The POST /tasks/{user}/{task}/send endpoint returns 502 because agentapi's GetStatus() reports running instead of stable
  • Terminal states (failure, complete) are unreachable when AgentAPI is enabled

Root Cause

In cli/exp_mcp.go ~L696-706:

toolsdk.WithTaskReporter(func(args toolsdk.ReportTaskArgs) error {
    // The agent does not reliably report its status correctly.  If AgentAPI
    // is enabled, we will always set the status to "working" when we get an
    // MCP message, and rely on the screen watcher to eventually catch the
    // idle state.
    state := codersdk.WorkspaceAppStatusStateWorking
    if s.aiAgentAPIClient == nil {
        state = codersdk.WorkspaceAppStatusState(args.State)
    }
    ...

When aiAgentAPIClient != nil, every self-reported state (including failure, idle, complete) is overridden to working. The design intention is to distrust the agent's idle reporting and rely on the screen watcher instead, but the override is too broad.
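
A narrower override could be sketched roughly as follows. This is an illustration only: the `State` type, its constants, and the `resolveReportedState` helper are local stand-ins invented for this example, not the actual `codersdk` or `toolsdk` definitions. It assumes the intent is to keep distrusting self-reported `idle` while letting `failure` and `complete` pass through.

```go
package main

import "fmt"

// State is a local stand-in for the task states named in this issue.
// It is not the real codersdk type.
type State string

const (
	StateWorking  State = "working"
	StateIdle     State = "idle"
	StateComplete State = "complete"
	StateFailure  State = "failure"
)

// resolveReportedState is a hypothetical helper. When AgentAPI is enabled,
// terminal states pass through unchanged and every other self-reported
// state is coerced to working, leaving idle detection to the screen watcher.
func resolveReportedState(agentAPIEnabled bool, reported State) State {
	if !agentAPIEnabled {
		// Without AgentAPI the agent's self-report is used as-is.
		return reported
	}
	switch reported {
	case StateFailure, StateComplete:
		return reported
	default:
		return StateWorking
	}
}

func main() {
	for _, s := range []State{StateWorking, StateIdle, StateComplete, StateFailure} {
		fmt.Printf("%-8s -> %s\n", s, resolveReportedState(true, s))
	}
}
```

With this shape, only `idle` (and any unrecognized value) is still rewritten to `working` when AgentAPI is enabled.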

Contributing Factors

1. Screen watcher is the only path to idle (when AgentAPI is enabled)

The startWatcher goroutine (L614-663) subscribes to agentapi SSE events and maps StatusStable → idle and StatusRunning → working. If the SSE subscription fails (L617-619), the goroutine returns early and idle is never reported.

2. No periodic fallback/polling

The screen watcher is entirely event-driven via SSE. There is no periodic poll of GetStatus() as a fallback if the SSE connection drops or never connects.

3. Terminal states are impossible

complete and failure can only come from agent self-reports (the screen watcher only knows running/stable). Since all self-reports are overridden to working when AgentAPI is enabled, these terminal states are unreachable.

4. Queue predicate filters duplicate working updates

The queue predicate at ~L452-455 discards non-user-message working updates from the screen watcher after the first report. So if the agent self-reports failure (converted to working) and the watcher then reports working, the watcher's update is discarded as a duplicate. When the watcher finally reports idle, that update should go through, but only if the watcher is running at all.

5. Misleading unconditional log line

At L541, cliui.Infof(inv.Stderr, "Failed to watch screen events") is printed unconditionally (not inside an error handler). This is misleading for debugging but doesn't affect behavior.

6. Send endpoint gates on agentapi status, not task current_state

The POST /tasks/{user}/{task}/send endpoint in coderd/aitasks.go ~L766 calls agentAPIClient.GetStatus() directly and requires StatusStable. This is independent of the task's current_state field, so even if current_state were correctly set to idle, the send can still fail if agentapi disagrees.

Observed Behavior

  • Tasks created with trivial prompts (e.g. "Tests") get failure from the agent, which is silently converted to working, and then current_state stays null or working forever
  • Tasks doing real work (analysis, code review) also end up with null current_state — the screen watcher either isn't connecting or isn't emitting StatusStable
  • Sending input to a task in this state fails with 502: Task app is not ready to accept input. Status: running

Suggested Fixes

  1. Allow terminal states through the override: The WithTaskReporter should only override idle → working when AgentAPI is enabled, and should pass failure and complete through as-is from agent self-reports
  2. Add periodic status polling as fallback: If the SSE connection to agentapi drops, periodically poll GetStatus() to catch StatusStable → idle
  3. Fix the unconditional log line: The "Failed to watch screen events" message at L541 should only print on actual failure
  4. Consider allowing send when current_state is null: If the task is active and current_state is null (no state has been reported yet), it may be reasonable to attempt the send rather than blocking

Created on behalf of @mafredri
