Skip to content

fix(web-client): persist in-flight chat sends so reloads don't drop replies#1856

Open
john-the-dev wants to merge 3 commits into
sonichi:mainfrom
john-the-dev:fix/web-chat-reply-loss-persist
Open

fix(web-client): persist in-flight chat sends so reloads don't drop replies#1856
john-the-dev wants to merge 3 commits into
sonichi:mainfrom
john-the-dev:fix/web-chat-reply-loss-persist

Conversation

@john-the-dev

@john-the-dev john-the-dev commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Summary

When voice is disconnected, web-chat text routes through the task bridge and polls GET /result/<id> for the late reply. Two failure modes left the reply in the Tasks tab but never in the chat transcript:

  1. Reload during the wait dropped the poll (it lived in an in-page timer), so a refresh lost the reply forever.
  2. Long agent tasks (PR creation, research, multi-step analysis) exceeded the 5-min poll cap. On give-up the poll deleted the persisted entry and left a dead (No response yet…) line — even a reload couldn't recover it.

Both are fixed here. Separate from PR #1855 (/answer for pending questions).

What changed

  • Persist in-flight dashboard text sends in localStorage and resume polling on page load.
  • Move dashboard resume after pending-chat constants initialize; the earlier placement could throw during load and silently skip recovery.
  • Use a dashboard-specific pending key so dashboard and /chat do not claim each other's in-flight replies.
  • Add equivalent in-flight reply recovery to the dedicated /chat page.
  • Tolerate long tasks: raise the dashboard poll ceiling to 30 min, back the cadence off after a 2-min fast window, GC persisted sends older than 30 min on load, and on the hard ceiling keep the persisted entry so a reload re-attaches and still renders the late reply (/result serves archived results indefinitely).
  • Regression tests cover both surfaces and the no-orphan behavior.

Test plan

  • npm run typecheck
  • npx tsx --test tests/web-ui-chat-pending-recovery.test.ts — 3 passing
  • git diff --check
  • Live Playwright reload recovery test with system Chrome (dashboard / and /chat): mock /task, keep /result/<id> pending, reload, confirm user text + working placeholder recover, flip result to completed, confirm assistant reply renders and pending key clears

Notes

No backend changes. GET /result/<id> already serves archived results, so this is front-end recovery for the late reply path.

…eplies

The voice-disconnected text path posts to the task bridge and polls /result
for the late reply (core pickup is 10-32s). The poll lived in an in-page
setInterval closure that died on page reload, so a refresh during the wait
dropped the reply forever and read as "no response".

Persist {task_id, text} to localStorage on send and resume polling on page
load, so a reload re-attaches and renders the reply (/result/<id> serves from
results/archive too, so the reply survives the bridge archiving the file).
Also show a "working…" placeholder bubble immediately on send so the wait
reads as in-progress, not failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Correct fix for the reload-drops-reply bug. 30-min cutoff on loadPendingTasks() prevents zombie polls. DOMPurify + marked used on render — no raw innerHTML from agent results. addPendingChatSend/removePendingChatSend dedup by task_id correctly. Approve.

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct approach. The PENDING_KEY localStorage entry with a 30-minute cutoff gives the page exactly what it needs to resume polling after a reload — task IDs that were mid-flight. The dedup guard in appendPendingMessage() prevents double-rendering when the page loads and also the poll restores the same bubble. forgetPendingTask() cleans both the localStorage entry and the interval. The cutoff filter on loadPendingTasks() prevents stale entries from tasks that completed or timed out before the reload. ✓

@bassilkhilo-ag2 bassilkhilo-ag2 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good persistence design. localStorage-backed pending tasks with 30-min TTL and polling-to-recover-on-reload. Dedup via data-pending-task prevents double spinners. forgetPendingTask clears the interval to prevent memory leaks. Clean UX improvement. LGTM.

The dashboard send-path poll capped at 5 min and, on give-up, deleted the
persisted entry and left a dead "(No response yet…)" line in the transcript.
Long agent tasks (PR creation, research) routinely exceed that, so the reply
landed in the Tasks tab but never in chat — even a reload couldn't recover it
because the localStorage entry was already gone.

Raise the ceiling to 30 min, back the poll cadence off after a 2-min fast
window, GC persisted sends older than 30 min on load, and on the hard ceiling
keep the persisted entry (a reload re-attaches and still renders the late
reply, since /result serves archived results indefinitely).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

@cla-assistant check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants