Fix WebSocket issues with Codex#701
Conversation
📝 WalkthroughWalkthroughThis PR refactors WebSocket connection management with connection ID tracking for stale socket invalidation, overhauls Codex session lifecycle to use temporary pending IDs with rekeying upon real thread receipt, upgrades build tools to Bun, bumps dependency versions, updates default Codex model to gpt-5.5, improves message deduplication detection, and adjusts session ID fallback logic for provider-specific behavior. Changes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
package.json (1)
27-46:⚠️ Potential issue | 🟠 MajorScripts now hard-require Bun;
prepublishOnlyis left inconsistent.Switching
dev,build, andstarttobun runadds Bun as a required local tool and is unrelated to the stated PR scope (WebSocket + Codex session robustness). Anyone followingCONTRIBUTING.md'snpm run buildinstruction will get "bun: command not found" unless they also install Bun, and CI that only provisions Node will break.Two further concerns:
- Line 42
"prepublishOnly": "npm run build"still uses npm, so publishing will detour back through npm while dev/build runs through bun — pick one runner or document why they differ.- The commit history / PR description should explicitly call this out (plus any CI/docs updates), or split the toolchain migration into its own PR to keep the scope focused.
As per coding guidelines,
Contributing.mdasks contributors to "keep PR scope focused … ensure the build passes (run npm run build), and avoid unrelated changes".🛠 Option A — revert script runner to keep scope tight
- "dev": "concurrently --kill-others \"bun run server:dev\" \"bun run client\"", + "dev": "concurrently --kill-others \"npm run server:dev\" \"npm run client\"", @@ - "build": "bun run build:client && bun run build:server", + "build": "npm run build:client && npm run build:server", @@ - "start": "bun run build && bun run server", + "start": "npm run build && npm run server",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@package.json` around lines 27 - 46, The package.json scripts were changed to hard-require Bun (e.g. "dev", "build", "start" use "bun run ...") while "prepublishOnly" still uses npm; revert script runner changes so the repo stays Node/npm-first: update "dev", "build", "start" (and any commands that call bun like "build:client"/"build:server" invocations) to use the original npm/node invocations (e.g., "npm run server:dev" or direct CLI commands like "vite"/"tsc" as previously used) and make "prepublishOnly" consistent with that choice; keep this change focused to these script lines only and do not introduce CI/docs edits in this PR (raise a follow-up PR to migrate to Bun and update CONTRIBUTING/CI if you intend to change the toolchain).server/openai-codex.js (1)
271-319:⚠️ Potential issue | 🟡 MinorPending session id can leak into
complete/errorifthread.startednever fires.If the turn reaches line 319 (or the
catchat 329) without the loop ever hittingevent.type === 'thread.started' && event.thread_id,currentSessionIdstays ascodex-pending-<ts>and that pending id is what gets emitted in thecomplete/errorNormalizedMessage. The client never receivedsession_createdfor that id, so it cannot correlate — the processing banner/abort state for that session can be left hanging.Root causes to consider: a
turn.failedbeforethread.started, anitem.*event arriving first withthread.startedlackingthread_id, or an SDK error before any event. Even if unlikely in practice, the silent fallback is worth guarding:🛡 Suggested guard
// Send completion event if (!terminalFailure) { - const finalSessionId = thread.id || currentSessionId; + const finalSessionId = thread.id || (sessionCreatedSent ? currentSessionId : null); + if (!finalSessionId) { + console.warn('[Codex] Completing without a resolved thread id; suppressing complete emission'); + } else { sendMessage(ws, createNormalizedMessage({ kind: 'complete', actualSessionId: finalSessionId, sessionId: finalSessionId, provider: 'codex' })); + }Similarly,
erroremissions in thecatchat line 347 currently sendsessionId: currentSessionId, which may still be the pending id — consider conditionally omitting the id when!sessionCreatedSent.Also a defensive log would help diagnose the "never saw
thread.started" case:} finally { // Update session status if (currentSessionId) { + if (!sessionCreatedSent && currentSessionId.startsWith('codex-pending-')) { + console.warn(`[Codex] Session ${currentSessionId} finished without a thread.started event`); + } const session = activeCodexSessions.get(currentSessionId);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/openai-codex.js` around lines 271 - 319, The currentSessionId (e.g., "codex-pending-<ts>") can be emitted in final 'complete'/'error' NormalizedMessage when thread.started never fires; update the completion and catch/error paths (the code that calls sendMessage(createNormalizedMessage(...)) for kind:'complete' and the error emission in the catch) to only include sessionId/actualSessionId when sessionCreatedSent is true (or when thread.id is available), otherwise omit the sessionId (or set it to null) so clients don't receive an uncorrelated pending id; additionally add a defensive log message (using the existing logger or processLogger) when exiting without sessionCreatedSent to help diagnose missing thread.started events.
🧹 Nitpick comments (2)
server/openai-codex.js (1)
429-438: LGTM — writer swap is defensive and minimal.The
session?.writer?.updateWebSocketoptional-chain guard gracefully no-ops when the session was originally started via an SSE writer or a rawwsthat doesn't implementupdateWebSocket, so the reconnect path can't throw on a transport it doesn't understand. Consistent withreconnectSessionWriterinclaude-sdk.js.One tiny optional tweak: return
falsewith a debug log when the session exists but the writer lacksupdateWebSocket, so operators can distinguish "no such session" from "session exists, writer not swappable":♻️ Optional diagnostic log
export function reconnectCodexSessionWriter(sessionId, newRawWs) { const session = activeCodexSessions.get(sessionId); - if (!session?.writer?.updateWebSocket) { + if (!session) return false; + if (!session.writer?.updateWebSocket) { + console.warn(`[RECONNECT] Codex session ${sessionId} has no swappable writer`); return false; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/openai-codex.js` around lines 429 - 438, In reconnectCodexSessionWriter, add a diagnostic path that distinguishes "no such session" from "session exists but writer not swappable": after retrieving session via activeCodexSessions.get(sessionId), if session is falsy log a debug/notice and return false as today, but if session is present and session.writer exists but lacks updateWebSocket, emit a debug log like "session exists but writer has no updateWebSocket" (include sessionId) and return false; otherwise proceed to call session.writer.updateWebSocket(newRawWs) and log the successful swap. Ensure you reference reconnectCodexSessionWriter, activeCodexSessions, session.writer, and updateWebSocket when locating the code to modify.src/contexts/WebSocketContext.tsx (1)
98-129: Optional: two overlapping cleanup effects could be consolidated.Both effects tear down
wsRef.current, clearreconnectTimeoutRef.current, and bumpconnectionIdRef.current. The functional split (effect 1 = token-driven re-connect, effect 2 = unmount flag) works, but future editors are likely to change one and forget the other. A single effect that tracks mount state and also re-runs on token change is easier to reason about:♻️ Suggested consolidation
- useEffect(() => { - unmountedRef.current = false; - connect(); - - return () => { - connectionIdRef.current += 1; - if (reconnectTimeoutRef.current) { - clearTimeout(reconnectTimeoutRef.current); - reconnectTimeoutRef.current = null; - } - if (wsRef.current) { - wsRef.current.close(); - wsRef.current = null; - } - setIsConnected(false); - }; - }, [connect]); // everytime token changes, we reconnect - - useEffect(() => { - return () => { - unmountedRef.current = true; - connectionIdRef.current += 1; - if (reconnectTimeoutRef.current) { - clearTimeout(reconnectTimeoutRef.current); - reconnectTimeoutRef.current = null; - } - if (wsRef.current) { - wsRef.current.close(); - wsRef.current = null; - } - }; - }, []); + useEffect(() => { + unmountedRef.current = false; + connect(); + return () => { + // Set unmountedRef only for the true unmount; React will run this cleanup + // on every token change too, but `connect()` below immediately resets it. + unmountedRef.current = true; + connectionIdRef.current += 1; + if (reconnectTimeoutRef.current) { + clearTimeout(reconnectTimeoutRef.current); + reconnectTimeoutRef.current = null; + } + if (wsRef.current) { + wsRef.current.close(); + wsRef.current = null; + } + setIsConnected(false); + }; + }, [connect]);Also, the comment on line 114 (
// everytime token changes, we reconnect) is a duplicate of the one on line 96 and no longer accurately describes the effect since it also owns unmount teardown — consider rewording.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/contexts/WebSocketContext.tsx` around lines 98 - 129, Consolidate the two cleanup useEffects into one: have a single useEffect that depends on [connect] which sets unmountedRef.current = false, calls connect(), and returns a single teardown that sets unmountedRef.current = true, increments connectionIdRef.current, clears and nulls reconnectTimeoutRef.current, closes and nulls wsRef.current, and calls setIsConnected(false) as needed; update or remove the duplicate comment about token changes so it accurately describes that this effect both handles reconnects when connect changes and the unmount teardown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/stores/useSessionStore.ts`:
- Around line 109-131: The current isDuplicateOptimisticUserMessage hides any
optimistic local user message whose normalized content matches any server user
message; change the matching to only consider server messages that are newer
than the optimistic message (i.e., compare timestamps) so old historical
duplicates don't mask new optimistic sends. Specifically, update
isDuplicateOptimisticUserMessage to: get a timestamp from realtimeMessage (e.g.,
realtimeMessage.createdAt or realtimeMessage.timestamp/clientTs) and only call
normalizeMessageContent(serverMessage.content) === realtimeContent for
serverMessage entries where serverMessage.createdAt (or serverMessage.timestamp)
>= realtimeMessage.createdAt; if your message shape uses a different timestamp
field name, use that name consistently. This preserves content normalization via
normalizeMessageContent but restricts comparisons to recent server messages so
duplicates are paired by recency.
---
Outside diff comments:
In `@package.json`:
- Around line 27-46: The package.json scripts were changed to hard-require Bun
(e.g. "dev", "build", "start" use "bun run ...") while "prepublishOnly" still
uses npm; revert script runner changes so the repo stays Node/npm-first: update
"dev", "build", "start" (and any commands that call bun like
"build:client"/"build:server" invocations) to use the original npm/node
invocations (e.g., "npm run server:dev" or direct CLI commands like "vite"/"tsc"
as previously used) and make "prepublishOnly" consistent with that choice; keep
this change focused to these script lines only and do not introduce CI/docs
edits in this PR (raise a follow-up PR to migrate to Bun and update
CONTRIBUTING/CI if you intend to change the toolchain).
In `@server/openai-codex.js`:
- Around line 271-319: The currentSessionId (e.g., "codex-pending-<ts>") can be
emitted in final 'complete'/'error' NormalizedMessage when thread.started never
fires; update the completion and catch/error paths (the code that calls
sendMessage(createNormalizedMessage(...)) for kind:'complete' and the error
emission in the catch) to only include sessionId/actualSessionId when
sessionCreatedSent is true (or when thread.id is available), otherwise omit the
sessionId (or set it to null) so clients don't receive an uncorrelated pending
id; additionally add a defensive log message (using the existing logger or
processLogger) when exiting without sessionCreatedSent to help diagnose missing
thread.started events.
---
Nitpick comments:
In `@server/openai-codex.js`:
- Around line 429-438: In reconnectCodexSessionWriter, add a diagnostic path
that distinguishes "no such session" from "session exists but writer not
swappable": after retrieving session via activeCodexSessions.get(sessionId), if
session is falsy log a debug/notice and return false as today, but if session is
present and session.writer exists but lacks updateWebSocket, emit a debug log
like "session exists but writer has no updateWebSocket" (include sessionId) and
return false; otherwise proceed to call session.writer.updateWebSocket(newRawWs)
and log the successful swap. Ensure you reference reconnectCodexSessionWriter,
activeCodexSessions, session.writer, and updateWebSocket when locating the code
to modify.
In `@src/contexts/WebSocketContext.tsx`:
- Around line 98-129: Consolidate the two cleanup useEffects into one: have a
single useEffect that depends on [connect] which sets unmountedRef.current =
false, calls connect(), and returns a single teardown that sets
unmountedRef.current = true, increments connectionIdRef.current, clears and
nulls reconnectTimeoutRef.current, closes and nulls wsRef.current, and calls
setIsConnected(false) as needed; update or remove the duplicate comment about
token changes so it accurately describes that this effect both handles
reconnects when connect changes and the unmount teardown.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 872a0044-3695-4f4f-b871-abc7e0030b02
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (7)
package.jsonserver/index.jsserver/openai-codex.jsshared/modelConstants.jssrc/components/chat/hooks/useChatComposerState.tssrc/contexts/WebSocketContext.tsxsrc/stores/useSessionStore.ts
| function isDuplicateOptimisticUserMessage( | ||
| server: NormalizedMessage[], | ||
| realtimeMessage: NormalizedMessage, | ||
| ): boolean { | ||
| if ( | ||
| !realtimeMessage.id.startsWith('local_') || | ||
| realtimeMessage.kind !== 'text' || | ||
| realtimeMessage.role !== 'user' | ||
| ) { | ||
| return false; | ||
| } | ||
|
|
||
| const realtimeContent = normalizeMessageContent(realtimeMessage.content); | ||
| if (!realtimeContent) { | ||
| return false; | ||
| } | ||
|
|
||
| return server.some((serverMessage) => | ||
| serverMessage.kind === 'text' && | ||
| serverMessage.role === 'user' && | ||
| normalizeMessageContent(serverMessage.content) === realtimeContent, | ||
| ); | ||
| } |
There was a problem hiding this comment.
Content-only dedup can hide legitimate duplicate user messages.
isDuplicateOptimisticUserMessage matches any server user message with the same normalized content, regardless of order or recency. Two realistic scenarios where this misbehaves:
- User sends "hi" twice in a row. After the first
hipersists on the server but before the second does, both optimisticlocal_messages match the single serverhiand are hidden — the UI momentarily shows only onehi. - Much later in the conversation the user re-sends an identical prompt. Their optimistic echo is hidden immediately because a matching server message exists from earlier, even though the new one hasn't persisted yet.
Consider pairing by index (count server duplicates and only mask that many optimistic ones), or only matching against recent server messages (e.g., those newer than the optimistic message's timestamp), so unrelated prior duplicates can't mask new input.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/stores/useSessionStore.ts` around lines 109 - 131, The current
isDuplicateOptimisticUserMessage hides any optimistic local user message whose
normalized content matches any server user message; change the matching to only
consider server messages that are newer than the optimistic message (i.e.,
compare timestamps) so old historical duplicates don't mask new optimistic
sends. Specifically, update isDuplicateOptimisticUserMessage to: get a timestamp
from realtimeMessage (e.g., realtimeMessage.createdAt or
realtimeMessage.timestamp/clientTs) and only call
normalizeMessageContent(serverMessage.content) === realtimeContent for
serverMessage entries where serverMessage.createdAt (or serverMessage.timestamp)
>= realtimeMessage.createdAt; if your message shape uses a different timestamp
field name, use that name consistently. This preserves content normalization via
normalizeMessageContent but restricts comparisons to recent server messages so
duplicates are paired by recency.
Codex 5.5 generated code
This pull request introduces several improvements and fixes to the Codex chat experience, WebSocket reconnection logic, and session handling, along with dependency updates and a new Codex model. The main focus is on making Codex session management more robust, preventing duplicate messages, and ensuring the UI correctly reflects session state after reconnections or errors.
Summary by CodeRabbit
New Features
Bug Fixes
Chores