fix: reset Codex websocket auth pin after quota errors#2966
Conversation
|
This pull request targeted The base branch has been automatically changed to |
There was a problem hiding this comment.
Code Review
This pull request improves the Codex websocket executor and OpenAI response handlers by implementing more robust connection management and authentication state handling. Key changes include the introduction of thread-safe session state snapshots, logic to rebuild connections upon authentication switching, and a mechanism to reset authentication pins when encountering rate limits or authorization errors. Review feedback highlighted a potential issue where non-blocking sends in trySendCodexWebsocketRead could cause silent message loss and block consumers, as well as the non-idiomatic use of recover() for managing channel closures.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c472e0d10
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 58e7228935
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // 该方法仅持有 activeMu 调用避免与 connMu->activeMu 锁序冲突 | ||
| // 不要在持有 connMu 时调用避免未来引入反向锁序 |
There was a problem hiding this comment.
Replace new non-English comments with English text
The newly added comments here are in Chinese, which violates the repository rule in AGENTS.md (scope: repo root) that comments must be English-only and that edited non-English comments should be translated. Keeping these comments non-English makes future maintenance harder for contributors who follow the project’s documented language standard.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in d8cfb88e. I translated the newly added comments in internal/runtime/executor/codex_websockets_executor.go to English so they match the repository AGENTS.md rule.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d8cfb88ee7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| nextSessionRequestSnapshot := updatedLastRequest | ||
| if shouldBuildResponsesWebsocketFullSnapshot(requestJSON, allowIncrementalInputWithPreviousResponseID) { |
There was a problem hiding this comment.
Keep full snapshot for incremental prewarm turns
In ResponsesWebsocket, a full nextSessionRequestSnapshot is computed for incremental requests, but the generate:false prewarm branch still persists updatedLastRequest (the incremental payload with previous_response_id) instead of that full snapshot. This becomes a regression after the new auth-reset flow: if a later turn returns 429/401 and forceDisableIncrementalAfterAuthReset is enabled, normalization rebuilds from this truncated lastRequest and can drop earlier conversation/tool-call context. The prewarm path should preserve the same full snapshot used by normal successful turns.
Useful? React with 👍 / 👎.
luispater
left a comment
There was a problem hiding this comment.
Thanks for the fix — this addresses the “sticky pinned auth after 429/terminal auth errors” behavior in Codex Responses websockets and adds solid regression coverage.
Summary
- Clears pinned auth and closes the upstream execution session after terminal quota/auth errors (429/401/402/403) so the next request can reselect credentials.
- Rebuilds the upstream Codex websocket when the selected auth changes, and hardens session-close/read-loop signaling to avoid stale reader loops touching the current request channel.
- Persists a full downstream snapshot for incremental
previous_response_idflows so auth resets don’t lose state.
Blocking
- None.
Non-blocking / Suggestions
trySendCodexWebsocketReadnow blocks until the channel drains (oractiveDonefires). The name “trySend” reads as non-blocking; consider renaming or adding a short comment to make the blocking semantics explicit.shouldResetResponsesWebsocketAuthPincurrently keys off status code only. If you ever need finer granularity (e.g. distinguish quota exhaustion vs transient rate-limit), consider threading througherror.codeas well.
Test plan
- Not run locally (review-only).
- CI
buildpassed and the added tests cover: auth pin reset after 429, auth switch forcing websocket rebuild, and session close unblocking active reads.
This is an automated Codex review result and still requires manual verification by a human reviewer.
Summary
Why
Same-team multi-account Codex setups can look like round-robin is broken after one credential hits 429, because the downstream websocket session keeps reusing the previously pinned auth and its upstream websocket connection. This patch forces a clean auth re-selection path after terminal quota/auth failures.
Test