Skip to content

fix: reset Codex websocket auth pin after quota errors#2966

Open
orime wants to merge 3 commits into
router-for-me:devfrom
orime:wsfix-v6930
Open

fix: reset Codex websocket auth pin after quota errors#2966
orime wants to merge 3 commits into
router-for-me:devfrom
orime:wsfix-v6930

Conversation

@orime
Copy link
Copy Markdown

@orime orime commented Apr 23, 2026

Summary

  • clear pinned Codex websocket auth after terminal quota/auth errors so the next turn can reselect credentials
  • rebuild the upstream websocket when the selected auth changes instead of reusing the old connection
  • preserve a full downstream snapshot after auth reset and add regression coverage for auth switch/session close behavior

Why

Same-team multi-account Codex setups can look like round-robin is broken after one credential hits 429, because the downstream websocket session keeps reusing the previously pinned auth and its upstream websocket connection. This patch forces a clean auth re-selection path after terminal quota/auth failures.

Test

  • go test ./sdk/api/handlers/openai
  • go test ./internal/runtime/executor
  • go build ./...

@github-actions github-actions Bot changed the base branch from main to dev April 23, 2026 02:32
@github-actions
Copy link
Copy Markdown

This pull request targeted main.

The base branch has been automatically changed to dev.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the Codex websocket executor and OpenAI response handlers by implementing more robust connection management and authentication state handling. Key changes include the introduction of thread-safe session state snapshots, logic to rebuild connections upon authentication switching, and a mechanism to reset authentication pins when encountering rate limits or authorization errors. Review feedback highlighted a potential issue where non-blocking sends in trySendCodexWebsocketRead could cause silent message loss and block consumers, as well as the non-idiomatic use of recover() for managing channel closures.

Comment thread internal/runtime/executor/codex_websockets_executor.go
Comment thread internal/runtime/executor/codex_websockets_executor.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c472e0d10

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/runtime/executor/codex_websockets_executor.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58e7228935

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +109 to +110
// 该方法仅持有 activeMu 调用避免与 connMu->activeMu 锁序冲突
// 不要在持有 connMu 时调用避免未来引入反向锁序
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Replace new non-English comments with English text

The newly added comments here are in Chinese, which violates the repository rule in AGENTS.md (scope: repo root) that comments must be English-only and that edited non-English comments should be translated. Keeping these comments non-English makes future maintenance harder for contributors who follow the project’s documented language standard.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d8cfb88e. I translated the newly added comments in internal/runtime/executor/codex_websockets_executor.go to English so they match the repository AGENTS.md rule.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8cfb88ee7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +153 to +154
nextSessionRequestSnapshot := updatedLastRequest
if shouldBuildResponsesWebsocketFullSnapshot(requestJSON, allowIncrementalInputWithPreviousResponseID) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep full snapshot for incremental prewarm turns

In ResponsesWebsocket, a full nextSessionRequestSnapshot is computed for incremental requests, but the generate:false prewarm branch still persists updatedLastRequest (the incremental payload with previous_response_id) instead of that full snapshot. This becomes a regression after the new auth-reset flow: if a later turn returns 429/401 and forceDisableIncrementalAfterAuthReset is enabled, normalization rebuilds from this truncated lastRequest and can drop earlier conversation/tool-call context. The prewarm path should preserve the same full snapshot used by normal successful turns.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

@luispater luispater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix — this addresses the “sticky pinned auth after 429/terminal auth errors” behavior in Codex Responses websockets and adds solid regression coverage.

Summary

  • Clears pinned auth and closes the upstream execution session after terminal quota/auth errors (429/401/402/403) so the next request can reselect credentials.
  • Rebuilds the upstream Codex websocket when the selected auth changes, and hardens session-close/read-loop signaling to avoid stale reader loops touching the current request channel.
  • Persists a full downstream snapshot for incremental previous_response_id flows so auth resets don’t lose state.

Blocking

  • None.

Non-blocking / Suggestions

  • trySendCodexWebsocketRead now blocks until the channel drains (or activeDone fires). The name “trySend” reads as non-blocking; consider renaming or adding a short comment to make the blocking semantics explicit.
  • shouldResetResponsesWebsocketAuthPin currently keys off status code only. If you ever need finer granularity (e.g. distinguish quota exhaustion vs transient rate-limit), consider threading through error.code as well.

Test plan

  • Not run locally (review-only).
  • CI build passed and the added tests cover: auth pin reset after 429, auth switch forcing websocket rebuild, and session close unblocking active reads.

This is an automated Codex review result and still requires manual verification by a human reviewer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants