feat(access): per-api-key model allowlist with glob matching (v2, review fixes) #2914
mtuckerb wants to merge 13 commits into
Conversation
Adds enforcement for per-client-key model ACLs at the request middleware
layer. The dashboard already stores allowedModels per UserApiKey and
attempts to push them to PUT /v0/management/api-keys, but upstream until
now had nowhere to land them — the structured payload was rejected and
the dashboard's fallback silently dropped the policy data, so no key was
ever actually restricted.
config:
- Add APIKeyPolicy{Key, AllowedModels} and SDKConfig.APIKeyPolicies
- Add SDKConfig.APIKeyDefaultPolicy ("allow-all" | "deny-all"; defaults
to allow-all so existing deployments are unaffected)
- IsModelAllowedForKey supports path.Match globs and tolerates a single
leading "<prefix>/" segment so policies stay portable across prefixed
credentials
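The glob semantics described above can be sketched in a few lines. This is a hedged approximation — the function name and the exact matching precedence are illustrative, not the actual IsModelAllowedForKey implementation:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isModelAllowed sketches the policy check: each allowed pattern is a
// path.Match glob, and a single leading "<prefix>/" segment on the model
// is tolerated so "gemini-*" still matches "teamA/gemini-3-pro".
func isModelAllowed(model string, allowed []string) bool {
	candidates := []string{model}
	if i := strings.Index(model, "/"); i >= 0 {
		candidates = append(candidates, model[i+1:]) // strip one prefix segment
	}
	for _, pattern := range allowed {
		for _, c := range candidates {
			if ok, err := path.Match(pattern, c); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isModelAllowed("teamA/gemini-3-pro", []string{"gemini-*"})) // true
	fmt.Println(isModelAllowed("gpt-4o", []string{"gemini-*"}))             // false
}
```

Note that path.Match's `*` never crosses a `/`, which is why the prefix segment has to be stripped before matching rather than widened into the pattern.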
management API:
- PutAPIKeys now accepts both the legacy string list and the structured
[{key, allowedModels}] form, with allowedModels / allowed-models /
allowed_models all recognized so existing dashboard payloads work
- GetAPIKeys exposes api-key-policies when any are configured
middleware:
- New ModelACLMiddleware extracts the targeted model from the JSON body
("model" field) or the Gemini-style /v1beta/models/<model>:<action>
path, then enforces the per-key policy. The body is read once and
restored via io.NopCloser so downstream handlers see the original
request bytes
- Wired into both /v1 and /v1beta groups directly after AuthMiddleware
- Bypasses any request that has no identified api key (preserving the
legacy "no auth provider configured => allow" behavior of
AuthMiddleware) and any request shape with no extractable model
(listings, websocket upgrades) so discovery endpoints keep working
tests:
- Unit tests for IsModelAllowedForKey covering allow-all, deny-all,
empty allowed list semantics, exact matches, glob matches, and the
prefixed-credential case
- Middleware tests covering the allow / deny / unknown-key / Gemini
path / list-endpoint / body-pass-through paths
- Parser tests for the management PutAPIKeys payload covering the
legacy plain list, the wrapped items form, and the structured form
in all three case styles
Three changes landed after PR router-for-me#2911 review:

middleware (DoS hardening):
- Cap the request body inspected by ModelACLMiddleware at 10 MiB via io.LimitReader; requests above the cap are rejected with HTTP 413 ("request_too_large") instead of being buffered into memory
- Short-circuit on Content-Length when the client declares an oversized body, so we bail before reading any bytes
- Any non-size read error now fails closed with HTTP 400 rather than silently allowing the request past the policy check
- New tests cover both the streamed-oversize and Content-Length paths

config (O(1) policy lookup + cache invalidation):
- SDKConfig.APIKeyPolicies is now backed by an atomic.Pointer cache (map[string]*APIKeyPolicy) so IsModelAllowedForKey is O(1) per call instead of an O(N) linear scan; the cache is lazily rebuilt via CompareAndSwap and safe under concurrent reads
- Add SetAPIKeyPolicies (atomic replace + invalidate) and InvalidatePolicyIndex (manual flush after in-place edits)
- Tests exercise mutation visibility, nil-safe receivers, concurrent access, and defensive-copy semantics

management handlers (policy rotation/deletion):
- PatchAPIKeys gains a dedicated handler that keeps APIKeyPolicies in sync on rename (via old/new) and rotation (via index/value); when the target key already has a policy, the source row is dropped so there are no duplicate entries
- DeleteAPIKeys removes the matching APIKeyPolicy rows so a revoked credential cannot leave its allowlist behind; the slice is nilled when empty
- PutAPIKeys routes through SetAPIKeyPolicies so the cache is refreshed atomically with the slice swap
- Tests cover rename-with-existing-target, append-new, delete-by-value, delete-by-index, last-row-clears, and PUT-rebuilds-policies

No behavior change for existing deployments that do not configure api-key-policies. All existing tests still pass.
Code Review
This pull request introduces per-API-key model access control (ACL) enforcement. It adds a new ModelACLMiddleware that validates requested models against configured policies for both OpenAI and Gemini-style endpoints. The configuration system is updated to support structured API key policies with glob-based model matching and a lazy-loaded, atomic cache for efficient lookups. Management handlers for API keys are enhanced to keep these policies in sync during creation, rotation, and deletion. Feedback highlights potential data races in the management handlers due to missing synchronization, unsafe slice reuse that could lead to stale cache reads, and memory pressure concerns regarding the buffering of large request bodies in the ACL middleware.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b792cc2c49
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
- config_lists.go: PutAPIKeys, PatchAPIKeys, and DeleteAPIKeys now acquire h.mu before mutating h.cfg and use persistLocked, so the mutation and the subsequent config write happen atomically from the perspective of readers (middleware goroutines). Previously the mutations raced with ModelACLMiddleware evaluating the same fields on concurrent requests. Matches the existing pattern in PutGeminiKeys / PatchGeminiKey. (HIGH review comment)
- config_lists.go: renameAPIKeyPolicy no longer reslices the existing APIKeyPolicies backing array via "[:0]". That was unsafe because SDKConfig.policyIndex caches *APIKeyPolicy into the same backing array; a concurrent reader resolving the cache mid-write would observe a partially mutated entry. Allocate a fresh slice so any still-cached pointers keep pointing at the pre-mutation entries until InvalidatePolicyIndex forces a refresh. (MEDIUM review comment)
- model_acl.go: extractRequestedModel now peeks at the first 16 KiB of the request body before committing to buffering the full cap. In realistic chat-completion payloads "model" is always near the top of the JSON, so the peek alone is usually sufficient and the middleware allocates ~16 KiB per request instead of up to modelACLMaxBodyBytes (10 MiB). When the peek does not contain "model", the middleware falls back to the prior bounded read and still enforces the cap. Under concurrency this is the difference between a bounded constant footprint and N*10 MiB. (MEDIUM review comment)
- model_acl_test.go: two new tests lock in the peek-first behavior — one covers a large body with "model" in the peek (must extract, enforce, and preserve the full downstream body byte-for-byte); the other covers "model" past the peek but within the cap (must still extract via the fallback read).
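The lazily rebuilt atomic policy index can be sketched as below. The type and method names are illustrative, not the real SDKConfig API; the point is the lock-free read path with a CompareAndSwap rebuild:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type APIKeyPolicy struct {
	Key           string
	AllowedModels []string
}

// policyStore sketches the index: readers load the map pointer without
// locks; the first lookup after an invalidation rebuilds it, and only one
// concurrent rebuilder wins the CompareAndSwap.
type policyStore struct {
	policies []APIKeyPolicy
	index    atomic.Pointer[map[string]*APIKeyPolicy]
}

func (s *policyStore) lookup(key string) *APIKeyPolicy {
	idx := s.index.Load()
	if idx == nil {
		m := make(map[string]*APIKeyPolicy, len(s.policies))
		for i := range s.policies {
			m[s.policies[i].Key] = &s.policies[i]
		}
		if s.index.CompareAndSwap(nil, &m) {
			idx = &m
		} else {
			idx = s.index.Load() // a concurrent rebuilder won; use its map
		}
	}
	return (*idx)[key]
}

// invalidate flushes the index after in-place edits to the policy slice.
func (s *policyStore) invalidate() { s.index.Store(nil) }

func main() {
	s := &policyStore{policies: []APIKeyPolicy{{Key: "sk-a", AllowedModels: []string{"gemini-*"}}}}
	fmt.Println(s.lookup("sk-a") != nil) // true
	fmt.Println(s.lookup("sk-b") == nil) // true
}
```

This also motivates the fresh-slice fix above: cached *APIKeyPolicy pointers stay valid for readers until invalidate() forces a rebuild, so the backing array must never be mutated in place.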
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bbea94c1d
The previous extractor for /v1beta/models/<model>:<action> paths
truncated at the first '/' inside <model>. That broke deployments
using force-model-prefix, where the routed model identifier is
literally "<prefix>/<model>" (e.g. "teamA/gemini-3-pro"). The
GeminiHandler forwards the whole segment-before-":" as the model,
and IsModelAllowedForKey already strips a single leading
"<prefix>/" before glob matching, so the ACL extractor must
mirror that and forward the full identifier including embedded
slashes. Otherwise:
- allowed requests were rejected (the ACL saw "teamA" instead
of "teamA/gemini-3-pro")
- policies scoped to just the prefix segment could unintentionally
allow unrelated models sharing the prefix
Added TestModelACLMiddleware_GeminiPathPreservesPrefix which uses
the same AllowedModels pattern against both an allowed and a
disallowed model under the same prefix — a regression would
surface as a false-positive on either the allow path or the
deny path.
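The fixed extractor can be sketched like this; the function name is illustrative, but the rule matches the prose: keep everything between the models segment and the trailing ":<action>", including embedded slashes.

```go
package main

import (
	"fmt"
	"strings"
)

// modelFromGeminiPath extracts the model from a
// /v1beta/models/<model>:<action> path without truncating at '/'
// inside <model>, so "teamA/gemini-3-pro" survives intact.
func modelFromGeminiPath(p string) (string, bool) {
	const marker = "/v1beta/models/"
	i := strings.Index(p, marker)
	if i < 0 {
		return "", false
	}
	rest := p[i+len(marker):]
	// The action separator is the last ':', not the first '/'.
	if j := strings.LastIndex(rest, ":"); j >= 0 {
		rest = rest[:j]
	}
	if rest == "" {
		return "", false
	}
	return rest, true
}

func main() {
	m, _ := modelFromGeminiPath("/v1beta/models/teamA/gemini-3-pro:generateContent")
	fmt.Println(m) // teamA/gemini-3-pro
}
```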
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8d34ee9200
Two P1s from the Codex reviewer on the prior commit:
1. Websocket ACL bypass
/v1/responses accepts a websocket upgrade, and the model
identifier is only selected later in frames consumed by the
ResponsesWebsocket handler. extractRequestedModel returns
found=false for the upgrade request, and the middleware was
falling through as "no model here, allow", letting a restricted
api key use any model through the websocket path.
Fix: after identifying the key but before attempting body
extraction, detect Upgrade: websocket and reject the upgrade
when keyHasModelRestriction(cfg, apiKey) is true. A key is
considered restricted when either it has an APIKeyPolicy entry
with a non-empty AllowedModels list, or the deployment default
is deny-all and the key has no matching policy. Unrestricted
keys still pass through so the legacy websocket flow keeps
working for them.
2. Unbounded drain after 413
After detecting the body exceeded modelACLMaxBodyBytes, the
code called io.Copy(io.Discard, c.Request.Body), which for a
chunked/streamed request without a trustworthy Content-Length
can hold the handler goroutine indefinitely — an attacker just
keeps streaming bytes and the ACL check turns into a
request-slot exhaustion path.
Fix: drop the drain entirely. Return 413 and let net/http
close the connection. No keep-alive concern since we're
rejecting the request.
Tests added:
- WebsocketUpgradeRejectedForRestrictedKey: key with
AllowedModels attempting WS upgrade gets 403, handler never
runs.
- WebsocketUpgradeAllowedForUnrestrictedKey: key with no policy
under allow-all default still gets through.
- WebsocketUpgradeRejectedUnderDenyAllDefault: deny-all default
rejects WS upgrades for keys with no matching policy.
- OversizedBodyDoesNotDrainRemainder: uses a reader that blocks
after its prefix to prove the middleware returns 413 promptly
(within 2s) instead of reading to EOF.
Two P2s from the Codex reviewer, both rooted in the same legacy
edge case: APIKeys can hold duplicate values, and the policy
helpers (renameAPIKeyPolicy, removeAPIKeyPolicy) are global —
they rewrite or drop the policy row by value, not by index. That
opened two ways for a duplicate to survive a patch/delete without
its intended restriction:
1. PatchAPIKeys old/new path
The loop broke at the first match. If APIKeys was
["sk-dup", "sk-other", "sk-dup"] and the caller did
{"old":"sk-dup","new":"sk-new"}, one slot remained "sk-dup"
while the policy migrated to "sk-new", so the leftover
"sk-dup" fell back to the default policy (usually allow-all).
Fix: remove the break — rename every duplicate so nothing is
left behind that the global policy migration would orphan.
2. PatchAPIKeys index/value path
Similar shape: rewriting one slot while the old value still
appears elsewhere, then moving the policy to the new value,
left the surviving duplicate unrestricted.
Fix: only migrate the policy when the old value no longer
appears anywhere in APIKeys. When duplicates remain, the
policy stays attached to the old value; the new value falls
back to the default until the caller sets an explicit policy
via PUT /api-keys.
3. DeleteAPIKeys indexed path
Same pattern — removed one slot, then dropped the policy row
unconditionally, which orphaned surviving duplicates.
Fix: only drop the policy when no other entries still carry
the removed value.
Added apiKeyValuePresent() helper so the three call sites share
one membership check.
Tests added:
- PatchAPIKeys_RenameViaOldNewRenamesAllDuplicates: three-slot
fixture with two duplicates proves all matching slots are
renamed and the migrated policy still applies.
- PatchAPIKeys_IndexedReplaceLeavesPolicyWhenDuplicateRemains:
two-duplicate fixture proves the surviving slot keeps its
policy (asserts both positively — still allows the permitted
model — and negatively — still rejects models outside the
allowlist).
- DeleteAPIKeys_ByIndexKeepsPolicyWhenDuplicateRemains: same
shape for the delete path.
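The shared membership check is simple enough to sketch directly; the signature is assumed from the prose, not confirmed against the source:

```go
package main

import "fmt"

// apiKeyValuePresent reports whether any APIKeys slot still carries the
// value. Policy rows are only migrated or dropped when this returns false,
// so a surviving duplicate keeps its restriction.
func apiKeyValuePresent(keys []string, value string) bool {
	for _, k := range keys {
		if k == value {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"sk-dup", "sk-other", "sk-dup"}
	// After rewriting index 0 to "sk-new", the old value still survives,
	// so its policy must stay attached to "sk-dup".
	keys[0] = "sk-new"
	fmt.Println(apiKeyValuePresent(keys, "sk-dup")) // true
}
```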
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e48e63fb12
Codex/ChatGPT reasoning-model upstreams (the gpt-5.x family) emit assistant text only via response.output_text.delta events; the final response.completed event has no output item with content[].type == "output_text". The codex executor's non-streaming path filtered out every line except response.completed and passed it straight to the chat-completions translator, which then surfaced a message with content == null while completion_tokens was non-zero. Clients that read .choices[0].message.content saw the field as null and rendered fallback text such as "Something went wrong." even though the model had generated the requested response.

Fix: walk the SSE body once, accumulate text from response.output_text.delta events, and synthesize a message item containing the accumulated text into response.output[] before passing the event to the translator. The accumulator is a no-op for non-reasoning models that already populate the message item themselves (codexResponseHasOutputText guards against double-injection).

Tests cover codexResponseHasOutputText across the empty/missing/populated shapes, injectCodexOutputText for fresh and pre-populated output arrays, and a representative end-to-end gpt-5.x SSE body to verify the loop accumulates the deltas and produces the expected post-injection event.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
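The delta-accumulation step can be sketched as a single pass over the SSE lines. The event and field names follow the prose above; the function name and the minimal event shape are illustrative:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// accumulateOutputText walks an SSE body once and concatenates the "delta"
// payloads of response.output_text.delta events, ignoring everything else.
func accumulateOutputText(sse string) string {
	var b strings.Builder
	sc := bufio.NewScanner(strings.NewReader(sse))
	for sc.Scan() {
		line := strings.TrimPrefix(sc.Text(), "data: ")
		var ev struct {
			Type  string `json:"type"`
			Delta string `json:"delta"`
		}
		if json.Unmarshal([]byte(line), &ev) != nil {
			continue // non-JSON keep-alives etc.
		}
		if ev.Type == "response.output_text.delta" {
			b.WriteString(ev.Delta)
		}
	}
	return b.String()
}

func main() {
	sse := `data: {"type":"response.output_text.delta","delta":"Hello"}
data: {"type":"response.output_text.delta","delta":", world"}
data: {"type":"response.completed","response":{"output":[]}}`
	fmt.Println(accumulateOutputText(sse)) // Hello, world
}
```

The real fix then injects this accumulated text into response.output[] of the response.completed event before translation; that part depends on the executor's event shapes and is not reproduced here.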
luispater
left a comment
Summary
Adds per-API-key model allowlisting (glob matching) with management API support for structured allowedModels payloads, and enforces it via a new ModelACLMiddleware on /v1 and /v1beta.
Blocking
- Alias bypass: the middleware is wired on /v1 and /v1beta only. The /backend-api/codex/* alias group also uses AuthMiddleware but is not protected, so restricted keys can bypass allowlists by calling the alias endpoints.
- Body buffering on non-JSON: extractRequestedModel reads and buffers the full request body for any POST/PUT/PATCH to look for "model". When ACLs are enabled, this will also buffer non-JSON/multipart requests like POST /v1/images/edits, which can be large. Suggest gating body reads to JSON content-types (or a small allowlist of JSON routes), and/or extracting model from multipart form fields for image edit requests.
- CI: retarget is failing because a separate PR (#2911) from the same head branch cannot be retargeted to dev (duplicate base/head). This likely needs the duplicate PR cleaned up so the check goes green.
Non-blocking
- GET /v0/management/api-keys returns api-key-policies using allowed-models (kebab-case). If the dashboard expects allowedModels, consider adding an alias or documenting the response shape.
- Minor nit: the test case name "empty plain array fails" expects success (empty clears keys).
Test plan
- CI build ✅
- No local checkout/tests run (review-only via gh as requested).
We actively need this feature. Per-key model allowlisting is exactly what we've been looking for to enforce user-tier access controls (e.g. restricting intern/contractor keys to deepseek-/glm-/qwen* while granting full-time employees broader access). The implementation looks solid — the middleware-based enforcement, glob matching, atomic policy cache, and backward-compatible payload formats all align well with how we'd want to consume this. That said, I'd echo the two blocking items from @luispater's review:
Both seem like small deltas on top of this PR. Hoping we can get these addressed and land this soon.
Three blockers + two nits from luispater's CHANGES_REQUESTED review on PR router-for-me#2914:

server.go (BLOCKING — alias bypass):
- /backend-api/codex/* uses AuthMiddleware but had no ModelACLMiddleware, so a key restricted by api-key-policies could bypass the allowlist by routing through the codex alias instead of /v1/responses. Added ModelACLMiddleware to the codexDirect group.

model_acl.go (BLOCKING — non-JSON body buffering):
- extractRequestedModel previously buffered any POST/PUT/PATCH body up to 10 MiB to look for a JSON "model" field. With ACLs enabled this also buffered non-JSON uploads (image edits, future binary endpoints) even when no JSON model field could exist.
- Dispatch on Content-Type:
  - application/json (and *+json, missing CT) → existing peek+buffer path with the 10 MiB cap.
  - multipart/form-data → ParseMultipartForm with a 1 MiB in-memory cap; large file parts spool to temp disk per the stdlib. This recovers enforcement on /v1/images/edits without ballooning resident memory.
  - everything else → no body inspection, allow through.
- ParseMultipartForm is idempotent, so downstream handlers using c.MultipartForm() / c.PostForm() still see the parsed form.

config (non-blocking nit — response shape):
- APIKeyPolicy.AllowedModels JSON tag: allowed-models → allowedModels to match what the dashboard sends on PUT. The input parser still accepts all three case styles (camel, kebab, snake) for backward compat.

tests (non-blocking nit + new coverage):
- api_keys_payload_test.go: rename "empty plain array fails" → "empty plain array clears keys" (the case asserts wantOK:true).
- model_acl_test.go: 8 new tests:
  - CodexAliasDisallowedModelRejected / AllowedModelPasses
  - MultipartFormDataDisallowedModelRejected / AllowedModelPasses
  - MultipartWithoutModelFieldAllowed
  - NonJSONContentTypeAllowedThrough (10 MiB octet-stream → 200, proving the middleware no longer buffers non-JSON bodies)
  - JSONWithCharsetParameterStillInspected
  - VendorJSONContentTypeInspected (application/vnd.api+json)

go test ./internal/api/... ./internal/config/... is green; the only failing test on this branch (TestCodexStaticModelsIncludeGPT55) also fails on pristine main and is unrelated.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f805087c8
# Conflicts:
#	internal/runtime/executor/codex_executor.go
Addressed the remaining reviewer-facing gap around model-list UX. Restricted API keys now get policy-filtered model listings for both OpenAI/Claude-shaped /v1/models and Gemini-shaped /v1beta/models, while unrestricted keys retain the full catalog and deny-all/no-policy keys receive an empty filtered list. Also added focused coverage for:
Verification:
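The listing filter described above can be sketched by reusing the same glob semantics as the request check. This is a hedged sketch: the function name and the prefix handling are assumptions mirroring the PR prose, not the actual handler code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// filterModelList keeps only catalog entries a restricted key's allowlist
// matches; a nil/empty allowlist (deny-all or no-policy under deny-all)
// yields an empty list, and unrestricted keys never reach this path.
func filterModelList(catalog, allowed []string) []string {
	out := make([]string, 0, len(catalog))
	for _, model := range catalog {
		candidate := model
		if i := strings.Index(model, "/"); i >= 0 {
			candidate = model[i+1:] // tolerate one "<prefix>/" segment
		}
		for _, pattern := range allowed {
			okFull, _ := path.Match(pattern, model)
			okBare, _ := path.Match(pattern, candidate)
			if okFull || okBare {
				out = append(out, model)
				break
			}
		}
	}
	return out
}

func main() {
	catalog := []string{"gemini-3-pro", "gpt-4o", "teamA/gemini-3-flash"}
	fmt.Println(filterModelList(catalog, []string{"gemini-*"})) // [gemini-3-pro teamA/gemini-3-flash]
	fmt.Println(filterModelList(catalog, nil))                  // []
}
```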
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d8305cef2
Two new reviewer-flagged gaps after the prior round:
1. P1 — Content-Type-based bypass of model ACL.
Several model-executing routes (ChatCompletions, Completions, Responses,
ClaudeMessages) parse the request body as JSON regardless of
Content-Type. The previous default-allow branch in
extractRequestedModel let a restricted key send Content-Type: text/plain
with a JSON body containing a disallowed model and bypass enforcement.
Restructure the dispatch so only multipart/form-data takes the dedicated
form-parsing path; every other content type — including text/plain,
application/octet-stream, and any other non-JSON value — runs the same
peek+gjson inspection as application/json. Bodies that genuinely have
no JSON model field (true binary uploads etc.) still pass through, and
the existing oversize cap continues to limit per-request memory.
2. P2 — Duplicate-key silent-skip in PUT /api-keys structured payload.
parseAPIKeysPayload accepted [{key:'sk-x'},{key:'sk-x',allowedModels:[...]}]
and silently kept only the first row, dropping the restriction on the
duplicate. That is a quiet permission widening. Reject duplicate
structured keys with ok=false (HTTP 400 from PutAPIKeys) so callers
notice and fix the payload instead of getting an unintended fall-back
to the default policy.
Tests:
- TestModelACLMiddleware_NonJSONContentTypeWithJSONBodyStillEnforces:
Content-Type: text/plain + JSON body with disallowed model returns 403.
- TestModelACLMiddleware_OctetStreamWithoutModelAllowed: opaque
octet-stream body without a JSON model field still returns 200 from the
middleware.
- TestModelACLMiddleware_NonJSONContentTypeAllowedThrough updated to the
new contract — small opaque body returns 200; the prior 'large opaque
payload returns 200' assertion was the exact bypass this PR closes.
- TestParseAPIKeysPayload_StructuredDuplicateKeyRejected: duplicate
structured rows are rejected.
go test ./... passes.
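The restructured dispatch can be sketched in a few lines; the enum and function names are illustrative, but the contract matches the prose: only multipart/form-data takes the form-parsing path, and every other Content-Type (including empty or unparseable values) gets the peek+JSON inspection.

```go
package main

import (
	"fmt"
	"mime"
)

type inspection int

const (
	inspectJSONBody      inspection = iota // peek + JSON "model" lookup
	inspectMultipartForm                   // bounded ParseMultipartForm
)

// classify dispatches on the parsed media type. mime.ParseMediaType
// lowercases the type and strips parameters like "; charset=utf-8", so
// mixed-case and parameterized values dispatch correctly; parse failures
// fall through to JSON inspection, closing the Content-Type bypass.
func classify(contentType string) inspection {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err == nil && mediaType == "multipart/form-data" {
		return inspectMultipartForm
	}
	return inspectJSONBody
}

func main() {
	fmt.Println(classify("multipart/form-data; boundary=x") == inspectMultipartForm) // true
	fmt.Println(classify("text/plain") == inspectJSONBody)                           // true
	fmt.Println(classify("") == inspectJSONBody)                                     // true
	fmt.Println(classify("MULTIPART/FORM-DATA; boundary=x") == inspectMultipartForm) // true
}
```

A true binary body simply fails the JSON peek and passes through, so fail-open is limited to payloads that genuinely carry no JSON model field.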
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9369170b9f
Addressed the two new bot-flagged gaps from the latest pass (commit 9369170).
P1 — Content-Type bypass (codex, model_acl.go):
P2 — Silent duplicate-key dedupe in PUT /api-keys (codex, config_lists.go):
Tests:
Verification:
Round-4 fixes pushed at commit
P1 — Content-Type bypass of model ACL (model_acl.go)
P2 — Silent duplicate-key dedupe in PUT /api-keys
Tests
Verification:
Red-team review of commit 9369170 probed five edge cases the round-4 unit tests did not already cover. All pass against the current implementation; pinning them down so a future regression in either the Content-Type dispatch or the duplicate-key parser surfaces as a test failure rather than a silent ACL bypass.

model_acl_test.go: EmptyContentTypeWithDisallowedJSONStillEnforces, UnparseableContentTypeStillEnforces, MixedCaseContentTypeStillDispatches.
api_keys_payload_test.go: StructuredWrappedDuplicateKeyRejected, StructuredDuplicateKeyAfterTrimRejected.

go test ./internal/api/... ./internal/api/handlers/management/... passes.
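The duplicate rejection those parser tests exercise can be sketched as follows. The row type and function name are illustrative; the behavior follows the prose: a structured payload repeating the same key (even after trimming) is refused outright rather than silently keeping the first row.

```go
package main

import (
	"fmt"
	"strings"
)

type apiKeyRow struct {
	Key           string
	AllowedModels []string
}

// parsePolicies trims keys, skips empties, and rejects the whole payload
// (ok=false, surfaced as HTTP 400 by the handler) on any duplicate key,
// so a dropped restriction can never widen permissions silently.
func parsePolicies(rows []apiKeyRow) ([]apiKeyRow, bool) {
	seen := make(map[string]struct{}, len(rows))
	out := make([]apiKeyRow, 0, len(rows))
	for _, row := range rows {
		key := strings.TrimSpace(row.Key)
		if key == "" {
			continue
		}
		if _, dup := seen[key]; dup {
			return nil, false // duplicate structured key => reject payload
		}
		seen[key] = struct{}{}
		row.Key = key
		out = append(out, row)
	}
	return out, true
}

func main() {
	_, ok := parsePolicies([]apiKeyRow{
		{Key: "sk-x"},
		{Key: "sk-x ", AllowedModels: []string{"gpt-*"}}, // dup after trim
	})
	fmt.Println(ok) // false
}
```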
Red-team self-review of
Coverage added:
Pre-existing concerns I noticed that are out of scope for this PR — flagging for awareness:
Tests pass against the new commit; PR checks are green.
* fix(executor): fix OAuth extra usage detection by Anthropic API
  Three changes to avoid Anthropic's content-based system prompt validation:
  1. Fix identity prefix: use "You are Claude Code, Anthropic's official CLI for Claude." instead of the SDK agent prefix, matching real Claude Code.
  2. Move user system instructions to the user message: only the billing header + identity prefix stay in the system[] array. User system instructions are prepended to the first user message as <system-reminder> blocks.
  3. Enable cch signing for OAuth tokens by default: the xxHash64 cch integrity check was previously gated behind the experimentalCCHSigning config flag. Now automatically enabled when using OAuth tokens.
  Related: router-for-me#2599
* fix(executor): inject full Claude Code system prompt blocks with proper cache scopes
  The previous fix only injected the billing header + agent identifier (2 blocks). Anthropic's updated detection now validates system prompt content depth: block count (needs 4-6 blocks, not 2), cache control scopes (org for agent, global for core prompt), and the presence of known Claude Code instruction sections.
  Changes:
  - Add claude_system_prompt.go with extracted Claude Code v2.1.63 system prompt sections (intro, system instructions, doing tasks, tone & style, output efficiency)
  - Rewrite checkSystemInstructionsWithSigningMode to build 5 system blocks:
    [0] billing header (no cache_control)
    [1] agent identifier (cache_control: ephemeral, scope=org)
    [2] core intro prompt (cache_control: ephemeral, scope=global)
    [3] system instructions (no cache_control)
    [4] doing tasks (no cache_control)
  - Third-party client system instructions are still moved to the first user message
  Follow-up to 69b950d
* fix: use sjson to build system blocks, avoiding raw newlines in JSON
  The previous commit used fmt.Sprintf with %s to insert multi-line string constants into JSON strings. Go raw string literals contain actual newline bytes, which produce invalid JSON (control characters in string values). Replace with a buildTextBlock() helper that uses sjson.SetBytes to properly escape text content for JSON serialization.
* fix: buildTextBlock cache_control sjson path issue
  sjson treats 'cache_control.type' as a nested path, creating {ephemeral: {scope: org}} instead of {type: ephemeral, scope: org}. Pass the whole map to sjson.SetBytes as a single value.
* fix: build cache_control JSON manually to avoid sjson map marshaling
* fix: remove invalid org scope and match Claude Code block layout
* fix(claude): sanitize forwarded third-party prompts for OAuth cloaking
  Only for Claude OAuth requests, sanitize forwarded system-prompt context before it is prepended to the first user message. This preserves neutral task/tool instructions while removing OpenCode branding, docs links, environment banners, and product-specific workflow sections that still triggered Anthropic extra-usage classification after top-level system[] cloaking.
* fix(claude): remove invalid cache_control scope from static system block
  Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
  Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* fix(claude): reduce forwarded OAuth prompt to minimal tool reminder
* fix(auth, executor): normalize Qwen base URL, adjust RefreshLead duration, and add tests
* fix(claude): remap OAuth tool names to Claude Code style to avoid third-party fingerprint detection
  A/B testing confirmed that Anthropic uses tool-name fingerprinting to detect third-party clients on OAuth traffic. OpenCode-style lowercase names like 'bash', 'read', 'todowrite' trigger extra-usage billing, while Claude Code TitleCase names like 'Bash', 'Read', 'TodoWrite' pass through normally.
  Changes:
  - Add oauthToolRenameMap: maps lowercase tool names to Claude Code equivalents
  - Add oauthToolsToRemove: removes 'question' and 'skill' (no Claude Code counterpart)
  - remapOAuthToolNames: renames tools, removes blacklisted ones, updates tool_choice and messages
  - reverseRemapOAuthToolNames / reverseRemapOAuthToolNamesFromStreamLine: reverse the mapping for responses
  - Applied in Execute(), ExecuteStream(), and CountTokens() for OAuth token requests
* fix(auth): preserve and restore ready-view cursors during index rebuilds
* fix(handlers): add base URL validation and improve API key deletion tests
* feat(antigravity): configurable signature cache with bypass-mode validation
  Antigravity's Claude thinking-signature handling gains dual cache/bypass modes, with bypass-mode signature validation implemented per SIGNATURE-CHANNEL-SPEC.md.
  New antigravity-signature-cache-enabled config option (default true):
  - cache mode (true): use the server-side cached signature; behavior identical to the original logic
  - bypass mode (false): use the client-supplied signature directly, after validation and normalization
  Hot reload is supported, so the mode can be switched at runtime.
  Validation flow:
  1. Strip the historical cache-mode 'modelGroup#' prefix (e.g. claude#Exxxx → Exxxx)
  2. The first character must be 'E' (single-layer encoding) or 'R' (double-layer encoding); otherwise reject
  3. 'R' prefix: base64-decode → the inner value must start with 'E' → continue with single-layer validation
  4. 'E' prefix: base64-decode → the first byte must be 0x12 (the Claude protobuf marker)
  5. All valid signatures are normalized to the 'R' form (double base64) before being sent to the Antigravity backend
  Invalid-signature policy:
  - Non-strict mode (default): the translator silently drops thinking blocks without a signature
  - Strict mode (antigravity-signature-bypass-strict: true): the executor returns HTTP 400 before the request is sent upstream
  Parses the full protobuf structure of Claude signatures per SIGNATURE-CHANNEL-SPEC.md:
  - Top-level Field 2 (container) → Field 1 (channel block)
  - Channel block extraction: channel_id (Field 1), infrastructure (Field 2), model_text (Field 6), field7 (Field 7)
  - Computes routing_class, infrastructure_class, schema_features
  - Uses google.golang.org/protobuf/encoding/protowire for parsing
  - resolveThinkingSignature split into resolveCacheModeSignature / resolveBypassModeSignature
  - hasResolvedThinkingSignature: mode-aware signature validity check (cache: len>=50 via HasValidSignature; bypass: non-empty)
  - validateAntigravityRequestSignatures: executor pre-check that rejects invalid signatures with 400 only in bypass + strict mode
  - Response-side signature caching stays integrated with cache mode
  - Cache-mode behavior is fully preserved: native signatures without a '#' prefix are silently dropped
* docs(antigravity): document signature validation spec alignment
  Add a package-level comment documenting the protobuf tree structure, base64 encoding equivalence proof, output dimensions, and spec section references. Remove unreachable legacy_vertex_group dead code.
* fix(antigravity): refine 429 handling and credits fallback
  Includes: restore SDK docs under docs/; update antigravity executor credits tests; gofmt.
* fix(claude): preserve OAuth tool renames when filtering tools
* fix(claude): map question/skill to TitleCase instead of removing them
* fix(claude): address PR review feedback for OAuth cloaking
  - Use buildTextBlock for the billing header to avoid raw JSON string interpolation
  - Fix the empty-array edge case in prependToFirstUserMessage
  - Allow remapOAuthToolNames to process messages even without a tools array
  - Move claude_system_prompt.go to helps/ per repo convention
  - Export prompt constants (ClaudeCode* prefix) for cross-package access
* fix(handlers): update listener to bind on all interfaces instead of localhost
  Fixed: router-for-me#2640
* feat(antigravity): prefer prod URL as first priority
  Promote cloudcode-pa.googleapis.com to the first position in the fallback order, with the daily and sandbox URLs as fallbacks.
* fix(executor): implement immediate retry with token refresh on 429 for Qwen and add associated tests
  Closes: router-for-me#2661
* fix(executor): handle OAuth tool name remapping with rename detection and add tests
  Closes: router-for-me#2656
* docs: refine CLIproxyAPI Quota Inspector description in all README locales
* docs: fix CLIProxyAPI Quota Inspector naming and link casing
* refactor(executor): remove immediate retry with token refresh on 429 for Qwen and update tests accordingly
* docs: clarify codex quota window wording in README locales
* fix(executor): handle 429 Retry-After header and default retry logic for quota exhaustion
  - Added proper parsing of `Retry-After` headers for 429 responses.
  - Set a default retry duration when "disable cooling" is active on quota exhaustion.
  - Updated tests to verify `Retry-After` handling and the default behavior.
* fix(antigravity): allow 32 MB bypass signatures
  Raise the local bypass-signature ceiling so long Claude thinking signatures are not rejected before request translation, and keep the oversized-signature test cheap to execute.
* fix(antigravity): reduce bypass-mode log noise
  Keep cache-disable visibility at info level while suppressing duplicate state-change logs and moving strict-mode chatter down to debug.
* feat(auth): implement auto-refresh loop for managing the auth token schedule
  - Introduced authAutoRefreshLoop to handle token refresh scheduling.
  - Replaced the semaphore-based refresh logic in Manager with the new loop.
  - Added unit tests to verify the refresh schedule logic and edge cases.
* fix(antigravity): drop redacted thinking blocks with empty text
  Antigravity wraps empty thinking text in a prompt-caching-scope object that omits the required inner "thinking" field, causing 400 "messages.N.content.0.thinking.thinking: Field required" when Claude Max requests are routed through Antigravity in bypass mode.
* fix(antigravity): skip full schema cleanup for empty tool requests
  Avoid whole-payload schema sanitization when translated Antigravity requests have no actual tool schemas, including missing and empty tools arrays. Add regression coverage so image-heavy no-tool requests keep bypassing the old memory-amplification path.
* refactor(auth): simplify auth directory scanning and improve JSON processing logic
  - Replaced filepath.Walk with os.ReadDir for cleaner directory traversal.
  - Fixed the isAuthJSON check to use filepath.Dir for the directory comparison.
  - Updated auth hash cache generation and file synthesis to improve readability and maintainability.
* feat(auth): add configurable worker pool size for the auto-refresh loop
  - Introduced the auth-auto-refresh-workers config option to override the default concurrency.
  - Updated authAutoRefreshLoop to support customizable worker counts.
  - Aligned the worker pool with runtime configuration for more flexible token refresh scheduling.
* fix(antigravity): strip thinking blocks with empty signatures instead of rejecting
  Thinking blocks with empty signatures come from proxy-generated responses (Antigravity/Gemini routed as Claude). These should be silently dropped from the request payload before forwarding, not rejected with 400. Fixes 10 "missing thinking signature" errors.
* fix(antigravity): discard thinking blocks with non-Claude-format signatures
  Proxy-generated thinking blocks may carry hex hashes or other non-Claude signatures (e.g. "d5cb9cd0823142109f451861") from Gemini responses. These are now discarded alongside empty-signature blocks during the strip phase, before validation runs. Valid Claude signatures always start with 'E' or 'R' (after stripping any cache prefix).
* fix(antigravity): use an E-prefixed fake signature in the strict bypass test
  The strict bypass test used testGeminiSignaturePayload(), which produces a base64 string starting with 'C'. Since StripInvalidSignatureThinkingBlocks now strips all non-E/R signatures unconditionally, the test payload was stripped before reaching ValidateClaudeBypassSignatures, causing the test to pass the request through instead of rejecting it with 400.
Replace with testFakeClaudeSignature() which produces a base64 string starting with 'E' (valid at the lightweight check) but with invalid protobuf content (no valid field 2), so strict mode correctly rejects it at the deep validation layer. * fix(antigravity): cap maxOutputTokens using registry max_completion_tokens Claude models on antigravity have a 64000 token output limit but max_tokens from downstream requests was passed through uncapped, causing 400 INVALID_ARGUMENT from Google when clients sent 128000. * chore: remove Qwen support from SDK and internal components - Deleted `QwenAuthenticator`, internal `qwen_auth`, and `qwen_executor` implementations. - Removed all Qwen-related OAuth flows, token handling, and execution logic. - Cleaned up dependencies and references to Qwen across the codebase. * chore(models): remove outdated GPT-5 and related model entries from registry JSON * feat(session-affinity): add session-sticky routing for multi-account load balancing When multiple auth credentials are configured, requests from the same session are now routed to the same credential, improving upstream prompt cache hit rates and maintaining context continuity. Core components: - SessionAffinitySelector: wraps RoundRobin/FillFirst selectors with session-to-auth binding; automatic failover when bound auth is unavailable, re-binding via the fallback selector for even distribution - SessionCache: TTL-based in-memory cache with background cleanup goroutine, supporting per-session and per-auth invalidation - StoppableSelector interface: lifecycle hook for selectors holding resources, called during Manager.StopAutoRefresh() Session ID extraction priority (extractSessionIDs): 1. metadata.user_id with Claude Code session format (old user_{hash}_session_{uuid} and new JSON {session_id} format) 2. X-Session-ID header (generic client support) 3. metadata.user_id (non-Claude format, used as-is) 4. conversation_id field 5. 
Stable FNV hash from system prompt + first user/assistant messages (fallback for clients with no explicit session ID); returns both a full hash (primaryID) and a short hash without assistant content (fallbackID) to inherit bindings from the first turn Multi-format message hash covers OpenAI messages, Claude system array, Gemini contents/systemInstruction, and OpenAI Responses API input items (including inline messages with role but no type field). Configuration (config.yaml routing section): - session-affinity: bool (default false) - session-affinity-ttl: duration string (default "1h") - claude-code-session-affinity: bool (deprecated, alias for above) All three fields trigger selector rebuild on config hot reload. Side effect: Idempotency-Key header is no longer auto-generated with a random UUID when absent — only forwarded when explicitly provided by the client, to avoid polluting session hash extraction. * fix(antigravity): strip billing header from system instruction before upstream call The x-anthropic-billing-header block in the Claude system array is client-internal metadata and should not be forwarded to the Gemini upstream as part of systemInstruction.parts. * fix(docker-build): improve argument handling and error messaging for usage option * fix(util): forward custom Host header to upstream Custom headers configured under openai-compatibility (and any other provider passing through applyCustomHeaders) were silently dropped for the Host key, because Go's net/http reads the wire Host from req.Host, not req.Header["Host"]. As a result, virtual-host routed upstreams (e.g. LiteLLM behind an ingress) saw the base-url's host instead of the user-configured override and returned 404. Detect the Host key with http.CanonicalHeaderKey and assign it to req.Host so it is actually written on the wire. Other headers continue to use Header.Set as before. 
Fixes router-for-me#2833 * fix(handlers): include execution session metadata and skip idempotency key when absent - Refactored `requestExecutionMetadata` to handle empty `Idempotency-Key` gracefully. - Added test to validate metadata inclusion of execution session without idempotency key. * feat(auth): add proxy URL override support to auth constructors and executors - Introduced `WithProxyURL` variants for `CodexAuth`, `ClaudeAuth`, `IFlowAuth`, and `DeviceFlowClient`. - Updated executors to use proxy-aware constructors for improved configurability. - Added unit tests to validate proxy override precedence and functionality. Closes: router-for-me#2823 * chore: remove iFlow-related modules and dependencies - Deleted `iflow` provider implementation, including thinking configuration (`apply.go`) and authentication modules. - Removed iFlow-specific tests, executors, and helpers across SDK and internal components. - Updated all references to exclude iFlow functionality. * feat(models): add Claude Opus 4.7 model entry to registry JSON * fix(tests): update model lookup references and enhance Claude executor tests * fix(tests): update Gemini family test case numbers for consistency * fix(util): also keep Host in header map for synthetic requests Addressing the P1 note from the Codex reviewer: applyCustomHeaders is also called with a synthetic &http.Request{Header: ...} from the websockets executors (aistudio_executor.go, codex_websockets_executor.go), which forward only the header map. The previous continue meant a custom Host was dropped from that map, regressing virtual-host overrides on those flows. Mirror the value to both r.Host (for real net/http) and r.Header (for header-map-only consumers). 
* feat(api): integrate auth index into key retrieval endpoints for Gemini, Claude, Codex, OpenAI, and Vertex * fix(management): stabilize auth-index mapping * fix(tests): remove obsolete config_auth_index_test file * feat(translator): add partial and full image generation support in Codex-GPT and Codex-Gemini flows - Introduced `LastImageHashByItemID` in Codex-GPT and `LastImageHashByID` in Codex-Gemini for deduplication of generated images. - Added support for handling `partial_image` and `image_generation_call` types, with inline data embedding for Gemini and URL payload conversion for GPT. - Extended unit tests to verify image handling in both streaming and non-streaming modes. * fix(executor): drop obsolete context-1m-2025-08-07 beta header (fixes router-for-me#2866) Anthropic has moved the 1M-context-window feature to General Availability, so the context-1m-2025-08-07 beta flag is no longer accepted and now causes 400 Bad Request errors when forwarded upstream. Remove the X-CPA-CLAUDE-1M detection and the corresponding injection of the now-invalid beta header. Also drop the unused net/textproto import that was only needed for the header-key lookup. * feat(docs): add VisionCoder sponsorship details and optimize external links - Added VisionCoder sponsorship information to `README.md`, `README_CN.md`, and `README_JA.md`. - Updated external links to include `target="_blank"` for improved user experience. - Added new logo asset `visioncoder.png` for README use. * feat(access): per-api-key model allowlist with glob matching Adds enforcement for per-client-key model ACLs at the request middleware layer. The dashboard already stores allowedModels per UserApiKey and attempts to push them to PUT /v0/management/api-keys, but upstream until now had nowhere to land them — the structured payload was rejected and the dashboard's fallback silently dropped the policy data, so no key was ever actually restricted. 
config:
- Add APIKeyPolicy{Key, AllowedModels} and SDKConfig.APIKeyPolicies
- Add SDKConfig.APIKeyDefaultPolicy ("allow-all" | "deny-all"; defaults
  to allow-all so existing deployments are unaffected)
- IsModelAllowedForKey supports path.Match globs and tolerates a single
  leading "<prefix>/" segment so policies stay portable across prefixed
  credentials

management API:
- PutAPIKeys now accepts both the legacy string list and the structured
  [{key, allowedModels}] form, with allowedModels / allowed-models /
  allowed_models all recognized so existing dashboard payloads work
- GetAPIKeys exposes api-key-policies when any are configured

middleware:
- New ModelACLMiddleware extracts the targeted model from the JSON body
  ("model" field) or the Gemini-style /v1beta/models/<model>:<action>
  path, then enforces the per-key policy. The body is read once and
  restored via io.NopCloser so downstream handlers see the original
  request bytes
- Wired into both /v1 and /v1beta groups directly after AuthMiddleware
- Bypasses any request that has no identified api key (preserving the
  legacy "no auth provider configured => allow" behavior of
  AuthMiddleware) and any request shape with no extractable model
  (listings, websocket upgrades) so discovery endpoints keep working

tests:
- Unit tests for IsModelAllowedForKey covering allow-all, deny-all,
  empty allowed list semantics, exact matches, glob matches, and the
  prefixed-credential case
- Middleware tests covering the allow / deny / unknown-key / Gemini
  path / list-endpoint / body-pass-through paths
- Parser tests for the management PutAPIKeys payload covering the
  legacy plain list, the wrapped items form, and the structured form
  in all three case styles

* fix(access): address review feedback on per-key model ACL

Three changes landed after PR router-for-me#2911 review:

middleware (DoS hardening):
- Cap request body inspected by ModelACLMiddleware at 10 MiB via
  io.LimitReader; requests above the cap are rejected with HTTP 413 
("request_too_large") instead of being buffered into memory - Short-circuit on Content-Length when the client declares an oversized body, so we bail before reading any bytes - Any non-size read error now fails closed with HTTP 400 rather than silently allowing the request past the policy check - New tests cover both the streamed-oversize and Content-Length paths config (O(1) policy lookup + cache invalidation): - SDKConfig.APIKeyPolicies is now backed by an atomic.Pointer cache (map[string]*APIKeyPolicy) so IsModelAllowedForKey is O(1) per call instead of O(N) linear scan; the cache is lazily rebuilt via CompareAndSwap and safe under concurrent reads - Add SetAPIKeyPolicies (atomic replace + invalidate) and InvalidatePolicyIndex (manual flush after in-place edits) - Tests exercise mutation visibility, nil-safe receivers, concurrent access, and defensive-copy semantics management handlers (policy rotation/deletion): - PatchAPIKeys gains a dedicated handler that keeps APIKeyPolicies in sync on rename (via old/new) and rotation (via index/value); when the target key already has a policy the source row is dropped so there are no duplicate entries - DeleteAPIKeys removes the matching APIKeyPolicy rows so a revoked credential cannot leave its allowlist behind; slice is nilled when empty - PutAPIKeys routes through SetAPIKeyPolicies so the cache is refreshed atomically with the slice swap - Tests cover rename-with-existing-target, append-new, delete-by-value, delete-by-index, last-row-clears, and PUT-rebuilds-policies No behavior change for existing deployments that do not configure api-key-policies. All existing tests still pass. * fix(access): address gemini-code-assist review round 2 - config_lists.go: PutAPIKeys, PatchAPIKeys, and DeleteAPIKeys now acquire h.mu before mutating h.cfg and use persistLocked so the mutation and the subsequent config write happen atomically from the perspective of readers (middleware goroutines). 
Previously the mutations raced with ModelACLMiddleware evaluating the same fields on concurrent requests. Matches the existing pattern in PutGeminiKeys / PatchGeminiKey. (HIGH review comment) - config_lists.go: renameAPIKeyPolicy no longer reslices the existing APIKeyPolicies backing array via "[:0]". That was unsafe because SDKConfig.policyIndex caches *APIKeyPolicy into the same backing array; a concurrent reader resolving the cache mid-write would observe a partially-mutated entry. Allocate a fresh slice so any still-cached pointers keep pointing at the pre-mutation entries until InvalidatePolicyIndex forces a refresh. (MEDIUM review comment) - model_acl.go: extractRequestedModel now peeks the first 16 KiB of the request body before committing to buffering the full cap. In realistic chat-completion payloads "model" is always near the top of the JSON, so the peek alone is usually sufficient and the middleware allocates ~16 KiB per request instead of up to modelACLMaxBodyBytes (10 MiB). When the peek does not contain "model", the middleware falls back to the prior bounded read and still enforces the cap. Under concurrency this is the difference between a bounded constant footprint and N*10 MiB. (MEDIUM review comment) - model_acl_test.go: two new tests lock in peek-first behavior — one covers a large body with "model" in the peek (must extract, enforce, and preserve the full downstream body byte-for-byte), the other covers "model" past the peek but within the cap (must still extract via the fallback read). * fix(access): preserve full Gemini model path in ACL extraction The previous extractor for /v1beta/models/<model>:<action> paths truncated at the first '/' inside <model>. That broke deployments using force-model-prefix, where the routed model identifier is literally "<prefix>/<model>" (e.g. "teamA/gemini-3-pro"). 
The GeminiHandler forwards the whole segment-before-":" as the model, and IsModelAllowedForKey already strips a single leading "<prefix>/" before glob matching, so the ACL extractor must mirror that and forward the full identifier including embedded slashes. Otherwise: - allowed requests were rejected (the ACL saw "teamA" instead of "teamA/gemini-3-pro") - policies scoped to just the prefix segment could unintentionally allow unrelated models sharing the prefix Added TestModelACLMiddleware_GeminiPathPreservesPrefix which uses the same AllowedModels pattern against both an allowed and a disallowed model under the same prefix — a regression would surface as a false-positive on either the allow path or the deny path. * fix(access): address codex review round 3 — WS bypass + drain DoS Two P1s from the Codex reviewer on the prior commit: 1. Websocket ACL bypass /v1/responses accepts a websocket upgrade, and the model identifier is only selected later in frames consumed by the ResponsesWebsocket handler. extractRequestedModel returns found=false for the upgrade request, and the middleware was falling through as "no model here, allow", letting a restricted api key use any model through the websocket path. Fix: after identifying the key but before attempting body extraction, detect Upgrade: websocket and reject the upgrade when keyHasModelRestriction(cfg, apiKey) is true. A key is considered restricted when either it has an APIKeyPolicy entry with a non-empty AllowedModels list, or the deployment default is deny-all and the key has no matching policy. Unrestricted keys still pass through so the legacy websocket flow keeps working for them. 2. 
Unbounded drain after 413 After detecting the body exceeded modelACLMaxBodyBytes, the code called io.Copy(io.Discard, c.Request.Body), which for a chunked/streamed request without a trustworthy Content-Length can hold the handler goroutine indefinitely — an attacker just keeps streaming bytes and the ACL check turns into a request-slot exhaustion path. Fix: drop the drain entirely. Return 413 and let net/http close the connection. No keep-alive concern since we're rejecting the request. Tests added: - WebsocketUpgradeRejectedForRestrictedKey: key with AllowedModels attempting WS upgrade gets 403, handler never runs. - WebsocketUpgradeAllowedForUnrestrictedKey: key with no policy under allow-all default still gets through. - WebsocketUpgradeRejectedUnderDenyAllDefault: deny-all default rejects WS upgrades for keys with no matching policy. - OversizedBodyDoesNotDrainRemainder: uses a reader that blocks after its prefix to prove the middleware returns 413 promptly (within 2s) instead of reading to EOF. * fix(access): keep policies consistent across duplicate key values Two P2s from the Codex reviewer, both rooted in the same legacy edge case: APIKeys can hold duplicate values, and the policy helpers (renameAPIKeyPolicy, removeAPIKeyPolicy) are global — they rewrite or drop the policy row by value, not by index. That opened two ways for a duplicate to survive a patch/delete without its intended restriction: 1. PatchAPIKeys old/new path The loop broke at the first match. If APIKeys was ["sk-dup", "sk-other", "sk-dup"] and the caller did {"old":"sk-dup","new":"sk-new"}, one slot remained "sk-dup" while the policy migrated to "sk-new", so the leftover "sk-dup" fell back to the default policy (usually allow-all). Fix: remove the break — rename every duplicate so nothing is left behind that the global policy migration would orphan. 2. 
PatchAPIKeys index/value path Similar shape: rewriting one slot while the old value still appears elsewhere, then moving the policy to the new value, left the surviving duplicate unrestricted. Fix: only migrate the policy when the old value no longer appears anywhere in APIKeys. When duplicates remain, the policy stays attached to the old value; the new value falls back to the default until the caller sets an explicit policy via PUT /api-keys. 3. DeleteAPIKeys indexed path Same pattern — removed one slot, then dropped the policy row unconditionally, which orphaned surviving duplicates. Fix: only drop the policy when no other entries still carry the removed value. Added apiKeyValuePresent() helper so the three call sites share one membership check. Tests added: - PatchAPIKeys_RenameViaOldNewRenamesAllDuplicates: three-slot fixture with two duplicates proves all matching slots are renamed and the migrated policy still applies. - PatchAPIKeys_IndexedReplaceLeavesPolicyWhenDuplicateRemains: two-duplicate fixture proves the surviving slot keeps its policy (asserts both positively — still allows the permitted model — and negatively — still rejects models outside the allowlist). - DeleteAPIKeys_ByIndexKeepsPolicyWhenDuplicateRemains: same shape for the delete path. * feat(auth): add refresh backoff for ineffective token updates - Introduced `refreshIneffectiveBackoff` to prevent tight-looping in auto-refresh when token refresh fails to update expiry. - Adjusted refresh logic to apply backoff when `shouldRefresh` evaluates true. Closes: router-for-me#2830 * fix(executor): recover gpt-5.x text content for non-streaming requests Codex/ChatGPT reasoning-model upstreams (the gpt-5.x family) emit assistant text only via response.output_text.delta events; the final response.completed event has no output item with content[].type == "output_text". 
The codex executor's non-streaming path filtered out every line except response.completed and passed it straight to the chat-completions translator, which then surfaced a message with content == null while completion_tokens was non-zero. Clients that read .choices[0].message.content saw the field as null and rendered fallback text such as "Something went wrong." even though the model had generated the requested response. Fix: walk the SSE body once, accumulate text from response.output_text.delta events, and synthesize a message item containing the accumulated text into response.output[] before passing the event to the translator. The accumulator is a no-op for non-reasoning models that already populate the message item themselves (codexResponseHasOutputText guards against double-injection). Tests cover codexResponseHasOutputText across the empty/missing/populated shapes, injectCodexOutputText for fresh and pre-populated output arrays, and a representative end-to-end gpt-5.x SSE body to verify the loop accumulates the deltas and produces the expected post-injection event. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(codex): backfill streaming response output * perf(codex): avoid repeated output patch writes * feat(api): add support for `HEAD` requests to `/healthz` endpoint - Refactored `/healthz` handler to support `HEAD` requests alongside `GET`. - Updated tests to include validation for `HEAD` requests with expected status and empty body. Closes: router-for-me#2929 * feat(models): add Kimi K2.6 model entry to registry JSON * feat(models): add hardcoded GPT-Image-2 model support in Codex - Added `GPT-Image-2` as a built-in model to avoid dependency on remote updates for Codex. - Updated model tier functions (`CodexFree`, `CodexTeam`, etc.) to include built-in models via `WithCodexBuiltins`. - Introduced new handlers for image generation and edit operations under `OpenAIAPIHandler`. 
- Extended tests to validate 503 response for unsupported image model requests. * fix(handlers): remove handling of unsupported `n` parameter in OpenAI image handlers * fix(handlers): remove references to unsupported `n` parameter in OpenAI image handlers * feat(codex): enable image generation for all Codex upstream requests Codex CLI gates the built-in image_generation tool behind AuthMode::Chatgpt (OAuth only). When clients connect via API key auth through CPA, the tool is absent from requests, making image generation unavailable through the reverse proxy. Changes: 1. Inject image_generation tool (codex_executor.go): Add ensureImageGenerationTool() that appends {"type":"image_generation","output_format":"png"} to the tools array if not already present. Applied to all three execution paths: Execute, executeCompact, and ExecuteStream. 2. Route aliases for Codex CLI direct access (server.go): Add /backend-api/codex/responses routes that map to the same OpenAI Responses API handlers as /v1/responses. This allows Codex CLI to connect via chatgpt_base_url config while keeping AuthMode::Chatgpt, which enables the built-in image_generation tool on the client side. 3. Unit tests (codex_executor_imagegen_test.go): Cover no-tools, existing tools, already-present, empty array, and mixed built-in tool scenarios. * feat(antigravity): conductor-level credits fallback for Claude models Move credits handling from executor-level retry to conductor-level orchestration. When all free-tier auths are exhausted (429/503), the conductor discovers auths with available Google One AI credits and retries with enabledCreditTypes injected via context flag. 
Key changes: - Add AntigravityCreditsHint system for tracking per-auth credits state - Conductor tries credits fallback after all auths fail (Execute/Stream/Count) - Executor injects enabledCreditTypes only when conductor sets context flag - Credits fallback respects provider scope (requires antigravity in providers) - Add context cancellation check in credits fallback to avoid wasted requests - Remove executor-level attemptCreditsFallback and preferCredits machinery - Restructure 429 decision logic (parse details first, keyword fallback) - Expand shouldAbort to cover INVALID_ARGUMENT/FAILED_PRECONDITION/500+UNKNOWN - Support human-readable retry delay parsing (e.g. "1h43m56s") * feat: support extracting X-Amp-Thread-Id header as session id for session affinity * fix(antigravity): remove credits fallback from CountTokens, fix gofmt CountTokens upstream API does not support enabledCreditTypes, so remove the dead credits fallback path from ExecuteCount and delete the unused tryAntigravityCreditsExecuteCount method. Fix gofmt on credits test file. * fix: forward HTTP headers to executor Options so session affinity can read X-Amp-Thread-Id * fix(antigravity): respect pinned auth in credits fallback, release deferred body on success - findAllAntigravityCreditsCandidateAuths now filters by PinnedAuthMetadataKey to prevent credential isolation violations during credits fallback - Release deferredBody reference on success path to avoid holding large payloads in memory for the lifetime of the gin context * refactor(logging): strip unrelated deferred body changes, keep credits-only logging Remove deferred body optimization and maxErrorLog constants that were unrelated to credits fallback. Keep only MarkCreditsUsed/CreditsUsed helpers for flagging requests that consumed AI credits. * fix(auth): break credits cold-start deadlock by keeping unknown-hint auths as fallback candidates Replace antigravityCreditsAvailableForModel with inline known/unknown split. 
Auths whose credit hints are not yet populated are kept as lower-priority candidates instead of being rejected, breaking the chicken-and-egg deadlock at cold start. * perf(antigravity): async credits hint refresh for warm tokens * feat(logging): add AI API path support for image routes - Included `/v1/images` in AI API path prefixes. - Introduced tests to validate `/v1/images/generations` and `/v1/images/edits` as AI API paths. * feat(models): add GPT-5.5 model entry to registry JSON * Add GPT-5.5 Codex model support * chore(models): remove GPT-5.5 model entry from registry JSON * feat(codex): pass base model to enable conditional image_generation tool injection - Modified `ensureImageGenerationTool` to accept `baseModel` for conditional logic. - Ensured `gpt-5.3-codex-spark` models bypass image_generation tool injection. - Updated relevant tests and executor logic to reflect changes. * fix antigravity credits stream fallback * fix(codex): classify known upstream failures Normalize Codex context, thinking-signature, previous-response, and auth failures to explicit error codes: context_too_large, thinking_signature_invalid, previous_response_not_found, auth_unavailable. Refs router-for-me#2596. * feat(auth): disallow free-tier Codex auth during selection process - Introduced `disallowFreeAuthFromMetadata` and `isFreeCodexAuth` to enforce skipping free-tier credentials. - Modified scheduler logic to honor `DisallowFreeAuthMetadataKey` during auth selection. - Updated `ensureImageGenerationTool` to skip tool injection for free-tier Codex auth. - Added context utility `WithDisallowFreeAuth` and integrated with image handlers. - Augmented relevant tests to cover free-tier exclusion scenarios. 
* Add CPA Usage Keeper to README ecosystem list
* docs: Add CPA Usage Keeper to README ecosystem list
* feat(api): implement protocol multiplexer and Redis queue for usage integration
  - Added `protocol_multiplexer.go`, enabling support for both HTTP and Redis protocols on a single listener.
  - Introduced `redis_queue_protocol.go` to handle Redis-compatible RESP commands for queue management.
  - Integrated the `redisqueue` package, supporting in-memory queuing with expiration pruning.
  - Updated server initialization to manage a shared listener and multiplex connections.
  - Adjusted `Handler` to adopt `AuthenticateManagementKey` for modular key validation, supporting both HTTP and Redis flows.
* feat(security): implement IP ban for repeated management-key and Redis AUTH failures
  - Added IP-ban logic to `AuthenticateManagementKey` and the Redis protocol handlers, blocking requests after multiple failed attempts.
  - Introduced unit tests to validate IP-ban behavior for localhost and remote clients.
  - Synchronized the Redis protocol's authentication policy with management-key validation.
* feat(models): add Codex Auto Review model entry to registry JSON
  Closes: router-for-me#2995
* feat(api): enhance model assignment logic in image handlers
  - Updated `buildImagesResponsesRequest` to derive `model` dynamically based on `toolJSON`.
  - Adjusted streaming execution to handle dynamic model resolution across multiple contexts.
  Closes: router-for-me#2965
* fix(test): remove free tier from GPT-5.5 inclusion test
  GPT-5.5 was correctly removed from the codex-free tier in 7b89583 (free accounts cannot access it), but the test was not updated to reflect this, so TestCodexStaticModelsIncludeGPT55 failed on the free subtest.
  Changes:
  - Remove the free tier from the GPT-5.5 inclusion test.
  - Add TestCodexFreeModelsExcludeGPT55 to explicitly verify that the free tier does NOT include GPT-5.5.
* feat(config): add support for disabling OpenAI compatibility providers
  - Introduced a `Disabled` flag in OpenAI compatibility configurations.
  - Updated routing, auth selection, and API handling logic to respect the `Disabled` state.
  - Extended the relevant APIs, YAML configurations, and data structures to include the `Disabled` field.
  - Adjusted all relevant loops and filters to skip disabled providers.
  Closes: router-for-me#3060 router-for-me#3059 router-for-me#2977
* feat(executor): add support for Codex image-generation tool usage tracking
  - Introduced `publishCodexImageToolUsage` to report image-generation tool metrics.
  - Updated executor logic to handle image-generation tool events and defaults.
  - Added parsing logic for `image_gen` tool-usage details in `helps/usage_helpers.go`.
  - Updated `UsageReporter` for additional model-specific usage publishing.
  - Refactored usage-detail normalization.
  Closes: router-for-me#3063
* fix(usage_helpers): skip zero-token usage in additional model records
  - Added `buildAdditionalModelRecord` to filter out zero-token usage details.
  - Introduced the `hasNonZeroTokenUsage` helper for token-usage validation.
  - Updated tests to cover zero- and non-zero-token usage scenarios.
* feat(codex): handle thinking-signature conversion for reasoning content
  - Implemented `appendReasoningContent` to process the `thinking` signature and text as reasoning input.
  - Added test cases to validate reasoning-content conversion with and without text.
* fix(codex): include `content` field in reasoning item initialization
* Preserve Codex reasoning signatures for Claude
* fix(antigravity): use real antigravity UA when polling credits balance
  The loadCodeAssist polling call hardcoded the User-Agent to google-api-nodejs-client/9.15.1. Google Cloud Code returns the paidTier object WITHOUT the availableCredits array for that UA, so updateAntigravityCreditsBalance always saw "no credits", set the hint to Available=false for every Google One AI Ultra account, and the conductor-level credits fallback could never find a candidate. Switching to resolveUserAgent(auth) (the same UA used for streamGenerateContent / generateContent) makes the response include availableCredits, so the credits hint is populated correctly and the fallback can actually inject enabledCreditTypes:["GOOGLE_ONE_AI"] when the free tier is exhausted.
* test(api): add validation for unsupported models in OpenAI image handlers
  - Introduced tests to ensure unsupported models are rejected in `/images/generations` and `/images/edits`.
  - Added `isSupportedImagesModel` and `rejectUnsupportedImagesModel` for consistent model validation.
  - Enhanced image-handler logic to apply model-compatibility checks.
* fix antigravity user agent handling
* fix antigravity client agent headers
* fix(access): address luispater review — alias bypass + multipart gating
  Three blockers + two nits from luispater's CHANGES_REQUESTED review on PR router-for-me#2914:
  server.go (BLOCKING — alias bypass):
  - /backend-api/codex/* uses AuthMiddleware but had no ModelACLMiddleware, so a key restricted by api-key-policies could bypass the allowlist by routing through the codex alias instead of /v1/responses. Added ModelACLMiddleware to the codexDirect group.
  model_acl.go (BLOCKING — non-JSON body buffering):
  - extractRequestedModel previously buffered any POST/PUT/PATCH body up to 10 MiB to look for a JSON "model" field. With ACLs enabled this also buffered non-JSON uploads (image edits, future binary endpoints) even when no JSON model field could exist.
  - Dispatch on Content-Type:
    - application/json (and *+json, missing CT) → existing peek+buffer path with the 10 MiB cap.
    - multipart/form-data → ParseMultipartForm with a 1 MiB in-memory cap; large file parts spool to temp disk per stdlib. This recovers enforcement on /v1/images/edits without ballooning resident memory.
    - everything else → no body inspection; allow through.
  - ParseMultipartForm is idempotent, so downstream handlers using c.MultipartForm() / c.PostForm() still see the parsed form.
  config (non-blocking nit — response shape):
  - APIKeyPolicy.AllowedModels JSON tag: allowed-models → allowedModels, to match what the dashboard sends on PUT. The input parser still accepts all three case styles (camel, kebab, snake) for backward compat.
  tests (non-blocking nit + new coverage):
  - api_keys_payload_test.go: rename "empty plain array fails" → "empty plain array clears keys" (the case asserts wantOK:true).
  - model_acl_test.go: 8 new tests:
    - CodexAliasDisallowedModelRejected / AllowedModelPasses
    - MultipartFormDataDisallowedModelRejected / AllowedModelPasses
    - MultipartWithoutModelFieldAllowed
    - NonJSONContentTypeAllowedThrough (10 MiB octet-stream → 200, proving the middleware no longer buffers non-JSON bodies)
    - JSONWithCharsetParameterStillInspected
    - VendorJSONContentTypeInspected (application/vnd.api+json)
  go test ./internal/api/... ./internal/config/... is green; the only failing test on this branch (TestCodexStaticModelsIncludeGPT55) also fails on pristine main and is unrelated.
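The Content-Type dispatch described in the fix(access) commit can be sketched roughly as follows. This is an illustration of the classification rule only; `bodyInspection` and its return labels are hypothetical names, not the upstream `extractRequestedModel` code, which also performs the actual buffered read or `ParseMultipartForm` call.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// bodyInspection classifies a request's Content-Type into the three
// dispatch paths: JSON-like bodies get the peek+buffer path, multipart
// forms get ParseMultipartForm, and everything else is passed through
// without any body inspection.
func bodyInspection(contentType string) string {
	mt, _, err := mime.ParseMediaType(contentType)
	if err != nil || mt == "" {
		// A missing or unparseable Content-Type falls back to the
		// JSON peek path, matching the "missing CT" case above.
		return "json-peek"
	}
	switch {
	case mt == "application/json" || strings.HasSuffix(mt, "+json"):
		return "json-peek" // buffered read, subject to the 10 MiB cap
	case mt == "multipart/form-data":
		return "multipart" // ParseMultipartForm, 1 MiB in-memory cap
	default:
		return "skip" // no body inspection; request allowed through
	}
}

func main() {
	for _, ct := range []string{
		"application/json; charset=utf-8",
		"application/vnd.api+json",
		"multipart/form-data; boundary=xyz",
		"application/octet-stream",
		"",
	} {
		fmt.Printf("%q -> %s\n", ct, bodyInspection(ct))
	}
}
```

Using `mime.ParseMediaType` rather than a raw string compare is what makes the charset-parameter and vendor-JSON test cases (JSONWithCharsetParameterStillInspected, VendorJSONContentTypeInspected) fall out naturally.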
* fix(access): filter model listings by API key policy
* Fix TUI hourly usage fallback

---------

Co-authored-by: wykk-12138 <z710959836@gmail.com>
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Luis Pater <webmaster@idotorg.org>
Co-authored-by: sususu98 <suchangshan@foxmail.com>
Co-authored-by: ZTXBOSS666 <150424052+ZTXBOSS666@users.noreply.github.com>
Co-authored-by: Chén Mù <10558748+hkfires@users.noreply.github.com>
Co-authored-by: Allen Yi <yiwenyou_allen@outlook.com>
Co-authored-by: sususu98 <33882693+sususu98@users.noreply.github.com>
Co-authored-by: muzhi1991 <2101044+muzhi1991@users.noreply.github.com>
Co-authored-by: Supra4E8C <tan13318527080@163.com>
Co-authored-by: Supra4E8C <69194597+LTbinglingfeng@users.noreply.github.com>
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: Lyra <lyra@tuckerbradford.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: stringer07 <1742292793@qq.com>
Co-authored-by: MoYeRanQianZhi <MoYeRanQianZhi@gmail.com>
Co-authored-by: XYenon <i@xyenon.bid>
Co-authored-by: Ben Vargas <ben@vargas.com>
Co-authored-by: Matthias319 <Matthias319@users.noreply.github.com>
Co-authored-by: Will <wangxking@outlook.com>
Co-authored-by: philipbankier <philipbankier@gmail.com>
Co-authored-by: xbang <xbang0222@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e0e8d1f90b
ready for another review!
@luispater anything else you need from me on this?
Quick status update: I re-checked the latest inline review state and found three newer codex comments after my prior red-team note that I hadn't yet accounted for.
I'm treating those as the remaining blockers and will push another fix/test pass before asking for re-review again.
Summary
Re-submission of #2911 against `dev` (auto-retarget CI closed the original).
Includes the three fixes requested in the review.
The dashboard already stores `allowedModels` per UserApiKey and tries to push them to `PUT /v0/management/api-keys`, but upstream had nowhere to land them — so no key was ever actually restricted. This PR wires enforcement at the request-middleware layer.
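The enforcement semantics can be sketched as a small predicate: patterns are `path.Match` globs, and one leading "<prefix>/" segment on the requested model is tolerated so policies stay portable across prefixed credentials. Note this is a hedged illustration, not the upstream code: the real helper is `IsModelAllowedForKey` in the config package, and the empty-list fallback to the default policy shown here is an assumption about its behavior.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isModelAllowed reports whether a requested model matches any glob in
// the key's allowlist. With no patterns configured, the global default
// policy decides ("allow-all" unless explicitly "deny-all").
func isModelAllowed(model string, allowed []string, defaultPolicy string) bool {
	if len(allowed) == 0 {
		return defaultPolicy != "deny-all"
	}
	// Try the model as-is, and with a single leading "<prefix>/"
	// segment stripped, since path.Match's '*' never crosses '/'.
	candidates := []string{model}
	if i := strings.IndexByte(model, '/'); i >= 0 {
		candidates = append(candidates, model[i+1:])
	}
	for _, pat := range allowed {
		for _, m := range candidates {
			if ok, _ := path.Match(pat, m); ok {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isModelAllowed("gpt-4o", []string{"gpt-4*"}, "allow-all"))        // exact glob hit
	fmt.Println(isModelAllowed("openai/gpt-4o", []string{"gpt-4*"}, "allow-all")) // prefixed credential
	fmt.Println(isModelAllowed("claude-3-opus", []string{"gpt-4*"}, "allow-all")) // not in allowlist
	fmt.Println(isModelAllowed("anything", nil, "deny-all"))                      // default policy applies
}
```

The middleware then only has to extract the model (JSON body or Gemini-style path) and 403 when this predicate returns false.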
Feature (original commit)
Review fixes (second commit)
Test plan
Replaces #2911.