
feat(access): per-api-key model allowlist with glob matching (v2, review fixes)#2914

Open
mtuckerb wants to merge 13 commits into router-for-me:dev from mtuckerb:feat/api-key-allowed-models-v2

Conversation

@mtuckerb

Summary

Re-submission of #2911 against `dev` (auto-retarget CI closed the original).
Includes the three fixes requested in the review.

The dashboard already stores `allowedModels` per UserApiKey and tries to push them to `PUT /v0/management/api-keys`, but upstream had nowhere to land them — so no key was ever actually restricted. This PR wires enforcement at the request-middleware layer.

Feature (original commit)

  • `APIKeyPolicy{Key, AllowedModels}` on `SDKConfig`; `APIKeyDefaultPolicy` (`allow-all` default, `deny-all` opt-in) so existing deployments are unaffected.
  • `IsModelAllowedForKey` uses `path.Match` globs and tolerates a single leading `/` segment so policies stay portable across prefixed credentials.
  • `PutAPIKeys` accepts both legacy string list and structured `[{key, allowedModels}]`; `GetAPIKeys` surfaces `api-key-policies`.
  • `ModelACLMiddleware` extracts the target model from the JSON body or the Gemini-style `/v1beta/models/<model>:<action>` path and enforces the per-key policy; wired into both `/v1` and `/v1beta` groups after `AuthMiddleware`.

Review fixes (second commit)

  1. DoS hardening in ModelACLMiddleware. Body read is now capped at 10 MiB via `io.LimitReader` (and short-circuited on `Content-Length`); oversize requests get HTTP 413 instead of silently buffering. Non-size read errors fail closed with 400 rather than bypassing the policy check.
  2. O(1) policy lookup. `APIKeyPolicies` is backed by an `atomic.Pointer[map[string]*APIKeyPolicy]` cache, lazily built via CompareAndSwap, safe under concurrent reads. Adds `SetAPIKeyPolicies` (atomic replace + invalidate) and `InvalidatePolicyIndex`.
  3. Policy rotation/deletion sync. `PatchAPIKeys` keeps `APIKeyPolicies` in sync on rename (via `old`/`new`) and rotation (via `index`/`value`); `DeleteAPIKeys` drops matching policy rows so a revoked credential cannot leave its allowlist behind; `PutAPIKeys` routes through `SetAPIKeyPolicies` for atomic cache refresh.

Test plan

  • `go build ./...` clean
  • `go test ./...` green on `dev` + this branch
  • New middleware tests: `OversizedBodyRejectedWith413`, `OversizedBodyRejectedViaContentLength`
  • New cache tests: setter-rebuilds, in-place-edit+invalidate, nil-safe, concurrent reads, defensive-copy, empty-clears
  • New handler tests: rename-via-old-new, rename-via-index-value, rename-with-existing-target, append-new, delete-by-value, delete-by-index, last-row-clears, PUT-rebuilds-policies

Replaces #2911.

mtuckerb and others added 2 commits April 19, 2026 13:54
Adds enforcement for per-client-key model ACLs at the request middleware
layer. The dashboard already stores allowedModels per UserApiKey and
attempts to push them to PUT /v0/management/api-keys, but upstream until
now had nowhere to land them — the structured payload was rejected and
the dashboard's fallback silently dropped the policy data, so no key was
ever actually restricted.

config:
- Add APIKeyPolicy{Key, AllowedModels} and SDKConfig.APIKeyPolicies
- Add SDKConfig.APIKeyDefaultPolicy ("allow-all" | "deny-all"; defaults
  to allow-all so existing deployments are unaffected)
- IsModelAllowedForKey supports path.Match globs and tolerates a single
  leading "<prefix>/" segment so policies stay portable across prefixed
  credentials

management API:
- PutAPIKeys now accepts both the legacy string list and the structured
  [{key, allowedModels}] form, with allowedModels / allowed-models /
  allowed_models all recognized so existing dashboard payloads work
- GetAPIKeys exposes api-key-policies when any are configured

middleware:
- New ModelACLMiddleware extracts the targeted model from the JSON body
  ("model" field) or the Gemini-style /v1beta/models/<model>:<action>
  path, then enforces the per-key policy. The body is read once and
  restored via io.NopCloser so downstream handlers see the original
  request bytes
- Wired into both /v1 and /v1beta groups directly after AuthMiddleware
- Bypasses any request that has no identified api key (preserving the
  legacy "no auth provider configured => allow" behavior of
  AuthMiddleware) and any request shape with no extractable model
  (listings, websocket upgrades) so discovery endpoints keep working

tests:
- Unit tests for IsModelAllowedForKey covering allow-all, deny-all,
  empty allowed list semantics, exact matches, glob matches, and the
  prefixed-credential case
- Middleware tests covering the allow / deny / unknown-key / Gemini
  path / list-endpoint / body-pass-through paths
- Parser tests for the management PutAPIKeys payload covering the
  legacy plain list, the wrapped items form, and the structured form
  in all three case styles
Three changes landed after PR router-for-me#2911 review:

middleware (DoS hardening):
- Cap request body inspected by ModelACLMiddleware at 10 MiB via
  io.LimitReader; requests above the cap are rejected with HTTP 413
  ("request_too_large") instead of being buffered into memory
- Short-circuit on Content-Length when the client declares an oversized
  body, so we bail before reading any bytes
- Any non-size read error now fails closed with HTTP 400 rather than
  silently allowing the request past the policy check
- New tests cover both the streamed-oversize and Content-Length paths

config (O(1) policy lookup + cache invalidation):
- SDKConfig.APIKeyPolicies is now backed by an atomic.Pointer cache
  (map[string]*APIKeyPolicy) so IsModelAllowedForKey is O(1) per call
  instead of O(N) linear scan; the cache is lazily rebuilt via
  CompareAndSwap and safe under concurrent reads
- Add SetAPIKeyPolicies (atomic replace + invalidate) and
  InvalidatePolicyIndex (manual flush after in-place edits)
- Tests exercise mutation visibility, nil-safe receivers, concurrent
  access, and defensive-copy semantics

management handlers (policy rotation/deletion):
- PatchAPIKeys gains a dedicated handler that keeps APIKeyPolicies in
  sync on rename (via old/new) and rotation (via index/value); when the
  target key already has a policy the source row is dropped so there
  are no duplicate entries
- DeleteAPIKeys removes the matching APIKeyPolicy rows so a revoked
  credential cannot leave its allowlist behind; slice is nilled when
  empty
- PutAPIKeys routes through SetAPIKeyPolicies so the cache is refreshed
  atomically with the slice swap
- Tests cover rename-with-existing-target, append-new, delete-by-value,
  delete-by-index, last-row-clears, and PUT-rebuilds-policies

No behavior change for existing deployments that do not configure
api-key-policies. All existing tests still pass.
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces per-API-key model access control (ACL) enforcement. It adds a new ModelACLMiddleware that validates requested models against configured policies for both OpenAI and Gemini-style endpoints. The configuration system is updated to support structured API key policies with glob-based model matching and a lazy-loaded, atomic cache for efficient lookups. Management handlers for API keys are enhanced to keep these policies in sync during creation, rotation, and deletion. Feedback highlights potential data races in the management handlers due to missing synchronization, unsafe slice reuse that could lead to stale cache reads, and memory pressure concerns regarding the buffering of large request bodies in the ACL middleware.

Comment thread internal/api/handlers/management/config_lists.go
Comment thread internal/api/handlers/management/config_lists.go Outdated
Comment thread internal/api/model_acl.go Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b792cc2c49

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/api/model_acl.go
Comment thread internal/api/model_acl.go Outdated
- config_lists.go: PutAPIKeys, PatchAPIKeys, and DeleteAPIKeys now
  acquire h.mu before mutating h.cfg and use persistLocked so the
  mutation and the subsequent config write happen atomically from
  the perspective of readers (middleware goroutines). Previously
  the mutations raced with ModelACLMiddleware evaluating the same
  fields on concurrent requests. Matches the existing pattern in
  PutGeminiKeys / PatchGeminiKey. (HIGH review comment)

- config_lists.go: renameAPIKeyPolicy no longer reslices the
  existing APIKeyPolicies backing array via "[:0]". That was
  unsafe because SDKConfig.policyIndex caches *APIKeyPolicy into
  the same backing array; a concurrent reader resolving the cache
  mid-write would observe a partially-mutated entry. Allocate a
  fresh slice so any still-cached pointers keep pointing at the
  pre-mutation entries until InvalidatePolicyIndex forces a
  refresh. (MEDIUM review comment)

- model_acl.go: extractRequestedModel now peeks the first 16 KiB
  of the request body before committing to buffering the full cap.
  In realistic chat-completion payloads "model" is always near the
  top of the JSON, so the peek alone is usually sufficient and the
  middleware allocates ~16 KiB per request instead of up to
  modelACLMaxBodyBytes (10 MiB). When the peek does not contain
  "model", the middleware falls back to the prior bounded read and
  still enforces the cap. Under concurrency this is the difference
  between a bounded constant footprint and N*10 MiB. (MEDIUM
  review comment)

- model_acl_test.go: two new tests lock in peek-first behavior —
  one covers a large body with "model" in the peek (must extract,
  enforce, and preserve the full downstream body byte-for-byte),
  the other covers "model" past the peek but within the cap (must
  still extract via the fallback read).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review


Reviewed commit: 6bbea94c1d


Comment thread internal/api/model_acl.go Outdated
The previous extractor for /v1beta/models/<model>:<action> paths
truncated at the first '/' inside <model>. That broke deployments
using force-model-prefix, where the routed model identifier is
literally "<prefix>/<model>" (e.g. "teamA/gemini-3-pro"). The
GeminiHandler forwards the whole segment-before-":" as the model,
and IsModelAllowedForKey already strips a single leading
"<prefix>/" before glob matching, so the ACL extractor must
mirror that and forward the full identifier including embedded
slashes. Otherwise:
  - allowed requests were rejected (the ACL saw "teamA" instead
    of "teamA/gemini-3-pro")
  - policies scoped to just the prefix segment could unintentionally
    allow unrelated models sharing the prefix

Added TestModelACLMiddleware_GeminiPathPreservesPrefix which uses
the same AllowedModels pattern against both an allowed and a
disallowed model under the same prefix — a regression would
surface as a false-positive on either the allow path or the
deny path.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review


Reviewed commit: 8d34ee9200


Comment thread internal/api/handlers/management/config_lists.go Outdated
Comment thread internal/api/handlers/management/config_lists.go
Two P1s from the Codex reviewer on the prior commit:

1. Websocket ACL bypass
   /v1/responses accepts a websocket upgrade, and the model
   identifier is only selected later in frames consumed by the
   ResponsesWebsocket handler. extractRequestedModel returns
   found=false for the upgrade request, and the middleware was
   falling through as "no model here, allow", letting a restricted
   api key use any model through the websocket path.

   Fix: after identifying the key but before attempting body
   extraction, detect Upgrade: websocket and reject the upgrade
   when keyHasModelRestriction(cfg, apiKey) is true. A key is
   considered restricted when either it has an APIKeyPolicy entry
   with a non-empty AllowedModels list, or the deployment default
   is deny-all and the key has no matching policy. Unrestricted
   keys still pass through so the legacy websocket flow keeps
   working for them.

2. Unbounded drain after 413
   After detecting the body exceeded modelACLMaxBodyBytes, the
   code called io.Copy(io.Discard, c.Request.Body), which for a
   chunked/streamed request without a trustworthy Content-Length
   can hold the handler goroutine indefinitely — an attacker just
   keeps streaming bytes and the ACL check turns into a
   request-slot exhaustion path.

   Fix: drop the drain entirely. Return 413 and let net/http
   close the connection. No keep-alive concern since we're
   rejecting the request.

Tests added:
  - WebsocketUpgradeRejectedForRestrictedKey: key with
    AllowedModels attempting WS upgrade gets 403, handler never
    runs.
  - WebsocketUpgradeAllowedForUnrestrictedKey: key with no policy
    under allow-all default still gets through.
  - WebsocketUpgradeRejectedUnderDenyAllDefault: deny-all default
    rejects WS upgrades for keys with no matching policy.
  - OversizedBodyDoesNotDrainRemainder: uses a reader that blocks
    after its prefix to prove the middleware returns 413 promptly
    (within 2s) instead of reading to EOF.
Two P2s from the Codex reviewer, both rooted in the same legacy
edge case: APIKeys can hold duplicate values, and the policy
helpers (renameAPIKeyPolicy, removeAPIKeyPolicy) are global —
they rewrite or drop the policy row by value, not by index. That
opened two ways for a duplicate to survive a patch/delete without
its intended restriction:

1. PatchAPIKeys old/new path
   The loop broke at the first match. If APIKeys was
   ["sk-dup", "sk-other", "sk-dup"] and the caller did
   {"old":"sk-dup","new":"sk-new"}, one slot remained "sk-dup"
   while the policy migrated to "sk-new", so the leftover
   "sk-dup" fell back to the default policy (usually allow-all).

   Fix: remove the break — rename every duplicate so nothing is
   left behind that the global policy migration would orphan.

2. PatchAPIKeys index/value path
   Similar shape: rewriting one slot while the old value still
   appears elsewhere, then moving the policy to the new value,
   left the surviving duplicate unrestricted.

   Fix: only migrate the policy when the old value no longer
   appears anywhere in APIKeys. When duplicates remain, the
   policy stays attached to the old value; the new value falls
   back to the default until the caller sets an explicit policy
   via PUT /api-keys.

3. DeleteAPIKeys indexed path
   Same pattern — removed one slot, then dropped the policy row
   unconditionally, which orphaned surviving duplicates.

   Fix: only drop the policy when no other entries still carry
   the removed value.

Added apiKeyValuePresent() helper so the three call sites share
one membership check.

Tests added:
  - PatchAPIKeys_RenameViaOldNewRenamesAllDuplicates: three-slot
    fixture with two duplicates proves all matching slots are
    renamed and the migrated policy still applies.
  - PatchAPIKeys_IndexedReplaceLeavesPolicyWhenDuplicateRemains:
    two-duplicate fixture proves the surviving slot keeps its
    policy (asserts both positively — still allows the permitted
    model — and negatively — still rejects models outside the
    allowlist).
  - DeleteAPIKeys_ByIndexKeepsPolicyWhenDuplicateRemains: same
    shape for the delete path.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review


Reviewed commit: e48e63fb12


Comment thread internal/api/model_acl.go
mtuckerb and others added 2 commits April 20, 2026 08:13
Codex/ChatGPT reasoning-model upstreams (the gpt-5.x family) emit assistant
text only via response.output_text.delta events; the final response.completed
event has no output item with content[].type == "output_text". The codex
executor's non-streaming path filtered out every line except response.completed
and passed it straight to the chat-completions translator, which then surfaced
a message with content == null while completion_tokens was non-zero. Clients
that read .choices[0].message.content saw the field as null and rendered
fallback text such as "Something went wrong." even though the model had
generated the requested response.

Fix: walk the SSE body once, accumulate text from response.output_text.delta
events, and synthesize a message item containing the accumulated text into
response.output[] before passing the event to the translator. The accumulator
is a no-op for non-reasoning models that already populate the message item
themselves (codexResponseHasOutputText guards against double-injection).

Tests cover codexResponseHasOutputText across the empty/missing/populated
shapes, injectCodexOutputText for fresh and pre-populated output arrays, and
a representative end-to-end gpt-5.x SSE body to verify the loop accumulates
the deltas and produces the expected post-injection event.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@luispater luispater left a comment

Summary

Adds per-API-key model allowlisting (glob matching) with management API support for structured allowedModels payloads, and enforces it via a new ModelACLMiddleware on /v1 and /v1beta.

Blocking

  • Alias bypass: The middleware is wired on /v1 and /v1beta only. The /backend-api/codex/* alias group also uses AuthMiddleware but is not protected, so restricted keys can bypass allowlists by calling the alias endpoints.
  • Body buffering on non-JSON: extractRequestedModel reads and buffers the full request body for any POST/PUT/PATCH to look for "model". When ACLs are enabled, this will also buffer non-JSON/multipart requests like POST /v1/images/edits, which can be large. Suggest gating body reads to JSON content-types (or a small allowlist of JSON routes), and/or extracting model from multipart form fields for image edit requests.
  • CI: retarget is failing because a separate PR (#2911) from the same head branch cannot be retargeted to dev (duplicate base/head). This likely needs the duplicate PR cleaned up so the check goes green.

Non-blocking

  • GET /v0/management/api-keys returns api-key-policies using allowed-models (kebab-case). If the dashboard expects allowedModels, consider adding an alias or documenting the response shape.
  • Minor nit: the test case name "empty plain array fails" expects success (empty clears keys).

Test plan

  • CI build
  • No local checkout/tests run (review-only via gh as requested).

@sccgaochengyi

We actively need this feature. Per-key model allowlisting is exactly what we've been looking for to enforce user-tier access controls (e.g. restricting intern/contractor keys to deepseek-/glm-/qwen* while granting full-time employees broader access).

The implementation looks solid — the middleware-based enforcement, glob matching, atomic policy cache, and backward-compatible payload formats all align well with how we'd want to consume this.

That said, I'd echo the two blocking items from @luispater's review:

  1. Codex route bypass — wiring ModelACLMiddleware onto /backend-api/codex/* is critical; without it the allowlist is trivially circumvented.
  2. /v1/models filtering — returning the full model list to a restricted key is a poor UX (users see models they can't use). Ideally the models endpoint would filter based on the calling key's policy.

Both seem like small deltas on top of this PR. Hoping we can get these addressed and land this soon.

Three blockers + two nits from luispater's CHANGES_REQUESTED review on
PR router-for-me#2914:

server.go (BLOCKING — alias bypass):
- /backend-api/codex/* uses AuthMiddleware but had no ModelACLMiddleware,
  so a key restricted by api-key-policies could bypass the allowlist by
  routing through the codex alias instead of /v1/responses. Added
  ModelACLMiddleware to the codexDirect group.

model_acl.go (BLOCKING — non-JSON body buffering):
- extractRequestedModel previously buffered any POST/PUT/PATCH body up
  to 10 MiB to look for a JSON "model" field. With ACLs enabled this
  also buffered non-JSON uploads (image edits, future binary endpoints)
  even when no JSON model field could exist.
- Dispatch on Content-Type:
    - application/json (and *+json, missing CT) → existing peek+buffer
      path with the 10 MiB cap.
    - multipart/form-data → ParseMultipartForm with a 1 MiB in-memory
      cap; large file parts spool to temp disk per stdlib. This recovers
      enforcement on /v1/images/edits without ballooning resident memory.
    - everything else → no body inspection, allow through.
- ParseMultipartForm is idempotent, so downstream handlers using
  c.MultipartForm() / c.PostForm() still see the parsed form.

config (non-blocking nit — response shape):
- APIKeyPolicy.AllowedModels JSON tag: allowed-models → allowedModels
  to match what the dashboard sends on PUT. Input parser still accepts
  all three case styles (camel, kebab, snake) for backward compat.

tests (non-blocking nit + new coverage):
- api_keys_payload_test.go: rename "empty plain array fails" →
  "empty plain array clears keys" (the case asserts wantOK:true).
- model_acl_test.go: 8 new tests:
    - CodexAliasDisallowedModelRejected / AllowedModelPasses
    - MultipartFormDataDisallowedModelRejected / AllowedModelPasses
    - MultipartWithoutModelFieldAllowed
    - NonJSONContentTypeAllowedThrough (10 MiB octet-stream → 200,
      proves the middleware no longer buffers non-JSON bodies)
    - JSONWithCharsetParameterStillInspected
    - VendorJSONContentTypeInspected (application/vnd.api+json)

go test ./internal/api/... ./internal/config/... is green; the only
failing test on this branch (TestCodexStaticModelsIncludeGPT55) also
fails on pristine main and is unrelated.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review


Reviewed commit: 9f805087c8


Comment thread internal/api/model_acl.go Outdated
mtuckerb and others added 2 commits April 28, 2026 13:10
@mtuckerb
Author

Addressed the remaining reviewer-facing gap around model-list UX. Restricted API keys now get policy-filtered model listings for both OpenAI/Claude-shaped /v1/models and Gemini-shaped /v1beta/models, while unrestricted keys retain the full catalog and deny-all/no-policy keys receive an empty filtered list.

Also added focused coverage for:

  • restricted keys only seeing allowed models
  • unrestricted keys seeing all models
  • deny-all unknown keys seeing no models
  • /v1/models response filtering behavior

Verification:

  • go test ./internal/api
  • go test ./...
  • GitHub checks are green (build, close-when-agents-md-changed, ensure-no-translator-changes).


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review


Reviewed commit: 1d8305cef2


Comment thread internal/api/handlers/management/config_lists.go
Two new reviewer-flagged gaps after the prior round:

1. P1 — Content-Type-based bypass of model ACL.
   Several model-executing routes (ChatCompletions, Completions, Responses,
   ClaudeMessages) parse the request body as JSON regardless of
   Content-Type. The previous default-allow branch in
   extractRequestedModel let a restricted key send Content-Type: text/plain
   with a JSON body containing a disallowed model and bypass enforcement.
   Restructure the dispatch so only multipart/form-data takes the dedicated
   form-parsing path; every other content type — including text/plain,
   application/octet-stream, and any other non-JSON value — runs the same
   peek+gjson inspection as application/json. Bodies that genuinely have
   no JSON model field (true binary uploads etc.) still pass through, and
   the existing oversize cap continues to limit per-request memory.

2. P2 — Duplicate-key silent-skip in PUT /api-keys structured payload.
   parseAPIKeysPayload accepted [{key:'sk-x'},{key:'sk-x',allowedModels:[...]}]
   and silently kept only the first row, dropping the restriction on the
   duplicate. That is a quiet permission widening. Reject duplicate
   structured keys with ok=false (HTTP 400 from PutAPIKeys) so callers
   notice and fix the payload instead of getting an unintended fall-back
   to the default policy.

Tests:

- TestModelACLMiddleware_NonJSONContentTypeWithJSONBodyStillEnforces:
  Content-Type: text/plain + JSON body with disallowed model returns 403.
- TestModelACLMiddleware_OctetStreamWithoutModelAllowed: opaque
  octet-stream body without a JSON model field still returns 200 from the
  middleware.
- TestModelACLMiddleware_NonJSONContentTypeAllowedThrough updated to the
  new contract — small opaque body returns 200; the prior 'large opaque
  payload returns 200' assertion was the exact bypass this PR closes.
- TestParseAPIKeysPayload_StructuredDuplicateKeyRejected: duplicate
  structured rows are rejected.

go test ./... passes.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review


Reviewed commit: 9369170b9f


Comment thread internal/api/model_acl.go
@mtuckerb
Author

mtuckerb commented May 2, 2026

Addressed the two new bot-flagged gaps from the latest pass (commit 9369170).

P1 — Content-Type bypass (codex, model_acl.go):
Several model-executing routes (ChatCompletions, Completions, Responses, ClaudeMessages) parse the body as JSON regardless of Content-Type, so the previous default-allow branch let a restricted key send Content-Type: text/plain with a JSON body containing a disallowed model and skip enforcement. The Content-Type switch now only branches off the JSON path for content types we explicitly know are not JSON (multipart/form-data). Everything else — including missing, unparseable, or misleading Content-Type values — runs the same peek+gjson inspection as application/json. Bodies without a JSON model field still pass through (gjson returns ok=false), and the existing 10 MiB cap continues to bound per-request memory.

P2 — Silent duplicate-key dedupe in PUT /api-keys (codex, config_lists.go):
parseAPIKeysPayload was silently skipping duplicate structured rows, which let payloads like [{"key":"sk-x"},{"key":"sk-x","allowedModels":["gpt-4o*"]}] widen access by dropping the restricted row. Duplicates are now rejected (HTTP 400). Legacy plain-list inputs still go through dedupeKeyList because they have no per-row metadata to lose.

Tests:

  • TestModelACLMiddleware_NonJSONContentTypeWithJSONBodyStillEnforces — Content-Type: text/plain + JSON body with disallowed model → 403.
  • TestModelACLMiddleware_OctetStreamWithoutModelAllowed — opaque octet-stream body without a JSON model field → 200.
  • TestModelACLMiddleware_NonJSONContentTypeAllowedThrough — kept the test name (intent is right) but rewrote the body to a small 4 KiB opaque buffer; the prior 10 MiB-payload-returns-200 assertion was the exact bypass this round closes.
  • TestParseAPIKeysPayload_StructuredDuplicateKeyRejected — duplicate structured rows → ok=false.

Verification:

  • go test ./internal/api/...
  • go test ./...
  • GitHub checks green (build, close-when-agents-md-changed, ensure-no-translator-changes).

@mtuckerb
Author

mtuckerb commented May 2, 2026

Round-4 fixes pushed at commit 9369170b9fac499c8846ebba42cccf3072a94371 addressing the two new codex findings against the prior commit:

P1 — Content-Type bypass of model ACL (model_acl.go)
The previous switch had a default: return "", false, nil arm for unknown content types. Several model-executing handlers (ChatCompletions, Completions, Responses, ClaudeMessages) parse the body via c.GetRawData() and decode JSON regardless of Content-Type, so a restricted key could send Content-Type: text/plain with a JSON body containing a disallowed model and bypass enforcement. The dispatch now branches off the JSON path only for multipart/form-data; everything else — including missing, unparseable, or deliberately misleading Content-Type values — runs the same peek + gjson inspection as application/json. Bodies that genuinely contain no JSON model field still pass through (gjson returns ok=false), and the existing 10 MiB cap continues to bound per-request memory.

P2 — Silent duplicate-key dedupe in PUT /api-keys structured payload (config_lists.go)
parseAPIKeysPayload previously skipped duplicate keys in the structured form, which silently dropped allowedModels from a later duplicate row and let the key fall back to the deployment default policy (usually allow-all). Duplicates now return ok=false → HTTP 400, so callers fix their payload instead of getting an invisible permission widening. The legacy plain-string form continues to dedupe via dedupeKeyList because it carries no per-row metadata to lose.

Tests

  • TestModelACLMiddleware_NonJSONContentTypeWithJSONBodyStillEnforces: text/plain + JSON body with disallowed model → 403.
  • TestModelACLMiddleware_OctetStreamWithoutModelAllowed: opaque octet-stream body without a JSON model field still returns 200.
  • TestModelACLMiddleware_NonJSONContentTypeAllowedThrough updated to the new contract — small opaque body returns 200; the prior "large opaque payload returns 200" assertion was the exact bypass this round closes.
  • TestParseAPIKeysPayload_StructuredDuplicateKeyRejected: duplicate structured rows are rejected.

Verification: go test ./... passes; PR checks (build, close-when-agents-md-changed, ensure-no-translator-changes) are green.

Red-team review of commit 9369170 probed five edge cases the round-4 unit tests did not already cover. All pass against the current implementation; I'm pinning them down so a future regression in either the Content-Type dispatch or the duplicate-key parser surfaces as a test failure rather than a silent ACL bypass.

model_acl_test.go: EmptyContentTypeWithDisallowedJSONStillEnforces, UnparseableContentTypeStillEnforces, MixedCaseContentTypeStillDispatches.

api_keys_payload_test.go: StructuredWrappedDuplicateKeyRejected, StructuredDuplicateKeyAfterTrimRejected.

go test ./internal/api/... ./internal/api/handlers/management/... passes.
@mtuckerb
Author

mtuckerb commented May 2, 2026

Red-team self-review of 9369170b shipped at e0e8d1f9. No new blocking findings; added five adversarial test cases that the round-4 unit tests didn't cover.

Coverage added:

  • TestModelACLMiddleware_EmptyContentTypeWithDisallowedJSONStillEnforces: no Content-Type header + JSON body with disallowed model → 403. Catches a regression of parseRequestMediaType("") → "" ever short-circuiting back to fail-open.
  • TestModelACLMiddleware_UnparseableContentTypeStillEnforces: malformed Content-Type that mime.ParseMediaType rejects also lands in the default branch and still enforces.
  • TestModelACLMiddleware_MixedCaseContentTypeStillDispatches: APPLICATION/JSON survives ToLower. A future refactor that drops the lowercase step surfaces here instead of as a silent ACL bypass.
  • TestParseAPIKeysPayload_StructuredWrappedDuplicateKeyRejected: duplicate keys in the {items:[...]} wrapped structured form get rejected — same loop as the bare array form, but pin the wrapped form too because both shapes go through it.
  • TestParseAPIKeysPayload_StructuredDuplicateKeyAfterTrimRejected: whitespace-padded duplicates are also rejected; trim-then-dedupe is correctly ordered.

Pre-existing concerns I noticed that are not in scope for this PR — flagging for awareness:

  • No global MaxBytesReader on the multipart code path. The 1 MiB modelACLMultipartMemoryCap is the in-memory portion of the form parse; mime/multipart will happily disk-spool larger file fields. Predates this PR.
  • extractModelFromMultipartBody allows requests through on a malformed multipart body. Intentional — the route handler should reject it — but worth keeping in mind if the multipart route surface grows.

Tests pass against the new commit; PR checks are green.

mtuckerb added a commit to mtuckerb/CLIProxyAPI that referenced this pull request May 2, 2026
* fix(executor): avoid OAuth extra-usage detection by the Anthropic API

Three changes to avoid Anthropic's content-based system prompt validation:

1. Fix identity prefix: Use 'You are Claude Code, Anthropic's official CLI
   for Claude.' instead of the SDK agent prefix, matching real Claude Code.

2. Move user system instructions to user message: Only keep billing header +
   identity prefix in system[] array. User system instructions are prepended
   to the first user message as <system-reminder> blocks.

3. Enable cch signing for OAuth tokens by default: The xxHash64 cch integrity
   check was previously gated behind experimentalCCHSigning config flag.
   Now automatically enabled when using OAuth tokens.

Related: router-for-me#2599

* fix(executor): inject full Claude Code system prompt blocks with proper cache scopes

Previous fix only injected billing header + agent identifier (2 blocks).
Anthropic's updated detection now validates system prompt content depth:
- Block count (needs 4-6 blocks, not 2)
- Cache control scopes (org for agent, global for core prompt)
- Presence of known Claude Code instruction sections

Changes:
- Add claude_system_prompt.go with extracted Claude Code v2.1.63 system prompt
  sections (intro, system instructions, doing tasks, tone & style, output efficiency)
- Rewrite checkSystemInstructionsWithSigningMode to build 5 system blocks:
  [0] billing header (no cache_control)
  [1] agent identifier (cache_control: ephemeral, scope=org)
  [2] core intro prompt (cache_control: ephemeral, scope=global)
  [3] system instructions (no cache_control)
  [4] doing tasks (no cache_control)
- Third-party client system instructions still moved to first user message

Follow-up to 69b950d

* fix: use sjson to build system blocks, avoid raw newlines in JSON

The previous commit used fmt.Sprintf with %s to insert multi-line string
constants into JSON strings. Go raw string literals contain actual newline
bytes, which produce invalid JSON (control characters in string values).

Replace with buildTextBlock() helper that uses sjson.SetBytes to properly
escape text content for JSON serialization.

* fix: buildTextBlock cache_control sjson path issue

sjson treats 'cache_control.type' as nested path, creating
{ephemeral: {scope: org}} instead of {type: ephemeral, scope: org}.
Pass the whole map to sjson.SetBytes as a single value.

* fix: build cache_control JSON manually to avoid sjson map marshaling

* fix: remove invalid org scope and match Claude Code block layout

* fix(claude): sanitize forwarded third-party prompts for OAuth cloaking

Only for Claude OAuth requests, sanitize forwarded system-prompt context before
it is prepended into the first user message. This preserves neutral task/tool
instructions while removing OpenCode branding, docs links, environment banners,
and product-specific workflow sections that still triggered Anthropic extra-usage
classification after top-level system[] cloaking.

* fix(claude): remove invalid cache_control scope from static system block

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(claude): reduce forwarded OAuth prompt to minimal tool reminder

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(auth, executor): normalize Qwen base URL, adjust RefreshLead duration, and add tests

* fix(claude): remap OAuth tool names to Claude Code style to avoid third-party fingerprint detection

A/B testing confirmed that Anthropic uses tool name fingerprinting to detect
third-party clients on OAuth traffic. OpenCode-style lowercase names like
'bash', 'read', 'todowrite' trigger extra-usage billing, while Claude Code
TitleCase names like 'Bash', 'Read', 'TodoWrite' pass through normally.

Changes:
- Add oauthToolRenameMap: maps lowercase tool names to Claude Code equivalents
- Add oauthToolsToRemove: removes 'question' and 'skill' (no Claude Code counterpart)
- remapOAuthToolNames: renames tools, removes blacklisted ones, updates tool_choice and messages
- reverseRemapOAuthToolNames/reverseRemapOAuthToolNamesFromStreamLine: reverse map for responses
- Apply in Execute(), ExecuteStream(), and CountTokens() for OAuth token requests

* fix(auth): preserve and restore ready view cursors during index rebuilds

* fix(handlers): add base URL validation and improve API key deletion tests

* feat(antigravity): configurable signature cache with bypass-mode validation

Antigravity's Claude thinking-signature handling gains a dual cache/bypass mode,
with bypass-mode signature validation implemented per SIGNATURE-CHANNEL-SPEC.md.

New antigravity-signature-cache-enabled config option (default true):
- cache mode (true): use server-side cached signatures; behavior is identical to the existing logic
- bypass mode (false): use client-supplied signatures directly, after validation and normalization

Supports config hot reload; the mode can be switched at runtime.

Validation flow:
1. Strip the legacy cache-mode 'modelGroup#' prefix (e.g. claude#Exxxx → Exxxx)
2. The first character must be 'E' (single-layer encoding) or 'R' (double-layer encoding); otherwise reject
3. 'R' prefix: base64-decode → the inner value must start with 'E' → continue with single-layer validation
4. 'E' prefix: base64-decode → the first byte must be 0x12 (Claude protobuf marker)
5. All valid signatures are normalized to the 'R' form (double base64) before being sent to the Antigravity backend

Invalid-signature handling policy:
- Non-strict mode (default): the translator silently drops thinking blocks without a valid signature
- Strict mode (antigravity-signature-bypass-strict: true):
  the executor returns HTTP 400 before the request is sent upstream

Parses the full protobuf structure of Claude signatures per SIGNATURE-CHANNEL-SPEC.md:
- Top-level Field 2 (container) → Field 1 (channel block)
- Channel-block extraction: channel_id (Field 1), infrastructure (Field 2),
  model_text (Field 6), field7 (Field 7)
- Computes routing_class, infrastructure_class, schema_features
- Parsed with google.golang.org/protobuf/encoding/protowire

- resolveThinkingSignature split into resolveCacheModeSignature / resolveBypassModeSignature
- hasResolvedThinkingSignature: mode-aware signature validity check
  (cache: len>=50 via HasValidSignature; bypass: non-empty)
- validateAntigravityRequestSignatures: executor pre-check that rejects
  invalid signatures with 400 only in bypass + strict mode
- Response-side signature caching integrates with cache mode
- Cache-mode behavior fully preserved: native signatures without a '#' prefix are silently dropped

* docs(antigravity): document signature validation spec alignment

Add package-level comment documenting the protobuf tree structure,
base64 encoding equivalence proof, output dimensions, and spec
section references. Remove unreachable legacy_vertex_group dead code.

* fix(antigravity): refine 429 handling and credits fallback

Includes: restore SDK docs under docs/; update antigravity executor credits tests; gofmt.

* fix(claude): preserve OAuth tool renames when filtering tools

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(claude): map question/skill to TitleCase instead of removing them

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(claude): address PR review feedback for OAuth cloaking

- Use buildTextBlock for billing header to avoid raw JSON string interpolation
- Fix empty array edge case in prependToFirstUserMessage
- Allow remapOAuthToolNames to process messages even without tools array
- Move claude_system_prompt.go to helps/ per repo convention
- Export prompt constants (ClaudeCode* prefix) for cross-package access

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(handlers): update listener to bind on all interfaces instead of localhost

Fixed: router-for-me#2640

* feat(antigravity): prefer prod URL as first priority

Promote cloudcode-pa.googleapis.com to the first position in the
fallback order, with daily and sandbox URLs as fallbacks.

* fix(executor): implement immediate retry with token refresh on 429 for Qwen and add associated tests

Closes: router-for-me#2661

* fix(executor): handle OAuth tool name remapping with rename detection and add tests

Closes: router-for-me#2656

* docs: refine CLIproxyAPI Quota Inspector description in all README locales

* docs: fix CLIProxyAPI Quota Inspector naming and link casing

* refactor(executor): remove immediate retry with token refresh on 429 for Qwen and update tests accordingly

* docs: clarify codex quota window wording in README locales

* fix(executor): handle 429 Retry-After header and default retry logic for quota exhaustion

- Added proper parsing of `Retry-After` headers for 429 responses.
- Set default retry duration when "disable cooling" is active on quota exhaustion.
- Updated tests to verify `Retry-After` handling and default behavior.

* fix(antigravity): allow 32MB bypass signatures

Raise the local bypass-signature ceiling so long Claude thinking signatures are not rejected before request translation, and keep the oversized-signature test cheap to execute.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* fix(antigravity): reduce bypass mode log noise

Keep cache-disable visibility at info level while suppressing duplicate state-change logs and moving strict-mode chatter down to debug.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* feat(auth): implement auto-refresh loop for managing auth token schedule

- Introduced `authAutoRefreshLoop` to handle token refresh scheduling.
- Replaced semaphore-based refresh logic in `Manager` with the new loop.
- Added unit tests to verify refresh schedule logic and edge cases.

* fix(antigravity): drop redacted thinking blocks with empty text

Antigravity wraps empty thinking text into a prompt-caching-scope
object that omits the required inner "thinking" field, causing 400
"messages.N.content.0.thinking.thinking: Field required" when Claude
Max requests are routed through Antigravity in bypass mode.

* fix(antigravity): skip full schema cleanup for empty tool requests

Avoid whole-payload schema sanitization when translated Antigravity requests have no actual tool schemas, including missing and empty tools arrays. Add regression coverage so image-heavy no-tool requests keep bypassing the old memory amplification path.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

* refactor(auth): simplify auth directory scanning and improve JSON processing logic

- Replaced `filepath.Walk` with `os.ReadDir` for cleaner directory traversal.
- Fixed `isAuthJSON` check to use `filepath.Dir` for directory comparison.
- Updated auth hash cache generation and file synthesis to improve readability and maintainability.

* feat(auth): add configurable worker pool size for auto-refresh loop

- Introduced `auth-auto-refresh-workers` config option to override default concurrency.
- Updated `authAutoRefreshLoop` to support customizable worker counts.
- Enhanced token refresh scheduling flexibility by aligning worker pool with runtime configurations.

* fix(antigravity): strip thinking blocks with empty signatures instead of rejecting

Thinking blocks with empty signatures come from proxy-generated
responses (Antigravity/Gemini routed as Claude). These should be
silently dropped from the request payload before forwarding, not
rejected with 400. Fixes 10 "missing thinking signature" errors.

* fix(antigravity): discard thinking blocks with non-Claude-format signatures

Proxy-generated thinking blocks may carry hex hashes or other non-Claude
signatures (e.g. "d5cb9cd0823142109f451861") from Gemini responses. These
are now discarded alongside empty-signature blocks during the strip phase,
before validation runs. Valid Claude signatures always start with 'E' or 'R'
(after stripping any cache prefix).

* fix(antigravity): use E-prefixed fake signature in strict bypass test

The strict bypass test used testGeminiSignaturePayload() which produces
a base64 string starting with 'C'. Since StripInvalidSignatureThinkingBlocks
now strips all non-E/R signatures unconditionally, the test payload was
stripped before reaching ValidateClaudeBypassSignatures, causing the test
to pass the request through instead of rejecting it with 400.

Replace with testFakeClaudeSignature() which produces a base64 string
starting with 'E' (valid at the lightweight check) but with invalid
protobuf content (no valid field 2), so strict mode correctly rejects
it at the deep validation layer.

* fix(antigravity): cap maxOutputTokens using registry max_completion_tokens

Claude models on antigravity have a 64000 token output limit but
max_tokens from downstream requests was passed through uncapped,
causing 400 INVALID_ARGUMENT from Google when clients sent 128000.

* chore: remove Qwen support from SDK and internal components

- Deleted `QwenAuthenticator`, internal `qwen_auth`, and `qwen_executor` implementations.
- Removed all Qwen-related OAuth flows, token handling, and execution logic.
- Cleaned up dependencies and references to Qwen across the codebase.

* chore(models): remove outdated GPT-5 and related model entries from registry JSON

* feat(session-affinity): add session-sticky routing for multi-account load balancing

When multiple auth credentials are configured, requests from the same
session are now routed to the same credential, improving upstream prompt
cache hit rates and maintaining context continuity.

Core components:
- SessionAffinitySelector: wraps RoundRobin/FillFirst selectors with
  session-to-auth binding; automatic failover when bound auth is
  unavailable, re-binding via the fallback selector for even distribution
- SessionCache: TTL-based in-memory cache with background cleanup
  goroutine, supporting per-session and per-auth invalidation
- StoppableSelector interface: lifecycle hook for selectors holding
  resources, called during Manager.StopAutoRefresh()

Session ID extraction priority (extractSessionIDs):
1. metadata.user_id with Claude Code session format (old
   user_{hash}_session_{uuid} and new JSON {session_id} format)
2. X-Session-ID header (generic client support)
3. metadata.user_id (non-Claude format, used as-is)
4. conversation_id field
5. Stable FNV hash from system prompt + first user/assistant messages
   (fallback for clients with no explicit session ID); returns both a
   full hash (primaryID) and a short hash without assistant content
   (fallbackID) to inherit bindings from the first turn

Multi-format message hash covers OpenAI messages, Claude system array,
Gemini contents/systemInstruction, and OpenAI Responses API input items
(including inline messages with role but no type field).
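The step-5 fallback can be sketched as below. Names are illustrative — the real extractSessionIDs also normalizes the OpenAI/Claude/Gemini/Responses message shapes before hashing:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// stableSessionHash derives a stable FNV-1a session ID from the system prompt
// plus the first user and assistant messages. The two-ID scheme (primaryID
// includes assistant content, fallbackID omits it) lets turn 2 inherit the
// binding created on turn 1, before any assistant reply existed.
func stableSessionHash(system, firstUser, firstAssistant string) (primaryID, fallbackID string) {
	sum := func(parts ...string) string {
		h := fnv.New64a()
		for _, p := range parts {
			h.Write([]byte(p))
		}
		return fmt.Sprintf("fnv-%016x", h.Sum64())
	}
	return sum(system, firstUser, firstAssistant), sum(system, firstUser)
}

func main() {
	p1, f1 := stableSessionHash("You are helpful.", "hi", "")
	p2, f2 := stableSessionHash("You are helpful.", "hi", "hello!")
	// Turn 1 binds under p1 (== f1); turn 2's fallbackID f2 matches it even
	// though its primaryID p2 has diverged.
	fmt.Println(p1 == f1, f1 == f2, p1 != p2)
}
```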

Configuration (config.yaml routing section):
- session-affinity: bool (default false)
- session-affinity-ttl: duration string (default "1h")
- claude-code-session-affinity: bool (deprecated, alias for above)
All three fields trigger selector rebuild on config hot reload.

Side effect: Idempotency-Key header is no longer auto-generated with a
random UUID when absent — only forwarded when explicitly provided by the
client, to avoid polluting session hash extraction.

* fix(antigravity): strip billing header from system instruction before upstream call

The x-anthropic-billing-header block in the Claude system array is
client-internal metadata and should not be forwarded to the Gemini
upstream as part of systemInstruction.parts.

* fix(docker-build): improve argument handling and error messaging for usage option

* fix(util): forward custom Host header to upstream

Custom headers configured under openai-compatibility (and any other
provider passing through applyCustomHeaders) were silently dropped for
the Host key, because Go's net/http reads the wire Host from
req.Host, not req.Header["Host"]. As a result, virtual-host routed
upstreams (e.g. LiteLLM behind an ingress) saw the base-url's host
instead of the user-configured override and returned 404.

Detect the Host key with http.CanonicalHeaderKey and assign it to
req.Host so it is actually written on the wire. Other headers continue
to use Header.Set as before.

Fixes router-for-me#2833

* fix(handlers): include execution session metadata and skip idempotency key when absent

- Refactored `requestExecutionMetadata` to handle empty `Idempotency-Key` gracefully.
- Added test to validate metadata inclusion of execution session without idempotency key.

* feat(auth): add proxy URL override support to auth constructors and executors

- Introduced `WithProxyURL` variants for `CodexAuth`, `ClaudeAuth`, `IFlowAuth`, and `DeviceFlowClient`.
- Updated executors to use proxy-aware constructors for improved configurability.
- Added unit tests to validate proxy override precedence and functionality.

Closes: router-for-me#2823

* chore: remove iFlow-related modules and dependencies

- Deleted `iflow` provider implementation, including thinking configuration (`apply.go`) and authentication modules.
- Removed iFlow-specific tests, executors, and helpers across SDK and internal components.
- Updated all references to exclude iFlow functionality.

* feat(models): add Claude Opus 4.7 model entry to registry JSON

* fix(tests): update model lookup references and enhance Claude executor tests

* fix(tests): update Gemini family test case numbers for consistency

* fix(util): also keep Host in header map for synthetic requests

Addressing the P1 note from the Codex reviewer: applyCustomHeaders is
also called with a synthetic &http.Request{Header: ...} from the
websockets executors (aistudio_executor.go, codex_websockets_executor.go),
which forward only the header map. The previous continue meant a custom
Host was dropped from that map, regressing virtual-host overrides on
those flows. Mirror the value to both r.Host (for real net/http) and
r.Header (for header-map-only consumers).

* feat(api): integrate auth index into key retrieval endpoints for Gemini, Claude, Codex, OpenAI, and Vertex

* fix(management): stabilize auth-index mapping

* fix(tests): remove obsolete config_auth_index_test file

* feat(translator): add partial and full image generation support in Codex-GPT and Codex-Gemini flows

- Introduced `LastImageHashByItemID` in Codex-GPT and `LastImageHashByID` in Codex-Gemini for deduplication of generated images.
- Added support for handling `partial_image` and `image_generation_call` types, with inline data embedding for Gemini and URL payload conversion for GPT.
- Extended unit tests to verify image handling in both streaming and non-streaming modes.

* fix(executor): drop obsolete context-1m-2025-08-07 beta header (fixes router-for-me#2866)

Anthropic has moved the 1M-context-window feature to General Availability,
so the context-1m-2025-08-07 beta flag is no longer accepted and now causes
400 Bad Request errors when forwarded upstream.

Remove the X-CPA-CLAUDE-1M detection and the corresponding injection of the
now-invalid beta header.  Also drop the unused net/textproto import that was
only needed for the header-key lookup.

* feat(docs): add VisionCoder sponsorship details and optimize external links

- Added VisionCoder sponsorship information to `README.md`, `README_CN.md`, and `README_JA.md`.
- Updated external links to include `target="_blank"` for improved user experience.
- Added new logo asset `visioncoder.png` for README use.

* feat(access): per-api-key model allowlist with glob matching

Adds enforcement for per-client-key model ACLs at the request middleware
layer. The dashboard already stores allowedModels per UserApiKey and
attempts to push them to PUT /v0/management/api-keys, but upstream until
now had nowhere to land them — the structured payload was rejected and
the dashboard's fallback silently dropped the policy data, so no key was
ever actually restricted.

config:
- Add APIKeyPolicy{Key, AllowedModels} and SDKConfig.APIKeyPolicies
- Add SDKConfig.APIKeyDefaultPolicy ("allow-all" | "deny-all"; defaults
  to allow-all so existing deployments are unaffected)
- IsModelAllowedForKey supports path.Match globs and tolerates a single
  leading "<prefix>/" segment so policies stay portable across prefixed
  credentials
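A minimal sketch of the matching semantics as a standalone helper (the real logic is the SDKConfig.IsModelAllowedForKey method, which also handles the allow-all/deny-all defaults omitted here):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// modelAllowed reports whether a model passes an allowlist of path.Match
// globs. It also tries the model with one leading "<prefix>/" segment
// stripped, so "teamA/gemini-3-pro" can match a policy written for
// "gemini-3-pro" on deployments that prefix credentials.
func modelAllowed(allowed []string, model string) bool {
	candidates := []string{model}
	if i := strings.Index(model, "/"); i >= 0 {
		candidates = append(candidates, model[i+1:])
	}
	for _, pattern := range allowed {
		for _, m := range candidates {
			if ok, err := path.Match(pattern, m); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	allowed := []string{"gpt-4o*", "gemini-3-pro"}
	fmt.Println(modelAllowed(allowed, "gpt-4o-mini"))        // true: glob match
	fmt.Println(modelAllowed(allowed, "teamA/gemini-3-pro")) // true: prefix stripped
	fmt.Println(modelAllowed(allowed, "claude-sonnet-4"))    // false
}
```

Note that path.Match's `*` does not cross `/`, which is why the prefixed form has to be tried as a second candidate rather than relying on the glob to span the slash.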

management API:
- PutAPIKeys now accepts both the legacy string list and the structured
  [{key, allowedModels}] form, with allowedModels / allowed-models /
  allowed_models all recognized so existing dashboard payloads work
- GetAPIKeys exposes api-key-policies when any are configured

middleware:
- New ModelACLMiddleware extracts the targeted model from the JSON body
  ("model" field) or the Gemini-style /v1beta/models/<model>:<action>
  path, then enforces the per-key policy. The body is read once and
  restored via io.NopCloser so downstream handlers see the original
  request bytes
- Wired into both /v1 and /v1beta groups directly after AuthMiddleware
- Bypasses any request that has no identified api key (preserving the
  legacy "no auth provider configured => allow" behavior of
  AuthMiddleware) and any request shape with no extractable model
  (listings, websocket upgrades) so discovery endpoints keep working

tests:
- Unit tests for IsModelAllowedForKey covering allow-all, deny-all,
  empty allowed list semantics, exact matches, glob matches, and the
  prefixed-credential case
- Middleware tests covering the allow / deny / unknown-key / Gemini
  path / list-endpoint / body-pass-through paths
- Parser tests for the management PutAPIKeys payload covering the
  legacy plain list, the wrapped items form, and the structured form
  in all three case styles

* fix(access): address review feedback on per-key model ACL

Three changes landed after PR router-for-me#2911 review:

middleware (DoS hardening):
- Cap request body inspected by ModelACLMiddleware at 10 MiB via
  io.LimitReader; requests above the cap are rejected with HTTP 413
  ("request_too_large") instead of being buffered into memory
- Short-circuit on Content-Length when the client declares an oversized
  body, so we bail before reading any bytes
- Any non-size read error now fails closed with HTTP 400 rather than
  silently allowing the request past the policy check
- New tests cover both the streamed-oversize and Content-Length paths

config (O(1) policy lookup + cache invalidation):
- SDKConfig.APIKeyPolicies is now backed by an atomic.Pointer cache
  (map[string]*APIKeyPolicy) so IsModelAllowedForKey is O(1) per call
  instead of O(N) linear scan; the cache is lazily rebuilt via
  CompareAndSwap and safe under concurrent reads
- Add SetAPIKeyPolicies (atomic replace + invalidate) and
  InvalidatePolicyIndex (manual flush after in-place edits)
- Tests exercise mutation visibility, nil-safe receivers, concurrent
  access, and defensive-copy semantics
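The cache shape can be sketched as follows — a simplified stand-in for the SDKConfig fields, showing only the lazy CompareAndSwap build and the invalidation hook:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type APIKeyPolicy struct {
	Key           string
	AllowedModels []string
}

// policyStore pairs the policy slice with an atomically published key→policy
// index, so lookups are O(1) and concurrent readers never observe a
// half-built map.
type policyStore struct {
	policies []APIKeyPolicy
	index    atomic.Pointer[map[string]*APIKeyPolicy]
}

func (s *policyStore) lookup(key string) *APIKeyPolicy {
	idx := s.index.Load()
	if idx == nil {
		built := make(map[string]*APIKeyPolicy, len(s.policies))
		for i := range s.policies {
			built[s.policies[i].Key] = &s.policies[i]
		}
		if s.index.CompareAndSwap(nil, &built) {
			idx = &built
		} else {
			idx = s.index.Load() // another goroutine published first
		}
	}
	return (*idx)[key]
}

// invalidate flushes the index after the slice is mutated; the next lookup
// rebuilds it from the current policies.
func (s *policyStore) invalidate() { s.index.Store(nil) }

func main() {
	s := &policyStore{policies: []APIKeyPolicy{{Key: "sk-a", AllowedModels: []string{"gpt-*"}}}}
	fmt.Println(s.lookup("sk-a") != nil) // true
	s.policies = append(s.policies, APIKeyPolicy{Key: "sk-b"})
	s.invalidate()
	fmt.Println(s.lookup("sk-b") != nil) // true after invalidation
}
```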

management handlers (policy rotation/deletion):
- PatchAPIKeys gains a dedicated handler that keeps APIKeyPolicies in
  sync on rename (via old/new) and rotation (via index/value); when the
  target key already has a policy the source row is dropped so there
  are no duplicate entries
- DeleteAPIKeys removes the matching APIKeyPolicy rows so a revoked
  credential cannot leave its allowlist behind; slice is nilled when
  empty
- PutAPIKeys routes through SetAPIKeyPolicies so the cache is refreshed
  atomically with the slice swap
- Tests cover rename-with-existing-target, append-new, delete-by-value,
  delete-by-index, last-row-clears, and PUT-rebuilds-policies

No behavior change for existing deployments that do not configure
api-key-policies. All existing tests still pass.

* fix(access): address gemini-code-assist review round 2

- config_lists.go: PutAPIKeys, PatchAPIKeys, and DeleteAPIKeys now
  acquire h.mu before mutating h.cfg and use persistLocked so the
  mutation and the subsequent config write happen atomically from
  the perspective of readers (middleware goroutines). Previously
  the mutations raced with ModelACLMiddleware evaluating the same
  fields on concurrent requests. Matches the existing pattern in
  PutGeminiKeys / PatchGeminiKey. (HIGH review comment)

- config_lists.go: renameAPIKeyPolicy no longer reslices the
  existing APIKeyPolicies backing array via "[:0]". That was
  unsafe because SDKConfig.policyIndex caches *APIKeyPolicy into
  the same backing array; a concurrent reader resolving the cache
  mid-write would observe a partially-mutated entry. Allocate a
  fresh slice so any still-cached pointers keep pointing at the
  pre-mutation entries until InvalidatePolicyIndex forces a
  refresh. (MEDIUM review comment)

- model_acl.go: extractRequestedModel now peeks the first 16 KiB
  of the request body before committing to buffering the full cap.
  In realistic chat-completion payloads "model" is always near the
  top of the JSON, so the peek alone is usually sufficient and the
  middleware allocates ~16 KiB per request instead of up to
  modelACLMaxBodyBytes (10 MiB). When the peek does not contain
  "model", the middleware falls back to the prior bounded read and
  still enforces the cap. Under concurrency this is the difference
  between a bounded constant footprint and N*10 MiB. (MEDIUM
  review comment)

- model_acl_test.go: two new tests lock in peek-first behavior —
  one covers a large body with "model" in the peek (must extract,
  enforce, and preserve the full downstream body byte-for-byte),
  the other covers "model" past the peek but within the cap (must
  still extract via the fallback read).

* fix(access): preserve full Gemini model path in ACL extraction

The previous extractor for /v1beta/models/<model>:<action> paths
truncated at the first '/' inside <model>. That broke deployments
using force-model-prefix, where the routed model identifier is
literally "<prefix>/<model>" (e.g. "teamA/gemini-3-pro"). The
GeminiHandler forwards the whole segment-before-":" as the model,
and IsModelAllowedForKey already strips a single leading
"<prefix>/" before glob matching, so the ACL extractor must
mirror that and forward the full identifier including embedded
slashes. Otherwise:
  - allowed requests were rejected (the ACL saw "teamA" instead
    of "teamA/gemini-3-pro")
  - policies scoped to just the prefix segment could unintentionally
    allow unrelated models sharing the prefix

Added TestModelACLMiddleware_GeminiPathPreservesPrefix which uses
the same AllowedModels pattern against both an allowed and a
disallowed model under the same prefix — a regression would
surface as a false-positive on either the allow path or the
deny path.

* fix(access): address codex review round 3 — WS bypass + drain DoS

Two P1s from the Codex reviewer on the prior commit:

1. Websocket ACL bypass
   /v1/responses accepts a websocket upgrade, and the model
   identifier is only selected later in frames consumed by the
   ResponsesWebsocket handler. extractRequestedModel returns
   found=false for the upgrade request, and the middleware was
   falling through as "no model here, allow", letting a restricted
   api key use any model through the websocket path.

   Fix: after identifying the key but before attempting body
   extraction, detect Upgrade: websocket and reject the upgrade
   when keyHasModelRestriction(cfg, apiKey) is true. A key is
   considered restricted when either it has an APIKeyPolicy entry
   with a non-empty AllowedModels list, or the deployment default
   is deny-all and the key has no matching policy. Unrestricted
   keys still pass through so the legacy websocket flow keeps
   working for them.

2. Unbounded drain after 413
   After detecting the body exceeded modelACLMaxBodyBytes, the
   code called io.Copy(io.Discard, c.Request.Body), which for a
   chunked/streamed request without a trustworthy Content-Length
   can hold the handler goroutine indefinitely — an attacker just
   keeps streaming bytes and the ACL check turns into a
   request-slot exhaustion path.

   Fix: drop the drain entirely. Return 413 and let net/http
   close the connection. No keep-alive concern since we're
   rejecting the request.

Tests added:
  - WebsocketUpgradeRejectedForRestrictedKey: key with
    AllowedModels attempting WS upgrade gets 403, handler never
    runs.
  - WebsocketUpgradeAllowedForUnrestrictedKey: key with no policy
    under allow-all default still gets through.
  - WebsocketUpgradeRejectedUnderDenyAllDefault: deny-all default
    rejects WS upgrades for keys with no matching policy.
  - OversizedBodyDoesNotDrainRemainder: uses a reader that blocks
    after its prefix to prove the middleware returns 413 promptly
    (within 2s) instead of reading to EOF.

* fix(access): keep policies consistent across duplicate key values

Two P2s from the Codex reviewer, both rooted in the same legacy
edge case: APIKeys can hold duplicate values, and the policy
helpers (renameAPIKeyPolicy, removeAPIKeyPolicy) are global —
they rewrite or drop the policy row by value, not by index. That
opened two ways for a duplicate to survive a patch/delete without
its intended restriction:

1. PatchAPIKeys old/new path
   The loop broke at the first match. If APIKeys was
   ["sk-dup", "sk-other", "sk-dup"] and the caller did
   {"old":"sk-dup","new":"sk-new"}, one slot remained "sk-dup"
   while the policy migrated to "sk-new", so the leftover
   "sk-dup" fell back to the default policy (usually allow-all).

   Fix: remove the break — rename every duplicate so nothing is
   left behind that the global policy migration would orphan.

2. PatchAPIKeys index/value path
   Similar shape: rewriting one slot while the old value still
   appears elsewhere, then moving the policy to the new value,
   left the surviving duplicate unrestricted.

   Fix: only migrate the policy when the old value no longer
   appears anywhere in APIKeys. When duplicates remain, the
   policy stays attached to the old value; the new value falls
   back to the default until the caller sets an explicit policy
   via PUT /api-keys.

3. DeleteAPIKeys indexed path
   Same pattern — removed one slot, then dropped the policy row
   unconditionally, which orphaned surviving duplicates.

   Fix: only drop the policy when no other entries still carry
   the removed value.

Added apiKeyValuePresent() helper so the three call sites share
one membership check.
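A sketch of that helper and the duplicate-aware rule it enables, assuming the signature implied by the commit message:

```go
package main

import "fmt"

// apiKeyValuePresent is the shared membership check: policy rows may
// only be migrated or dropped once no slot in APIKeys still carries
// the value (duplicates are legal in the legacy list).
func apiKeyValuePresent(keys []string, value string) bool {
	for _, k := range keys {
		if k == value {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"sk-dup", "sk-other", "sk-dup"}
	// After rewriting only index 0, "sk-dup" survives at index 2,
	// so its policy must stay attached rather than migrate.
	keys[0] = "sk-new"
	fmt.Println(apiKeyValuePresent(keys, "sk-dup")) // true
}
```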

Tests added:
  - PatchAPIKeys_RenameViaOldNewRenamesAllDuplicates: three-slot
    fixture with two duplicates proves all matching slots are
    renamed and the migrated policy still applies.
  - PatchAPIKeys_IndexedReplaceLeavesPolicyWhenDuplicateRemains:
    two-duplicate fixture proves the surviving slot keeps its
    policy (asserts both positively — still allows the permitted
    model — and negatively — still rejects models outside the
    allowlist).
  - DeleteAPIKeys_ByIndexKeepsPolicyWhenDuplicateRemains: same
    shape for the delete path.

* feat(auth): add refresh backoff for ineffective token updates

- Introduced `refreshIneffectiveBackoff` to prevent tight-looping in auto-refresh when token refresh fails to update expiry.
- Adjusted refresh logic to apply backoff when `shouldRefresh` evaluates true.

Closes: router-for-me#2830

* fix(executor): recover gpt-5.x text content for non-streaming requests

Codex/ChatGPT reasoning-model upstreams (the gpt-5.x family) emit assistant
text only via response.output_text.delta events; the final response.completed
event has no output item with content[].type == "output_text". The codex
executor's non-streaming path filtered out every line except response.completed
and passed it straight to the chat-completions translator, which then surfaced
a message with content == null while completion_tokens was non-zero. Clients
that read .choices[0].message.content saw the field as null and rendered
fallback text such as "Something went wrong." even though the model had
generated the requested response.

Fix: walk the SSE body once, accumulate text from response.output_text.delta
events, and synthesize a message item containing the accumulated text into
response.output[] before passing the event to the translator. The accumulator
is a no-op for non-reasoning models that already populate the message item
themselves (codexResponseHasOutputText guards against double-injection).

Tests cover codexResponseHasOutputText across the empty/missing/populated
shapes, injectCodexOutputText for fresh and pre-populated output arrays, and
a representative end-to-end gpt-5.x SSE body to verify the loop accumulates
the deltas and produces the expected post-injection event.
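The single-pass delta accumulation can be sketched as below; the helper name is hypothetical, and the real executor also injects the accumulated text into `response.output[]`, which is omitted here:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// accumulateOutputTextDeltas walks an SSE body once and concatenates
// the text carried by response.output_text.delta events. Non-delta
// events and unparseable lines are skipped.
func accumulateOutputTextDeltas(sse string) string {
	var b strings.Builder
	sc := bufio.NewScanner(strings.NewReader(sse))
	for sc.Scan() {
		payload, ok := strings.CutPrefix(sc.Text(), "data: ")
		if !ok {
			continue
		}
		var ev struct {
			Type  string `json:"type"`
			Delta string `json:"delta"`
		}
		if json.Unmarshal([]byte(payload), &ev) != nil {
			continue
		}
		if ev.Type == "response.output_text.delta" {
			b.WriteString(ev.Delta)
		}
	}
	return b.String()
}

func main() {
	sse := "data: {\"type\":\"response.output_text.delta\",\"delta\":\"Hel\"}\n" +
		"data: {\"type\":\"response.output_text.delta\",\"delta\":\"lo\"}\n" +
		"data: {\"type\":\"response.completed\",\"response\":{}}\n"
	fmt.Println(accumulateOutputTextDeltas(sse)) // Hello
}
```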

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(codex): backfill streaming response output

* perf(codex): avoid repeated output patch writes

* feat(api): add support for `HEAD` requests to `/healthz` endpoint

- Refactored `/healthz` handler to support `HEAD` requests alongside `GET`.
- Updated tests to include validation for `HEAD` requests with expected status and empty body.

Closes: router-for-me#2929

* feat(models): add Kimi K2.6 model entry to registry JSON

* feat(models): add hardcoded GPT-Image-2 model support in Codex

- Added `GPT-Image-2` as a built-in model to avoid dependency on remote updates for Codex.
- Updated model tier functions (`CodexFree`, `CodexTeam`, etc.) to include built-in models via `WithCodexBuiltins`.
- Introduced new handlers for image generation and edit operations under `OpenAIAPIHandler`.
- Extended tests to validate 503 response for unsupported image model requests.

* fix(handlers): remove handling of unsupported `n` parameter in OpenAI image handlers

* fix(handlers): remove references to unsupported `n` parameter in OpenAI image handlers

* feat(codex): enable image generation for all Codex upstream requests

Codex CLI gates the built-in image_generation tool behind
AuthMode::Chatgpt (OAuth only). When clients connect via API key
auth through CPA, the tool is absent from requests, making image
generation unavailable through the reverse proxy.

Changes:

1. Inject image_generation tool (codex_executor.go):
   Add ensureImageGenerationTool() that appends
   {"type":"image_generation","output_format":"png"} to the tools
   array if not already present. Applied to all three execution
   paths: Execute, executeCompact, and ExecuteStream.

2. Route aliases for Codex CLI direct access (server.go):
   Add /backend-api/codex/responses routes that map to the same
   OpenAI Responses API handlers as /v1/responses. This allows
   Codex CLI to connect via chatgpt_base_url config while keeping
   AuthMode::Chatgpt, which enables the built-in image_generation
   tool on the client side.

3. Unit tests (codex_executor_imagegen_test.go):
   Cover no-tools, existing tools, already-present, empty array,
   and mixed built-in tool scenarios.

* feat(antigravity): conductor-level credits fallback for Claude models

Move credits handling from executor-level retry to conductor-level
orchestration. When all free-tier auths are exhausted (429/503), the
conductor discovers auths with available Google One AI credits and
retries with enabledCreditTypes injected via context flag.

Key changes:
- Add AntigravityCreditsHint system for tracking per-auth credits state
- Conductor tries credits fallback after all auths fail (Execute/Stream/Count)
- Executor injects enabledCreditTypes only when conductor sets context flag
- Credits fallback respects provider scope (requires antigravity in providers)
- Add context cancellation check in credits fallback to avoid wasted requests
- Remove executor-level attemptCreditsFallback and preferCredits machinery
- Restructure 429 decision logic (parse details first, keyword fallback)
- Expand shouldAbort to cover INVALID_ARGUMENT/FAILED_PRECONDITION/500+UNKNOWN
- Support human-readable retry delay parsing (e.g. "1h43m56s")

* feat: support extracting X-Amp-Thread-Id header as session id for session affinity

* fix(antigravity): remove credits fallback from CountTokens, fix gofmt

CountTokens upstream API does not support enabledCreditTypes, so
remove the dead credits fallback path from ExecuteCount and delete
the unused tryAntigravityCreditsExecuteCount method. Fix gofmt on
credits test file.

* fix: forward HTTP headers to executor Options so session affinity can read X-Amp-Thread-Id

* fix(antigravity): respect pinned auth in credits fallback, release deferred body on success

- findAllAntigravityCreditsCandidateAuths now filters by PinnedAuthMetadataKey
  to prevent credential isolation violations during credits fallback
- Release deferredBody reference on success path to avoid holding large
  payloads in memory for the lifetime of the gin context

* refactor(logging): strip unrelated deferred body changes, keep credits-only logging

Remove deferred body optimization and maxErrorLog constants that were
unrelated to credits fallback. Keep only MarkCreditsUsed/CreditsUsed
helpers for flagging requests that consumed AI credits.

* fix(auth): break credits cold-start deadlock by keeping unknown-hint auths as fallback candidates

Replace antigravityCreditsAvailableForModel with inline known/unknown
split. Auths whose credit hints are not yet populated are kept as
lower-priority candidates instead of being rejected, breaking the
chicken-and-egg deadlock at cold start.

* perf(antigravity): async credits hint refresh for warm tokens

* feat(logging): add AI API path support for image routes

- Included `/v1/images` in AI API path prefixes.
- Introduced tests to validate `/v1/images/generations` and `/v1/images/edits` as AI API paths.

* feat(models): add GPT-5.5 model entry to registry JSON

* Add GPT-5.5 Codex model support

* chore(models): remove GPT-5.5 model entry from registry JSON

* feat(codex): pass base model to enable conditional image_generation tool injection

- Modified `ensureImageGenerationTool` to accept `baseModel` for conditional logic.
- Ensured `gpt-5.3-codex-spark` models bypass image_generation tool injection.
- Updated relevant tests and executor logic to reflect changes.

* fix antigravity credits stream fallback

* fix(codex): classify known upstream failures

Normalize Codex context, thinking-signature, previous-response, and auth failures to explicit error codes: context_too_large, thinking_signature_invalid, previous_response_not_found, auth_unavailable.

Refs router-for-me#2596.

* feat(auth): disallow free-tier Codex auth during selection process

- Introduced `disallowFreeAuthFromMetadata` and `isFreeCodexAuth` to enforce skipping free-tier credentials.
- Modified scheduler logic to honor `DisallowFreeAuthMetadataKey` during auth selection.
- Updated `ensureImageGenerationTool` to skip tool injection for free-tier Codex auth.
- Added context utility `WithDisallowFreeAuth` and integrated with image handlers.
- Augmented relevant tests to cover free-tier exclusion scenarios.

* Add CPA Usage Keeper to README ecosystem list

* docs: Add CPA Usage Keeper to README ecosystem list

* feat(api): implement protocol multiplexer and Redis queue for usage integration

- Added `protocol_multiplexer.go`, enabling support for both HTTP and Redis protocols on a single listener.
- Introduced `redis_queue_protocol.go` to handle Redis-compatible RESP commands for queue management.
- Integrated `redisqueue` package, supporting in-memory queuing with expiration pruning.
- Updated server initialization to manage a shared listener and multiplex connections.
- Adjusted `Handler` to adopt `AuthenticateManagementKey` for modular key validation, supporting both HTTP and Redis flows.

* feat(security): implement IP ban for repeated management key and Redis AUTH failures

- Added IP ban logic to `AuthenticateManagementKey` and Redis protocol handlers, blocking requests after multiple failed attempts.
- Introduced unit tests to validate IP ban behavior across localhost and remote clients.
- Synchronized Redis protocol's authentication policy with management key validation.

* feat(models): add Codex Auto Review model entry to registry JSON

Closes: router-for-me#2995

* feat(api): enhance model assignment logic in image handlers

- Updated `buildImagesResponsesRequest` to derive `model` dynamically based on `toolJSON`.
- Adjusted streaming execution to handle dynamic model resolution across multiple contexts.

Closes: router-for-me#2965

* fix(test): remove free tier from GPT-5.5 inclusion test

GPT-5.5 was correctly removed from codex-free tier in 7b89583
(since free accounts cannot access it), but the test was not updated
to reflect this. This caused TestCodexStaticModelsIncludeGPT55 to
fail on the free subtest.

Changes:
- Remove free tier from GPT-5.5 inclusion test
- Add new TestCodexFreeModelsExcludeGPT55 to explicitly verify
  that free tier does NOT include GPT-5.5

* feat(config): add support for disabling OpenAI compatibility providers

- Introduced a `Disabled` flag to OpenAI compatibility configurations.
- Updated routing, auth selection, and API handling logic to respect the `Disabled` state.
- Extended relevant APIs, YAML configurations, and data structures to include the `Disabled` field.
- Adjusted all relevant loops and filters to skip disabled providers.

Closes: router-for-me#3060 router-for-me#3059 router-for-me#2977

* feat(executor): add support for Codex image generation tool usage tracking

- Introduced `publishCodexImageToolUsage` to report image generation tool metrics.
- Updated executor logic to handle image generation tool events and defaults.
- Added parsing logic for `image_gen` tool usage details in `helps/usage_helpers.go`.
- Updated `UsageReporter` for additional model-specific usage publishing.
- Refactored usage detail normalizations.

Closes: router-for-me#3063

* fix(usage_helpers): skip zero-token usage in additional model records

- Added `buildAdditionalModelRecord` to filter out zero-token usage details.
- Introduced `hasNonZeroTokenUsage` helper function for token usage validation.
- Updated tests to cover scenarios for zero and non-zero token usage.

* feat(codex): handle thinking-signature conversion for reasoning content

- Implemented `appendReasoningContent` to support processing of `thinking` signature and text as reasoning input.
- Added test cases to validate reasoning content conversion with and without text.

* fix(codex): include `content` field in reasoning item initialization

* Preserve Codex reasoning signatures for Claude

* fix(antigravity): use real antigravity UA when polling credits balance

The loadCodeAssist polling call hardcoded the User-Agent to
google-api-nodejs-client/9.15.1. Google Cloud Code returns the
paidTier object WITHOUT the availableCredits array for that UA,
so updateAntigravityCreditsBalance always saw "no credits", set the
hint to Available=false for every Google One AI Ultra account, and
the conductor-level credits fallback could never find a candidate.

Switching to resolveUserAgent(auth) (the same UA used for
streamGenerateContent / generateContent) makes the response include
availableCredits, so the credits hint is populated correctly and the
fallback can actually inject enabledCreditTypes:["GOOGLE_ONE_AI"]
when free tier is exhausted.

* test(api): add validation for unsupported models in OpenAI image handlers

- Introduced tests to ensure unsupported models are rejected in `/images/generations` and `/images/edits`.
- Added `isSupportedImagesModel` and `rejectUnsupportedImagesModel` functions for consistent model validation.
- Enhanced image handler logic to apply validation checks for model compatibility.

* fix antigravity user agent handling

* fix antigravity client agent headers

* fix(access): address luispater review — alias bypass + multipart gating

Three blockers + two nits from luispater's CHANGES_REQUESTED review on
PR router-for-me#2914:

server.go (BLOCKING — alias bypass):
- /backend-api/codex/* uses AuthMiddleware but had no ModelACLMiddleware,
  so a key restricted by api-key-policies could bypass the allowlist by
  routing through the codex alias instead of /v1/responses. Added
  ModelACLMiddleware to the codexDirect group.

model_acl.go (BLOCKING — non-JSON body buffering):
- extractRequestedModel previously buffered any POST/PUT/PATCH body up
  to 10 MiB to look for a JSON "model" field. With ACLs enabled this
  also buffered non-JSON uploads (image edits, future binary endpoints)
  even when no JSON model field could exist.
- Dispatch on Content-Type:
    - application/json (and *+json, missing CT) → existing peek+buffer
      path with the 10 MiB cap.
    - multipart/form-data → ParseMultipartForm with a 1 MiB in-memory
      cap; large file parts spool to temp disk per stdlib. This recovers
      enforcement on /v1/images/edits without ballooning resident memory.
    - everything else → no body inspection, allow through.
- ParseMultipartForm is idempotent, so downstream handlers using
  c.MultipartForm() / c.PostForm() still see the parsed form.
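The Content-Type dispatch can be sketched as a small classifier; the helper and kind names are illustrative, but the three branches mirror the list above:

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

type bodyKind int

const (
	inspectJSON      bodyKind = iota // peek+buffer under the 10 MiB cap
	inspectMultipart                 // ParseMultipartForm, 1 MiB in memory
	skipInspection                   // no body inspection, allow through
)

// classifyContentType routes JSON-ish bodies (application/json, any
// *+json subtype, and a missing header) to the buffered path,
// multipart/form-data to form parsing, and everything else past the
// middleware untouched.
func classifyContentType(ct string) bodyKind {
	if ct == "" {
		return inspectJSON
	}
	mt, _, err := mime.ParseMediaType(ct)
	if err != nil {
		return skipInspection
	}
	switch {
	case mt == "application/json" || strings.HasSuffix(mt, "+json"):
		return inspectJSON
	case mt == "multipart/form-data":
		return inspectMultipart
	default:
		return skipInspection
	}
}

func main() {
	fmt.Println(classifyContentType("application/json; charset=utf-8") == inspectJSON)
	fmt.Println(classifyContentType("application/vnd.api+json") == inspectJSON)
	fmt.Println(classifyContentType("multipart/form-data; boundary=x") == inspectMultipart)
	fmt.Println(classifyContentType("application/octet-stream") == skipInspection)
}
```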

config (non-blocking nit — response shape):
- APIKeyPolicy.AllowedModels JSON tag: allowed-models → allowedModels
  to match what the dashboard sends on PUT. Input parser still accepts
  all three case styles (camel, kebab, snake) for backward compat.

tests (non-blocking nit + new coverage):
- api_keys_payload_test.go: rename "empty plain array fails" →
  "empty plain array clears keys" (the case asserts wantOK:true).
- model_acl_test.go: 8 new tests:
    - CodexAliasDisallowedModelRejected / AllowedModelPasses
    - MultipartFormDataDisallowedModelRejected / AllowedModelPasses
    - MultipartWithoutModelFieldAllowed
    - NonJSONContentTypeAllowedThrough (10 MiB octet-stream → 200,
      proves the middleware no longer buffers non-JSON bodies)
    - JSONWithCharsetParameterStillInspected
    - VendorJSONContentTypeInspected (application/vnd.api+json)

go test ./internal/api/... ./internal/config/... is green; the only
failing test on this branch (TestCodexStaticModelsIncludeGPT55) also
fails on pristine main and is unrelated.

* fix(access): filter model listings by API key policy

* Fix TUI hourly usage fallback

---------

Co-authored-by: wykk-12138 <z710959836@gmail.com>
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Luis Pater <webmaster@idotorg.org>
Co-authored-by: sususu98 <suchangshan@foxmail.com>
Co-authored-by: ZTXBOSS666 <150424052+ZTXBOSS666@users.noreply.github.com>
Co-authored-by: Chén Mù <10558748+hkfires@users.noreply.github.com>
Co-authored-by: Allen Yi <yiwenyou_allen@outlook.com>
Co-authored-by: sususu98 <33882693+sususu98@users.noreply.github.com>
Co-authored-by: muzhi1991 <2101044+muzhi1991@users.noreply.github.com>
Co-authored-by: Supra4E8C <tan13318527080@163.com>
Co-authored-by: Supra4E8C <69194597+LTbinglingfeng@users.noreply.github.com>
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: Lyra <lyra@tuckerbradford.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: stringer07 <1742292793@qq.com>
Co-authored-by: MoYeRanQianZhi <MoYeRanQianZhi@gmail.com>
Co-authored-by: XYenon <i@xyenon.bid>
Co-authored-by: Ben Vargas <ben@vargas.com>
Co-authored-by: Matthias319 <Matthias319@users.noreply.github.com>
Co-authored-by: Will <wangxking@outlook.com>
Co-authored-by: philipbankier <philipbankier@gmail.com>
Co-authored-by: xbang <xbang0222@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0e8d1f90b


Comment thread internal/api/model_acl.go
Comment thread internal/api/server.go
@mtuckerb
Author

mtuckerb commented May 3, 2026

ready for another review!

@mtuckerb
Author

mtuckerb commented May 6, 2026

@luispater anything else you need from me on this ?

@mtuckerb
Author

Quick status update: I re-checked the latest inline review state and found three newer codex comments after my prior red-team note that I hadn't accounted for yet:

  • image endpoints can default to gpt-image-2 when model is omitted, so the ACL needs to treat that implicit default as the requested model;
  • malformed multipart/form-data currently fails open during extraction and needs to reject or otherwise fail safe;
  • Amp provider aliases under /api/provider/... need the same model ACL middleware as /v1, /v1beta, and /backend-api/codex.

I'm treating those as the remaining blockers and will push another fix/test pass before asking for re-review again.
