feat(gateway): add downstream keepalive for non-stream compact responses #2976

Open
bud-primordium wants to merge 1 commit into Wei-Shaw:main from bud-primordium:feat/compact-nonstream-keepalive

Conversation

@bud-primordium

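Background

When /responses/compact requests are forwarded through a reverse proxy such as Cloudflare Tunnel or Nginx, the upstream compact endpoint may return no bytes for a long time while the model is processing. Once the silence exceeds the reverse proxy's idle/read timeout (for example, roughly 100-120 seconds on Cloudflare Free), the proxy closes the connection. Sub2API then hits a broken pipe when it writes the response, and the client reports stream disconnected before completion: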

warn  http request contains gin errors  status=200  latency_ms=92841
      method=POST  path=/responses/compact
      errors=Error #01: write tcp ...: write: broken pipe

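The compact passthrough currently forces stream=false via normalizeOpenAIPassthroughOAuthBody. After receiving the upstream response, the non-streaming path still reads the full body first (ReadUpstreamResponseBody) and only then writes it downstream in a single c.Data() call. For the whole duration, the downstream connection may carry zero bytes.

The streaming path already has stream_keepalive_interval (SSE comment heartbeats), but compact cannot benefit from it because it is forced to be non-streaming. #2243 tried switching compact to SSE streaming, but verification showed that the upstream compact endpoint emits no SSE events while processing and that TCP keepalive cannot solve an application-layer idle timeout, so that approach was closed.

This PR uses a different mechanism: instead of relying on the upstream to send data, Sub2API actively writes blank-line heartbeats downstream. The implementation is modeled on CLIProxyAPI's nonstream-keepalive-interval (StartNonStreamingKeepAlive).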

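Changes

Adds the config option gateway.openai_compact_nonstream_keepalive_interval (seconds). The default 0 disables it; when enabled, the valid range is 5-60.

When enabled, for non-streaming passthrough requests to /responses/compact:

  1. After forwardOpenAIPassthrough() has finished body normalization, local policy checks, access token retrieval, and upstream request construction, and before httpUpstream.Do(), a goroutine is started that writes \n to the downstream connection and calls Flush() at the configured interval
  2. The keepalive covers the entire upstream wait and the body read/conversion phase
  3. Once the final response is ready, the heartbeat is stopped in handleNonStreamingResponsePassthrough / handlePassthroughSSEToJSON before writing downstream, and then the JSON body is written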
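
The three steps above can be sketched as follows. This is a minimal illustration built on net/http and httptest, not the code in this PR: the keepAlive type, startKeepAlive, and demo are hypothetical names, and the sleep stands in for the slow upstream call.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// keepAlive is a hypothetical helper that writes "\n" downstream on a ticker.
type keepAlive struct {
	mu        sync.Mutex
	w         http.ResponseWriter
	stopped   bool
	committed bool // true once at least one heartbeat byte was written
	done      chan struct{}
	wg        sync.WaitGroup
}

// startKeepAlive launches the heartbeat goroutine (step 1).
func startKeepAlive(w http.ResponseWriter, interval time.Duration) *keepAlive {
	k := &keepAlive{w: w, done: make(chan struct{})}
	k.wg.Add(1)
	go func() {
		defer k.wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-k.done:
				return
			case <-ticker.C:
				k.beat()
			}
		}
	}()
	return k
}

// beat writes one blank line and flushes it so the proxy sees traffic.
func (k *keepAlive) beat() {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.stopped {
		return
	}
	if _, err := k.w.Write([]byte("\n")); err != nil {
		return
	}
	k.committed = true
	if f, ok := k.w.(http.Flusher); ok {
		f.Flush()
	}
}

// Stop must be called exactly once, before the final body is written (step 3).
// It returns whether any heartbeat reached the client.
func (k *keepAlive) Stop() bool {
	k.mu.Lock()
	k.stopped = true
	k.mu.Unlock()
	close(k.done)
	k.wg.Wait()
	return k.committed
}

// demo simulates a slow upstream and returns what the client would receive.
func demo() (body string, committed bool) {
	rec := httptest.NewRecorder()
	ka := startKeepAlive(rec, 10*time.Millisecond)
	time.Sleep(100 * time.Millisecond) // stands in for the upstream wait (step 2)
	committed = ka.Stop()
	_, _ = rec.Write([]byte(`{"ok":true}`))
	return rec.Body.String(), committed
}

func main() {
	body, committed := demo()
	fmt.Printf("%q committed=%v\n", body, committed)
}
```

Holding the mutex in both beat and Stop is what guarantees that no heartbeat byte can land in the middle of the final JSON body.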

The JSON standard permits leading whitespace, and the tests verify that json.Valid(bytes.TrimSpace(body)) holds.

Trade-off

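Once the heartbeat writes its first \n, the HTTP status is committed as 200. If the upstream then returns >=400, the correct status code can no longer be delivered to the client and 429/529 failover is no longer triggered; the error body can only be proxied as-is. In that case a diagnostic log with compact_keepalive_committed=true is recorded.

If no heartbeat has been written, the existing failover and error handling are unchanged. The feature is disabled by default and should be enabled only when a deployment explicitly needs to work around a reverse proxy's idle timeout.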
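
The status-commit behavior can be reproduced with httptest alone. The sketch below is an assumption-laden illustration, not the PR's handler: relayUpstream and statusSeenByClient are hypothetical names, and the failover decision is reduced to a boolean.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// relayUpstream is a hypothetical helper. It reports whether the caller may
// still fail over instead of answering the client.
func relayUpstream(w http.ResponseWriter, committed bool, status int, body []byte) bool {
	if !committed {
		if status == 429 || status == 529 {
			return true // nothing sent yet, so failover is still possible
		}
		w.WriteHeader(status) // the real status can still reach the client
	}
	// After a heartbeat the 200 status is already committed, so only the
	// upstream body can be proxied.
	_, _ = w.Write(body)
	return false
}

// statusSeenByClient simulates one request. It returns the status the client
// observes, or 0 when the request would be handed to failover instead.
func statusSeenByClient(heartbeatSent bool, upstreamStatus int) (int, bool) {
	rec := httptest.NewRecorder()
	if heartbeatSent {
		_, _ = rec.Write([]byte("\n")) // the first write implicitly commits 200
	}
	if relayUpstream(rec, heartbeatSent, upstreamStatus, []byte(`{"error":"upstream"}`)) {
		return 0, true
	}
	return rec.Code, false
}

func main() {
	for _, tc := range []struct {
		heartbeat bool
		status    int
	}{{false, 400}, {true, 400}, {false, 429}, {true, 429}} {
		code, failover := statusSeenByClient(tc.heartbeat, tc.status)
		fmt.Printf("heartbeat=%v upstream=%d -> client=%d failover=%v\n",
			tc.heartbeat, tc.status, code, failover)
	}
}
```

In this model, a client behind an enabled keepalive has to inspect the body rather than the status code to detect an upstream error that arrived after the first heartbeat.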

Configuration

gateway:
  # 0 = disabled (default); any non-zero value must be in 5-60
  openai_compact_nonstream_keepalive_interval: 15

Environment variable: GATEWAY_OPENAI_COMPACT_NONSTREAM_KEEPALIVE_INTERVAL=15
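
The range rule stated above (0 disables, otherwise 5-60) could be checked along these lines. This is a sketch only: validateKeepaliveInterval is a hypothetical name, and the project's actual config validation may be structured differently.

```go
package main

import "fmt"

// validateKeepaliveInterval is a hypothetical validator for
// gateway.openai_compact_nonstream_keepalive_interval (in seconds).
func validateKeepaliveInterval(seconds int) error {
	if seconds == 0 {
		return nil // feature disabled
	}
	if seconds < 5 || seconds > 60 {
		return fmt.Errorf("openai_compact_nonstream_keepalive_interval must be 0 or within 5-60, got %d", seconds)
	}
	return nil
}

func main() {
	for _, v := range []int{0, 4, 5, 15, 60, 61, -1} {
		fmt.Printf("%d -> %v\n", v, validateKeepaliveInterval(v))
	}
}
```

A value such as 15 leaves a wide margin below the roughly 100-second proxy timeout described in the background.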

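Tests

  • Config default value and range validation
  • Compact keepalive success: simulate upstream latency and verify that the response starts with \n and that json.Valid(TrimSpace(body)) holds
  • No leading whitespace when the option is disabled
  • Upstream returns 400 before the first tick: existing error handling is preserved
  • Upstream returns 400/429 after a heartbeat has been committed: no failover, the error body is proxied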
cd backend && go test -tags=unit ./internal/config/... ./internal/service/... -run 'Keepalive|Compact' -v

Production verification

Deployed and verified on a self-hosted Cloudflare Tunnel (Free) + VPS setup with a 15-second keepalive interval: Codex CLI compact requests no longer fail with broken pipe / stream disconnected.

Related

While waiting for upstream compact responses, periodically write blank
lines to the downstream connection and flush, preventing reverse proxies
(e.g. Cloudflare Tunnel, Nginx) from closing the connection due to idle
timeout.

New config: gateway.openai_compact_nonstream_keepalive_interval (seconds)
Default 0 (disabled); valid range 5-60 when enabled.

Refs Wei-Shaw#2773, Wei-Shaw#2243
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

All contributors have signed the CLA. ✅
Posted by the CLA Assistant Lite bot.

@bud-primordium
Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request Jun 2, 2026