Bug: when the total provider count is 0, the request waits until the upstream timeout #1016

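Response time is too long:

[0000ms] Provider selected via session reuse

Session ID: 55ecfbe2-495e-4e6f-8a36-673ead0f212c
Reused provider: tiger
Provider config: priority=0 weight=5 cost multiplier=1
Based on session cache (within 5 minutes)

Waiting for request result...

[0000ms] Initial provider selection

System status:
Total providers: 0
Enabled claude-type providers: 0
Passed health check: 0

✓ Selected: tiger

Waiting for request result...

[128494ms] Request failed (attempt 1)

Provider: tiger
Status code: 524
Error: Provider returned 524: 
Request duration: 128494ms

Circuit breaker status:
Current state: closed (normal)
Failure count: 1/5
4 more failures until the breaker opens

Error details: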
<!DOCTYPE html>



 <html class="no-js" lang="en-US"> 
<head>

<title>bookapi.cc | 524: A timeout occurred</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compati...
Request details (for troubleshooting):
Request method: POST
Request URL: http://0.0.0.0:3000/v1/messages?beta=true
Request headers:
  accept: application/json
  accept-encoding: gzip, deflate, br, zstd
  anthropic-beta: claude-code-20250219,context-1m-2025-08-07,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,effort-2025-11-24
  anthropic-dangerous-direct-browser-access: true
  anthropic-version: 2023-06-01
  authorization: Bearer sk-2******73bd
  content-length: 1182939
  content-type: application/json; charset=utf-8
  host: 172.22.100.77:23000
  noauth: true
  traceparent: 00-4cf30b4aa838b639e819f7f910da868e-f2c531a3a3a9429b-00
  user-agent: claude-cli/2.1.94 (external, claude-vscode, agent-sdk/0.2.94)
  x-accel-buffering: no
  x-api-key: sk-2******73bd
  x-app: cli
  x-b3-parentspanid: 85e39e73e7e37a38
  x-b3-sampled: 0
  x-b3-spanid: 21531a767578cc5d
  x-b3-traceid: 9ed0df3a029933f285e39e73e7e37a38
  x-claude-code-session-id: 55ecfbe2-495e-4e6f-8a36-673ead0f212c
  x-correlation-id: 1dc5a8adc33d41d38c9ee106eb279118
  x-envoy-attempt-count: 1
  x-envoy-external-address: 100.97.125.0
  x-forwarded-client-cert: By=spiffe://cluster.local/ns/flex-uf/sa/default;Hash=8db814c34eea873c8e5709e2bc04171fde1e1673f40affb06a0c7f4c97d8a390;Subject="";URI=spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
  x-forwarded-for: 100.97.125.0
  x-forwarded-host: 172.22.100.77:23000
  x-forwarded-port: 3000
  x-forwarded-proto: http
  x-real-ip: 122.190.56.9
  x-request-id: b7328309-07f2-45d2-9bfb-131566e04434
  x-stainless-arch: x64
  x-stainless-lang: js
  x-stainless-os: Windows
  x-stainless-package-version: 0.81.0
  x-stainless-retry-count: 0
  x-stainless-runtime: node
  x-stainless-runtime-version: v24.3.0
  x-stainless-timeout: 300
Request body (truncated):
  {
    "model": "claude-opus-4-6",
    "messages": [
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      0,
      ...


[255115ms] All endpoints of the provider type timed out (524)

Provider: tiger
Status code: 524
Error: Provider returned 524: 
Request duration: 126621ms

Error details:
<!DOCTYPE html>



 <html class="no-js" lang="en-US"> 
<head>

<title>bookapi.cc | 524: A timeout occurred</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compati...
Request details (for troubleshooting):
Request method: POST
Request URL: http://0.0.0.0:3000/v1/messages?beta=true
Request headers:
  accept: application/json
  accept-encoding: gzip, deflate, br, zstd
  anthropic-beta: claude-code-20250219,context-1m-2025-08-07,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,effort-2025-11-24
  anthropic-dangerous-direct-browser-access: true
  anthropic-version: 2023-06-01
  authorization: Bearer sk-2******73bd
  content-length: 1182939
  content-type: application/json; charset=utf-8
  host: 172.22.100.77:23000
  noauth: true
  traceparent: 00-4cf30b4aa838b639e819f7f910da868e-f2c531a3a3a9429b-00
  user-agent: claude-cli/2.1.94 (external, claude-vscode, agent-sdk/0.2.94)
  x-accel-buffering: no
  x-api-key: sk-2******73bd
  x-app: cli
  x-b3-parentspanid: 85e39e73e7e37a38
  x-b3-sampled: 0
  x-b3-spanid: 21531a767578cc5d
  x-b3-traceid: 9ed0df3a029933f285e39e73e7e37a38
  x-claude-code-session-id: 55ecfbe2-495e-4e6f-8a36-673ead0f212c
  x-correlation-id: 1dc5a8adc33d41d38c9ee106eb279118
  x-envoy-attempt-count: 1
  x-envoy-external-address: 100.97.125.0
  x-forwarded-client-cert: By=spiffe://cluster.local/ns/flex-uf/sa/default;Hash=8db814c34eea873c8e5709e2bc04171fde1e1673f40affb06a0c7f4c97d8a390;Subject="";URI=spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
  x-forwarded-for: 100.97.125.0
  x-forwarded-host: 172.22.100.77:23000
  x-forwarded-port: 3000
  x-forwarded-proto: http
  x-real-ip: 122.190.56.9
  x-request-id: b7328309-07f2-45d2-9bfb-131566e04434
  x-stainless-arch: x64
  x-stainless-lang: js
  x-stainless-os: Windows
  x-stainless-package-version: 0.81.0
  x-stainless-retry-count: 0
  x-stainless-runtime: node
  x-stainless-runtime-version: v24.3.0
  x-stainless-timeout: 300
Request body (truncated):
  {
    "model": "claude-opus-4-6",
    "messages": [
      0,
     

All endpoints of this provider type timed out; a temporary circuit break for the provider type has been triggered.

[258740ms] cubence (client_abort)

==========================================================

Claude Code analysis:


  ---
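  Root cause analysis

  You set a 30s streaming timeout (firstByteTimeoutStreamingMs = 30000), but the request actually waited 128 seconds before it timed out. The cause lies in two independent code paths: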

  Where the paths fork (forwarder.ts:741)

  send()
    ├─ shouldUseStreamingHedge() == true  → sendStreamingWithHedge()  ← uses thresholdTimer
    └─ shouldUseStreamingHedge() == false → regular retry loop        ← uses responseTimeout

  Conditions that trigger shouldUseStreamingHedge (forwarder.ts:2951-2958)

  (endpointPolicy?.allowRetry ?? true) &&
  (endpointPolicy?.allowProviderSwitch ?? true) &&
  stream === true &&
  firstByteTimeoutStreamingMs > 0   // ← you set 30s, so the hedge path is taken
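
To make the condition easier to reason about, here is a standalone sketch of the predicate. The interfaces and the `candidateProviderCount` parameter are assumptions made for illustration and are not taken from forwarder.ts. The extra check models the "at least 2 providers" precondition that this analysis suggests further down as one possible fix.

```typescript
// Hypothetical shapes: the real types in forwarder.ts are not shown in this issue.
interface EndpointPolicy {
  allowRetry?: boolean;
  allowProviderSwitch?: boolean;
}

interface HedgeDecisionInput {
  endpointPolicy?: EndpointPolicy;
  stream: boolean;
  firstByteTimeoutStreamingMs: number;
  // Assumption: the caller can count the providers eligible for this request.
  candidateProviderCount: number;
}

// Same four conditions as quoted above, plus one hypothetical precondition:
// hedging only makes sense when there is a second provider to hedge to.
function shouldUseStreamingHedgeSketch(input: HedgeDecisionInput): boolean {
  const { endpointPolicy, stream, firstByteTimeoutStreamingMs, candidateProviderCount } = input;
  return (
    (endpointPolicy?.allowRetry ?? true) &&
    (endpointPolicy?.allowProviderSwitch ?? true) &&
    stream === true &&
    firstByteTimeoutStreamingMs > 0 &&
    candidateProviderCount >= 2
  );
}
```

With a single provider the sketch returns false, so the request would fall back to the regular retry loop, where responseTimeout still applies.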

  What firstByteTimeoutMs does in the hedge path (forwarder.ts:3117-3118)

  const providerForRequest =
    attempt.firstByteTimeoutMs > 0
      ? { ...attempt.provider, firstByteTimeoutStreamingMs: 0 }  // ← zeroed before being passed to doForward!
      : attempt.provider;

  The hedge path sets firstByteTimeoutStreamingMs to zero before passing the provider to doForward, so inside doForward responseTimeoutMs = 0, which disables the abort timeout (forwarder.ts:2334).

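  The hedge path uses its own thresholdTimer (armAttemptThreshold) to trigger a switch to a backup provider, not to abort the current request. When no other provider is available (the log shows a total provider count of 0, with tiger as the only one), launchAlternative() finds no alternative, so the current tiger request keeps running with no timeout constraint at all until the upstream Cloudflare 524 (120s).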
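
The timing described above can be checked with a small model. This is only a sketch of the reasoning, not code from forwarder.ts: the type names, the function name, and the `abortWhenNoAlternative` switch are all hypothetical, and the model assumes the upstream never sends a first byte.

```typescript
// Hypothetical timeline model of a stalled streaming attempt.
interface StallScenario {
  firstByteTimeoutMs: number;      // hedge threshold, 30000 in this report
  upstreamGivesUpAtMs: number;     // when the upstream returns its own timeout (524)
  hasAlternativeProvider: boolean; // whether a backup provider can be launched
  abortWhenNoAlternative: boolean; // hypothetical fallback that the current code lacks
}

interface StallOutcome {
  atMs: number;
  event: "hedge_launched" | "aborted_by_fallback" | "upstream_timeout";
}

// Returns the first thing that happens to an attempt that never produces a byte.
function firstEventForStalledAttempt(s: StallScenario): StallOutcome {
  const thresholdFires = s.firstByteTimeoutMs > 0 && s.firstByteTimeoutMs < s.upstreamGivesUpAtMs;
  if (thresholdFires && s.hasAlternativeProvider) {
    return { atMs: s.firstByteTimeoutMs, event: "hedge_launched" };
  }
  if (thresholdFires && s.abortWhenNoAlternative) {
    return { atMs: s.firstByteTimeoutMs, event: "aborted_by_fallback" };
  }
  // The threshold fired into a no-op, so only the upstream can end the request.
  return { atMs: s.upstreamGivesUpAtMs, event: "upstream_timeout" };
}

// The situation from the log: one provider, 30s threshold, 524 after 128494ms.
const reported = firstEventForStalledAttempt({
  firstByteTimeoutMs: 30000,
  upstreamGivesUpAtMs: 128494,
  hasAlternativeProvider: false,
  abortWhenNoAlternative: false,
});
// reported is { atMs: 128494, event: "upstream_timeout" }
```

Flipping `abortWhenNoAlternative` to true in the same scenario moves the first event to 30000ms, which is the behavior the configured timeout was expected to give.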

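  Why is there no alternative provider?

  The log states it explicitly:
  Total providers: 0
  Enabled claude-type providers: 0
  Only tiger exists, so launchAlternative returns null and the hedge mechanism degrades to a no-op. By that point the abort timeout inside doForward has already been zeroed, so the 30s timeout has no effect at all.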

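  This is essentially a bug

  The hedge path is designed so that no first byte within 30s triggers a switch to a backup provider. When there is no backup provider, it should abort the current request directly instead of letting it wait indefinitely. After launchAlternative returns null, the current code simply returns without aborting the in-flight request.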

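  Fix approach

  Once launchAlternative() has confirmed that there is no alternative provider, any requests still in flight should get a fallback abort based on firstByteTimeoutMs (or shouldUseStreamingHedge could simply require at least 2 providers as a precondition).

  The simplest fix is in launchAlternative (forwarder.ts:3080-3086):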

  if (!alternativeProvider) {
    noMoreProviders = true;
    if (attempts.size === 0) {
      await finishIfExhausted();
    } else {
      // With no alternative, abort every in-flight request as a first-byte-timeout fallback
      for (const attempt of Array.from(attempts)) {
        if (attempt.firstByteTimeoutMs > 0 && !attempt.settled) {
          attempt.responseController?.abort();
        }
      }
    }
    return;
  }

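  A more thorough fix, however, belongs in runAttempt: once noMoreProviders is confirmed, if thresholdTriggered is set but there is no winner, the abort for the current request should be restored. Choose the layer that best fits your actual architecture.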
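
The fallback-abort step proposed for launchAlternative can be exercised in isolation. The sketch below is a minimal illustration under stated assumptions: `AbortHandle`, `InFlightAttempt`, and `abortStalledAttemptsIfNoAlternative` are hypothetical names, and the attempt shape only mirrors the fields used in the snippet above (firstByteTimeoutMs, settled, responseController). It is not the project's implementation.

```typescript
// Structural stand-in for AbortController so the sketch has no runtime dependencies.
interface AbortHandle {
  abort(): void;
}

// Hypothetical attempt shape, limited to the fields the proposed fix reads.
interface InFlightAttempt {
  providerName: string;
  firstByteTimeoutMs: number;
  settled: boolean;
  responseController?: AbortHandle;
}

// Intended to run when the hedge threshold fires.
// Returns the names of the providers whose attempts were aborted.
function abortStalledAttemptsIfNoAlternative(
  attempts: Set<InFlightAttempt>,
  alternativeProvider: string | null,
): string[] {
  if (alternativeProvider !== null) {
    // A backup exists, so the normal hedge launch would proceed instead.
    return [];
  }
  const aborted: string[] = [];
  attempts.forEach((attempt) => {
    // Only unsettled attempts that were relying on the first-byte threshold are aborted.
    if (attempt.firstByteTimeoutMs > 0 && !attempt.settled) {
      attempt.responseController?.abort();
      aborted.push(attempt.providerName);
    }
  });
  return aborted;
}
```

Any object with an `abort()` method fits `responseController` here, so a standard AbortController instance can be passed in directly. Because the function only runs once the threshold has fired, aborting immediately is equivalent to enforcing the configured first-byte timeout on the sole provider.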
