fix(openai→claude): suppress empty/duplicate tool_use content_block_start #2926

Open · slicenferqin wants to merge 8 commits into router-for-me:dev from slicenferqin:fix-tool-use-name-loss-and-duplicates

Conversation

@slicenferqin

Problem

When internal/translator/openai/claude/openai_claude_response.go translates streaming OpenAI responses into Anthropic format, two issues in convertOpenAIStreamingChunkToAnthropic can corrupt tool_use blocks:

  1. Empty name passes through. gjson.Result.Exists() returns true for "name": "" and "name": null. The empty string flows into accumulator.Name and the emitted content_block_start. Anthropic-format clients then reject the block — Amp CLI's normalizer logs Skipping tool_use normalization due to missing name and the tool isn't surfaced.

  2. Duplicate content_block_start emissions. Some upstream providers (or proxied stacks like dmxapi.cn) send the function.name field redundantly across multiple streaming chunks for the same tool index. The original code emits a fresh content_block_start every time, producing two start events for one logical tool block. This breaks parsers that expect one start per index — Amp surfaces it as JSON Parse error: Unable to parse JSON string and the assistant turn is lost.

Reproduction

Connect Amp CLI through CLIProxyAPI's openai-compatibility provider to an upstream that occasionally re-sends function.name (Chinese relay stations like dmxapi.cn do). Run any task that triggers concurrent tool calls:

amp -x "explore this repository's src/ structure"

Without this patch, ~10–20% of repo-exploration runs fail with JSON Parse error. Logs show 11+ Skipping tool_use normalization due to missing name warnings per failed run.

Fix

  • Reject empty-string names before assigning to accumulator.Name.
  • Track per-tool emission state with a new ToolCallAccumulator.StartEmitted flag and only emit content_block_start once per tool index, after a non-empty name has been observed.
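
A hedged sketch of the two guards together (the `ToolCallAccumulator.StartEmitted` field name follows the PR; the surrounding harness is illustrative, not the actual translator code):

```go
package main

import "fmt"

// ToolCallAccumulator mirrors the PR's per-tool streaming state
// (field names from the diff; everything else here is a sketch).
type ToolCallAccumulator struct {
	ID           string
	Name         string
	StartEmitted bool // set once content_block_start has gone out
}

// maybeEmitStart applies both fixes: ignore empty names, and emit
// content_block_start at most once per tool index.
func maybeEmitStart(acc *ToolCallAccumulator, name string, emit func(string)) {
	if name != "" { // fix 1: "" / null never overwrite the name
		acc.Name = name
	}
	if !acc.StartEmitted && acc.Name != "" { // fix 2: once per index
		emit("content_block_start tool_use name=" + acc.Name)
		acc.StartEmitted = true
	}
}

func main() {
	var events []string
	acc := &ToolCallAccumulator{}
	// Upstream sends "" first, then repeats the real name twice.
	for _, name := range []string{"", "grep", "grep"} {
		maybeEmitStart(acc, name, func(e string) { events = append(events, e) })
	}
	fmt.Println(len(events), events[0]) // exactly one start event
}
```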

Verification

Scenario                                              Before                         After
Reliability (3× per Amp mode: smart/rush/deep/large)  smart/rush/large 0–1/3 pass    12/12 pass
Repo-exploration (heavy concurrent Read/Grep)         JSON Parse error, output lost  Completes cleanly
"Skipping tool_use normalization" warnings per run    11+                            0–3 (all benign, run still succeeds)

Notes

  • The patch is minimal (16 insertions, 1 deletion) and behaviorally backward-compatible for upstreams that already emit a single non-empty name per tool.
  • Related: this affects any client that strictly validates Anthropic content_block_start events.

…tart

When translating OpenAI streaming responses to Anthropic format,
`convertOpenAIStreamingChunkToAnthropic` emits a `content_block_start`
event with `name` whenever a chunk's `tool_calls[].function.name` field
exists. Two failure modes:

1. **Empty name passes through.** `gjson.Result.Exists()` returns true
   for `"name": ""` and `"name": null`. The empty string flows into
   `accumulator.Name` and into the emitted `content_block_start`, which
   makes downstream Anthropic-format clients (Amp CLI's tool_use
   normalizer in particular) skip the block as malformed:
   "Skipping tool_use normalization due to missing name".

2. **Duplicate emissions.** Some upstream providers send the
   `function.name` field redundantly in multiple streaming chunks for
   the same tool index. The original code emitted a fresh
   `content_block_start` every time, producing two start events for
   one logical tool block. This breaks parsers that expect one start
   per index and surfaces in Amp as a JSON Parse error.

Fix:
- Reject empty-string names before assigning to `accumulator.Name`.
- Track per-tool emission state with `ToolCallAccumulator.StartEmitted`
  and only emit `content_block_start` once per tool index, after a
  non-empty name has been observed.

Reproduction: connect Amp CLI through CLIProxyAPI's openai-compatibility
provider and run any task that triggers concurrent tool calls (e.g.,
"explore this repo" → multiple parallel Read/Grep). Without the patch,
~10–20% of runs fail with "JSON Parse error: Unable to parse JSON string"
and the assistant turn is lost. With the patch: 12/12 reliability runs
pass and a real repo-exploration task completes cleanly.
@github-actions

This pull request targeted main.

The base branch has been automatically changed to dev.

@github-actions github-actions Bot changed the base branch from main to dev April 20, 2026 10:24
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a StartEmitted flag to the ToolCallAccumulator to prevent duplicate content_block_start events during OpenAI-to-Anthropic streaming translation. The review feedback highlights the need to ensure cleanup loops also check this flag to avoid protocol violations, suggests optimizing the tool name assignment logic, and warns about potential non-sequential block indices if tool calls are suppressed.

// streaming chunks (or repeat it); without this guard we emit duplicate
// content_block_start events, which crashes Anthropic-format clients
// (e.g. Amp's tool_use normalizer).
StartEmitted bool
Contributor


medium

The introduction of StartEmitted is a correct approach to prevent duplicate content_block_start events. However, this change allows param.ToolCallsAccumulator to contain entries for tool calls that were never successfully started (e.g., if the stream ends before a non-empty name is received).

Ensure that the cleanup loops (at lines 307 and 371 in the full file) are updated to check accumulator.StartEmitted before emitting content_block_delta or content_block_stop events. Sending a stop event for a block index that was never started is a violation of the Anthropic streaming protocol and can cause client-side errors.
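
The suggested cleanup-loop guard can be sketched as follows (the accumulator shape and event strings are illustrative, not the translator's actual types):

```go
package main

import "fmt"

type acc struct {
	Index        int
	StartEmitted bool
}

// finishToolBlocks drains accumulators at end-of-stream. Skipping
// entries whose start never went out keeps the stream free of
// stop-without-start, which Anthropic event ordering forbids.
func finishToolBlocks(accs []acc) []string {
	var events []string
	for _, a := range accs {
		if !a.StartEmitted {
			continue // never started: no delta, no stop
		}
		events = append(events, fmt.Sprintf("content_block_stop index=%d", a.Index))
	}
	return events
}

func main() {
	events := finishToolBlocks([]acc{
		{Index: 0, StartEmitted: true},
		{Index: 1, StartEmitted: false}, // name never arrived
	})
	fmt.Println(events) // only the started block gets a stop
}
```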

Comment on lines +251 to +253
if name := function.Get("name"); name.Exists() && name.String() != "" {
accumulator.Name = util.MapToolName(param.ToolNameMap, name.String())
}
Contributor


medium

The tool name is currently updated in every chunk that contains a non-empty name, even after the content_block_start event has been emitted. This is redundant. Wrapping this logic in a check for !accumulator.StartEmitted improves efficiency and ensures the name remains consistent with what was originally emitted.

if !accumulator.StartEmitted {
	if name := function.Get("name"); name.Exists() && name.String() != "" {
		accumulator.Name = util.MapToolName(param.ToolNameMap, name.String())
	}
}

accumulator.Name = util.MapToolName(param.ToolNameMap, name.String())
}

if !accumulator.StartEmitted && accumulator.Name != "" {
Contributor


medium

By suppressing the emission of content_block_start for tool calls with empty names, this logic may create gaps in the Anthropic content block indices. Since toolContentBlockIndex is called at line 229 (outside the diff) for every tool call seen, an index is reserved even if the block is never emitted. Anthropic expects sequential indices (0, 1, 2...). If a tool call is suppressed, the next block (e.g., a text block) will have an index that skips the suppressed one, which may cause protocol errors in strict clients. Consider refactoring the index assignment to occur only when a block is actually emitted.
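
The suggested refactor can be sketched as a tiny allocator that hands out an index only at emit time (the `indexAllocator` helper is hypothetical; the real code uses `toolContentBlockIndex`):

```go
package main

import "fmt"

// indexAllocator hands out sequential Anthropic content-block
// indices only when a block is actually emitted, so suppressed
// tool calls never reserve (and waste) a slot.
type indexAllocator struct{ next int }

func (a *indexAllocator) reserve() int {
	i := a.next
	a.next++
	return i
}

func main() {
	var alloc indexAllocator
	names := []string{"read", "", "grep"} // middle tool call suppressed
	var emitted []int
	for _, name := range names {
		if name == "" {
			continue // no index burned for the suppressed block
		}
		emitted = append(emitted, alloc.reserve())
	}
	fmt.Println(emitted) // indices stay sequential: [0 1]
}
```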


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bec7c56145

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

accumulator.Name = util.MapToolName(param.ToolNameMap, name.String())
}

if !accumulator.StartEmitted && accumulator.Name != "" {

P1 Badge Skip tool final events if start was never emitted

By gating content_block_start on name.String() != "", this path now leaves StartEmitted false for tool calls that never receive a non-empty name, but the finish handlers still unconditionally emit content_block_delta/content_block_stop for every accumulator entry. In that scenario the stream contains delta/stop events for a block that was never started, which violates Anthropic event ordering and can cause strict clients to fail parsing the assistant turn instead of merely ignoring that tool call.

Useful? React with 👍 / 👎.

…ted tool blocks

Address review feedback on PR router-for-me#2926:

- Skip content_block_delta/stop emission in both cleanup loops
  (streaming finish_reason path and [DONE] path) when StartEmitted
  is false. Without this, tool calls whose start was suppressed
  (empty name throughout the stream) still got delta/stop events,
  violating Anthropic event ordering.

- Defer toolContentBlockIndex assignment until the moment we emit
  content_block_start, so suppressed tool calls don't burn a
  sequential index and leave gaps that strict clients reject.

- Gate name assignment on !StartEmitted so a repeated name field
  on later chunks can't drift from what was already announced.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4bab86cdd0


Comment on lines 249 to 255
if name := function.Get("name"); name.Exists() && name.String() != "" {
accumulator.Name = util.MapToolName(param.ToolNameMap, name.String())
}
}

if !accumulator.StartEmitted && accumulator.Name != "" {
stopThinkingContentBlock(param, &results)


P1 Badge Downgrade stop reason when all tool starts are suppressed

This change suppresses content_block_start when function.name is empty, but the stream still marks SawToolCall for every tool_calls delta and later maps the final stop reason to tool_use. In the exact malformed-stream case this patch handles (all tool names empty/null), the response now ends with stop_reason: tool_use while emitting no tool_use content blocks at all, which can break Anthropic-compatible clients that require at least one tool block before executing tools. Please derive stop reason from emitted tool blocks (e.g., any StartEmitted) instead of raw tool_calls presence.
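
The suggested derivation can be sketched as a pure function (names and the surrounding shape are illustrative; the real code maps through `param.FinishReason` and `mapOpenAIFinishReasonToAnthropic`):

```go
package main

import "fmt"

// mapStopReason derives the Anthropic stop_reason from whether any
// tool_use start was actually announced, not from the raw presence
// of tool_calls deltas in the upstream stream.
func mapStopReason(upstreamFinish string, sawEmittedToolStart bool) string {
	if upstreamFinish == "tool_calls" && !sawEmittedToolStart {
		// All tool starts were suppressed: claiming tool_use with
		// zero tool_use blocks would be an inconsistent stream.
		return "end_turn"
	}
	if upstreamFinish == "tool_calls" {
		return "tool_use"
	}
	return "end_turn"
}

func main() {
	fmt.Println(mapStopReason("tool_calls", false)) // end_turn
	fmt.Println(mapStopReason("tool_calls", true))  // tool_use
}
```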

Useful? React with 👍 / 👎.

@slicenferqin
Author

Hi @luispater — heads up that I just pushed 4bab86c addressing the three review points from gemini-code-assist and chatgpt-codex-connector (one P1 shared by both bots, plus two smaller P2/P3 follow-ups). That commit tripped translator-path-guard, which I now see is designed to block PRs from touching internal/translator/** and route changes through an issue instead.

The original PR (commit bec7c56) also touches the same file, so I think the whole change falls under the same policy. A few options from my side — happy to take whichever you prefer:

  1. Leave the PR as-is and you/the maintenance team override the guard if the fix is useful.
  2. I revert 4bab86c and we keep just the original patch.
  3. I close this PR and open an issue describing the bug + proposed fix, and the maintenance team owns the implementation.

Brief recap for context:

  • Original fix: suppress empty/duplicate content_block_start emissions in convertOpenAIStreamingChunkToAnthropic — reproducible against Amp CLI + dmxapi.cn relays, details in the PR description.
  • Follow-up commit (4bab86c): gate the cleanup loops (both streaming finish_reason and [DONE] paths) on StartEmitted so delta/stop aren't emitted for blocks whose start was suppressed; also defer toolContentBlockIndex reservation to emit time so suppressed tool calls don't leave gaps.

Let me know how you'd like to proceed. Thanks!

Collaborator

@luispater luispater left a comment


Summary

This change hardens OpenAI→Anthropic streaming translation for tool_use blocks by (1) suppressing empty function.name values, (2) preventing duplicate content_block_start emissions for the same tool index, and (3) ensuring we never emit delta/stop events (or reserve block indexes) for tool blocks that never successfully started.

What changed

  • Added ToolCallAccumulator.StartEmitted to track whether a tool block has already emitted content_block_start.
  • Only records a non-empty function.name prior to start emission; repeated name fields after start cannot drift the announced name.
  • Defers toolContentBlockIndex() reservation until the moment content_block_start is emitted, preventing index gaps for suppressed tool calls.
  • Skips content_block_delta/content_block_stop emission in both cleanup paths when StartEmitted is false, preserving Anthropic event ordering.

Key findings

  • The approach is minimal and aligns with Anthropic streaming expectations (exactly one content_block_start per tool block, and no stop without start).
  • Deferring index allocation until emit-time is a good fit for “name never materializes” scenarios and avoids strict-client issues with sparse indexes.

Suggested follow-ups (non-blocking)

  • Consider requiring a non-empty tool call id before emitting content_block_start, similar to the name guard, to avoid ever announcing an empty content_block.id if an upstream sends fields out-of-order.
  • Consider tightening the name check to ensure name is actually a JSON string type before mapping/using it.

Test plan

  • CI: pr-test-build job reports success.
  • Recommended additional coverage (if feasible): a small unit test matrix for streaming chunks covering name:"", name:null, repeated name across chunks, and multiple tool indexes to assert no duplicate starts and no stop-without-start behavior.

This is an automated Codex review result and still requires manual verification by a human reviewer.

…aming tests

Address maintainer follow-up review on PR router-for-me#2926:

- Require accumulator.ID != "" before emitting content_block_start.
  Without this, an upstream that sends `name` first and `id` later
  would cause SanitizeClaudeToolID's empty-input fallback path to
  invent a synthetic `toolu_<nanos>_<n>` id and announce it on the
  wire, even though the real id arrives in a later chunk.

- Tighten the name guard to require gjson.String type, so a
  malformed upstream sending `"name": 123` or other non-string
  values no longer silently coerces into the announced tool name.

- Tighten the id accumulator the same way (string type, non-empty
  value) for symmetry and to avoid stomping a previously-set id
  with a later "id":"" chunk.

- Add openai_claude_response_test.go covering: empty name throughout,
  null name, non-string name, repeated name across chunks, mixed
  suppressed-and-valid tool indexes (no index gap), and id-arrives-late
  (start deferred until id is present, no fallback id announced).
@slicenferqin
Author

Hi @luispater — thanks for the thorough review! Just pushed 4a90804 addressing all three follow-ups from your Apr 25 comment:

1. Non-empty id guard before content_block_start (your first nit)
Added accumulator.ID != "" to the emit gate alongside the existing name guard. Previously, if an upstream sent name first and id later, SanitizeClaudeToolID's empty-input fallback would invent a synthetic toolu_<nanos>_<n> id and announce it, even though the real id was still on the wire. Now the start is deferred until both fields are present.

2. Name as JSON-string type only (your second nit)
Tightened the name guard to name.Type == gjson.String so non-string values ("name": 123, etc.) no longer silently coerce via gjson.String() into the announced tool name. Applied the same type check to the id accumulator for symmetry, and made it skip empty-string updates so a later "id":"" chunk can't stomp a previously-set value.

3. Unit test matrix (your test plan)
Added openai_claude_response_test.go covering:

  • name:"" throughout — no start, no delta, no stop
  • name:null — same behavior
  • name: non-string (number) — rejected by type guard
  • repeated name across chunks — exactly one start, no name drift
  • mixed suppressed + valid tool indexes — no index gap (announced indices stay sequential from 0)
  • id arrives in a later chunk — start deferred until id is present, no fallback id announced

All six pass locally; go test ./internal/translator/openai/claude/... is green.

Heads up: this commit will trip translator-path-guard for the same reason bec7c56 did — happy to revert any subset if you'd prefer the original PR scope, or to close this and hand off via an issue if the path policy should hold here.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a9080438e


Comment on lines 247 to +249
if function := toolCall.Get("function"); function.Exists() {
if name := function.Get("name"); name.Exists() {
accumulator.Name = util.MapToolName(param.ToolNameMap, name.String())
// Only record the name until content_block_start has been
// emitted. Upstreams sometimes send "name":"" or repeat the


P1 Badge Trigger tool_use start when id arrives in non-function delta

convertOpenAIStreamingChunkToAnthropic now requires both name and id before emitting content_block_start, but that emission check is nested under if function.Exists(). If an upstream sends function.name first and later sends id in a delta without a function object, accumulator.ID is stored but the start is never emitted, and the finish handlers then skip the call because StartEmitted stays false. This drops a real tool call for providers that split tool-call fields across chunks.

Useful? React with 👍 / 👎.

Author


Good catch — that's a real regression I introduced in 4a90804. Fixed in a912284e:

Lifted the emit gate out of the if function.Exists() branch so it re-evaluates every chunk once both Name and ID are observed, regardless of whether the current delta carries a function object. Both guards remain (Name != "" and ID != ""), so the deferred-id behavior is preserved and we still never announce a SanitizeClaudeToolID fallback.

Added TestStreamingTool_IDInDeltaWithoutFunction covering exactly this split-chunk pattern: function.name in chunk 1, id alone in chunk 2 (no function field), arguments in chunk 3, finish_reason in chunk 4 — asserts one content_block_start with the real id and one content_block_stop. Test fails on 4a90804 and passes on a912284e.

…nction-less delta

Codex review on 4a90804 caught a regression introduced by the new
id-non-empty guard: the emit check was nested inside `if function.Exists()`,
so an upstream that splits tool fields across chunks — function.name in
one delta, then id alone in a later delta whose tool_call entry has no
`function` object — would store the id but never re-evaluate the emit
gate, dropping the tool call entirely on finish.

Lift the emit check out of the function block so it runs every chunk
once both name and id are observed, regardless of whether the current
chunk carries a `function` object. Both guards remain (Name != "" and
ID != ""), so the new path still defers emission until the real id is
available and never announces a SanitizeClaudeToolID fallback.

Add TestStreamingTool_IDInDeltaWithoutFunction covering the exact
split-chunk pattern (name in chunk 1 with function, id in chunk 2
without function, arguments in chunk 3, finish in chunk 4).
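
The lifted gate can be sketched like this (types and the `applyDelta` harness are illustrative; only the gate condition follows the commit):

```go
package main

import "fmt"

type acc struct {
	ID, Name     string
	StartEmitted bool
}

// applyDelta accumulates whichever tool-call fields this chunk
// carries, then re-checks the emit gate OUTSIDE the function-field
// branch, so an id arriving in a function-less delta still triggers
// the deferred content_block_start.
func applyDelta(a *acc, id, fnName string, emit func(string)) {
	if id != "" {
		a.ID = id
	}
	if fnName != "" { // only present when the delta has a function object
		a.Name = fnName
	}
	if !a.StartEmitted && a.Name != "" && a.ID != "" {
		emit("content_block_start id=" + a.ID + " name=" + a.Name)
		a.StartEmitted = true
	}
}

func main() {
	var events []string
	record := func(e string) { events = append(events, e) }
	a := &acc{}
	applyDelta(a, "", "read_file", record) // chunk 1: name only, start deferred
	applyDelta(a, "call_1", "", record)    // chunk 2: id only, no function object
	fmt.Println(events)                    // one start, with the real id
}
```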

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a912284ec1


Comment on lines +329 to +333
// Skip tool calls whose start was never emitted (e.g. empty
// name throughout the stream). Emitting delta/stop for a
// block that has no start violates Anthropic event ordering.
if !accumulator.StartEmitted {
continue


P1 Badge Derive stop_reason from emitted tool blocks

When suppressed tool calls are skipped here, the stream can end up with zero emitted tool_use blocks while message_delta.stop_reason is still forced to tool_use via SawToolCall, which is set before validating whether a start was actually emitted. This creates an internally inconsistent Anthropic stream that strict clients may reject (they see tool_use stop reason but no executable tool block). Fresh evidence: the new id validation path (id must be a non-empty string) can now suppress all starts even when function.name is present, so this mismatch is no longer limited to empty-name inputs.

Useful? React with 👍 / 👎.

Author


Thanks for flagging — I think this one is already covered by 0376ace though, and just pushed f83f1df which adds a regression test for exactly the scenario you describe.

SawToolCall, which is set before validating whether a start was actually emitted

This was true before 0376ace but not after. In 0376ace the assignment moved into the emit-success block (line 308 of the post-fix file, immediately after accumulator.StartEmitted = true), gated on the same Name != "" && ID != "" checks that gate emission. So SawToolCall now means "we actually announced a tool_use start" rather than "raw tool_calls field was seen."

the new id validation path can now suppress all starts even when function.name is present, so this mismatch is no longer limited to empty-name inputs

Right, but that case is what f83f1df exercises end-to-end — TestStreamingTool_StopReasonWhenIDNeverArrives:

  • chunk 1: function.name = "do_it", no id → name accumulated, emit deferred (id missing)
  • chunk 2: function.arguments = "{}" → args accumulated, still no id
  • chunk 3: finish_reason = "tool_calls" (upstream claims tool calls)

With 0376ace in place: SawToolCall stays false (no emit happened), so the finish handler hits the new fallback branch (case reason == "tool_calls":param.FinishReason = "stop"), and mapOpenAIFinishReasonToAnthropic resolves to end_turn. Final stream: zero tool_use blocks, stop_reason: end_turn — internally consistent. The test asserts both halves and passes locally on f83f1df (would fail on 4a90804, which is the version that introduced the id guard without the SawToolCall move).

Happy to be wrong here — if you can point at a specific input where the mismatch still reproduces on 0376ace / f83f1df, I'll dig in. Otherwise I'd lean toward marking this resolved.

…uppressed

Self-review caught an unresolved P1 from chatgpt-codex-connector's Apr 20
review on bec7c56 ("Downgrade stop reason when all tool starts are
suppressed") that slipped past the prior follow-up commit.

Bug: param.SawToolCall was set true on every raw tool_calls delta, even
when content_block_start was later suppressed by the name/id guards.
With finish_reason="tool_calls" from upstream, the stream ended with
stop_reason=tool_use and zero tool_use content blocks — strict
Anthropic clients reject this as a malformed turn.

Two-part fix:

1. Move param.SawToolCall = true into the emit block, alongside
   accumulator.StartEmitted = true. The flag now means "at least one
   tool_use start was actually announced" rather than "raw tool_calls
   field was seen". Updated the field doc comment to match.

2. In the finish_reason handler, when SawToolCall is false but the
   upstream finish_reason is "tool_calls" (suppressed-only stream),
   downgrade param.FinishReason to "stop" so the resulting Anthropic
   stop_reason is "end_turn" instead of "tool_use".

Tests:

- Strengthen TestStreamingTool_EmptyNameThroughout to assert
  stop_reason != "tool_use" (it would have caught this regression).
- Add TestStreamingTool_StopReasonWithEmittedTool to lock in the
  positive path (stop_reason stays "tool_use" when a real tool was
  emitted).
- Add TestStreamingTool_StopReasonMixedSuppressedAndValid to confirm
  one suppressed + one emitted tool still yields stop_reason="tool_use".
@slicenferqin
Author

Self-review pass on the branch caught one more issue — pushed 0376ace.

Background: when I responded to your Apr 21 reply I said the follow-up commit 4bab86c had addressed three review points. Re-reading the bot threads today I noticed chatgpt-codex-connector actually filed two P1s on bec7c56 against line 255, not one:

  1. "Skip tool final events if start was never emitted" — addressed in 4bab86c. ✅
  2. "Downgrade stop reason when all tool starts are suppressed" — silently dropped. ❌

The second one is real: param.SawToolCall = true was set on every raw tool_calls delta, even when the start was later suppressed. With upstream finish_reason: "tool_calls", the resulting stream had stop_reason: tool_use and zero tool_use content blocks — exactly the "malformed turn" case that Anthropic-strict clients reject.

Fix in 0376ace:

  • Moved param.SawToolCall = true into the emit block, paired with accumulator.StartEmitted = true. The flag now means "we actually announced a tool_use start," not "raw tool_calls field was present."
  • In the finish_reason handler, when SawToolCall == false but upstream sent tool_calls, downgrade param.FinishReason to "stop" so the final mapped stop_reason becomes end_turn instead of tool_use.
  • Updated the SawToolCall field doc comment to reflect the new semantics.

Test deltas:

  • Strengthened TestStreamingTool_EmptyNameThroughout to also assert stop_reason != "tool_use". (Worth noting: this assertion would have caught the regression on 4a90804 — I should have included it from the start.)
  • Added TestStreamingTool_StopReasonWithEmittedTool (positive path: stop_reason stays "tool_use" when a real tool emits).
  • Added TestStreamingTool_StopReasonMixedSuppressedAndValid (one suppressed + one emitted still yields "tool_use").

Apologies for the back-and-forth — this should have been bundled in 4bab86c four days ago. Let me know if you'd like me to squash the four follow-up commits before merge, or leave the history granular.

… name does

Regression coverage for the codex review scenario on 0376ace: a stream
where function.name arrives and is valid, but no tool-call id is ever
sent. The new id-non-empty emit guard suppresses content_block_start,
and once SawToolCall is gated on actual emission (also from 0376ace),
the upstream "finish_reason: tool_calls" must downgrade to "stop" so
the stream isn't internally inconsistent (stop_reason=tool_use with
zero tool_use blocks).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f83f1dfc85


// present. Both guards remain: name+id non-empty so we never
// announce a fallback id from SanitizeClaudeToolID's
// empty-input path.
if !accumulator.StartEmitted && accumulator.Name != "" && accumulator.ID != "" {


P1 Badge Prevent tool starts after stream finalization

This emit gate can still fire on chunks that arrive after a finish_reason chunk has already finalized the stream state. In that case, ContentBlocksStopped is already true (set in the finish handler), so a late id update (e.g. name seen earlier, id arriving later in a function-less delta) will emit a new content_block_start that never gets a matching delta/stop, and may even be emitted after message_stop when usage was sent with the finish chunk. That produces invalid Anthropic event ordering for strict clients.

Useful? React with 👍 / 👎.

Author


Good catch — this one is a real bug, and the trigger condition you described is exactly right. Fixed in 4d6b2cd.

The "re-evaluate emit gate on every chunk" change in a912284 (added to make the deferred-id case work for split-chunk providers) didn't account for chunks arriving after the finish handler had already finalized the stream:

  • chunk 1: function.name only, no id → emit deferred
  • chunk 2: finish_reason: tool_calls + usage → cleanup loop runs, ContentBlocksStopped = true, message_delta + message_stop emitted (downgraded to end_turn because SawToolCall is still false)
  • chunk 3: id finally arrives → the previous gate (!StartEmitted && Name != "" && ID != "") was true, so a content_block_start was emitted after message_stop. Strict clients reject that ordering.

Fix is the obvious one — added !param.ContentBlocksStopped to the emit gate. Late tool fields arriving post-finalization are now silently dropped; the stream stays consistent with the already-downgraded stop_reason: end_turn.

New test TestStreamingTool_LateIDAfterFinalization reproduces exactly that 3-chunk sequence and asserts both that no tool_use start emits AND that no content_block_* event lands after message_stop. Fails on f83f1df, passes on 4d6b2cd.

Thanks for sticking with this thread — that's a genuine ordering violation I would have shipped.

Codex review on f83f1df caught a real ordering bug introduced by the
"re-evaluate emit gate every chunk" fix: once a finish_reason chunk has
already run the cleanup loop and set ContentBlocksStopped, a later
chunk that finally completes the (name, id) pair would still emit a
content_block_start. With usage in the same finalize chunk, that emit
lands AFTER message_stop — invalid Anthropic event ordering, rejected
by strict clients.

Add !param.ContentBlocksStopped to the emit gate. Late tool fields
arriving after finalization are now silently dropped; the stream
remains internally consistent with stop_reason already downgraded to
end_turn earlier.

Add TestStreamingTool_LateIDAfterFinalization which fires three chunks
in sequence (name only, then finish_reason+usage, then id alone) and
asserts both that no tool_use start is emitted AND that no
content_block_* event lands after message_stop.

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d6b2cd5c2


```go
// chunk has already finalized the stream) from emitting a
// content_block_start out of order — possibly even after
// message_stop, which strict Anthropic clients reject.
if !accumulator.StartEmitted && accumulator.Name != "" && accumulator.ID != "" && !param.ContentBlocksStopped {
```

P1: Preserve tool_use emission when upstream omits call id

The new emit gate now requires accumulator.ID != "" before sending content_block_start, which means any streaming provider that sends a valid tool name/arguments but never provides tool_calls[].id will have that tool call silently dropped. In this path StartEmitted stays false, cleanup skips delta/stop for that tool, and the stream is downgraded to end_turn, so clients never execute the tool despite receiving a real tool invocation payload. This is a regression from prior behavior where SanitizeClaudeToolID could generate a valid fallback ID and keep the tool call usable.


Author

Good catch — fair regression call. Fixed in b99ae8e.

I had read @luispater's "require non-empty id before emitting content_block_start" too narrowly and applied it as a hard drop. Your point is right: the prior behavior used SanitizeClaudeToolID's empty-input fallback to keep name-but-no-id streams usable, and removing it without a backstop is a regression for any provider that omits tool_calls[].id entirely.

Compromise that preserves both intents:

  • Out-of-order (id arrives later) — the inline emit still defers until name && id are both present, so we never announce a fallback id while a real id is still on the way (maintainer's original concern, unchanged).
  • Never arrives — the cleanup loops in both the finish_reason and [DONE] handlers now belated-emit content_block_start for any accumulator with a valid name but StartEmitted=false. SanitizeClaudeToolID synthesizes the fallback id from the empty input, the matching delta+stop follow in the same loop, and SawToolCall flips so the final stop_reason maps back to tool_use.

Refactored the inline emit body into emitToolUseStart() so the inline path and both cleanup paths share the same announcement logic.

Test changes:

  • TestStreamingTool_StopReasonWhenIDNeverArrives flipped to assert the new positive behavior: one tool_use start with a toolu_<nanos>_<n> id, stop_reason=tool_use.
  • TestStreamingTool_LateIDAfterFinalization adjusted: chunk 2's cleanup belated-emits with the synthetic id (stream is correctly addressable), and chunk 3's late id is silently absorbed (the inline StartEmitted guard already blocks re-emit). The ordering assertion (no content_block_* after message_stop) still holds.
  • Empty-name / null-name cases still drop entirely — Name == "" guard at cleanup is the floor.

Mind taking another pass?

Codex review on 4d6b2cd flagged a regression from the new id-non-empty
emit gate: a stream that supplies a valid function.name but never sends
tool_calls[].id had its tool call silently dropped, even though the
prior behavior used SanitizeClaudeToolID's empty-input fallback to
synthesize a usable id.

The fix preserves both intents:

- Out-of-order case (maintainer's original concern): the inline emit
  still defers until name+id are both present, so we never announce a
  fallback id while a real id is still on the way.

- Never-arrives case (codex's catch): the cleanup loops in the
  finish_reason and [DONE] handlers now belated-emit content_block_start
  for any accumulator with a valid name but StartEmitted=false.
  SanitizeClaudeToolID synthesizes a fallback id from the empty input,
  the matching delta+stop follow in the same loop, and SawToolCall flips
  so the final stop_reason maps to "tool_use" instead of being
  downgraded.

Refactor the inline emit body into emitToolUseStart() so the inline
path and both cleanup paths share the same announcement logic (block
index allocation, thinking/text stop ordering, SawToolCall toggle).

Tests:

- TestStreamingTool_StopReasonWhenIDNeverArrives now asserts the new
  positive behavior (1 tool_use start with toolu_<nanos>_<n> id,
  stop_reason=tool_use) instead of the old "drop entirely" behavior.

- TestStreamingTool_LateIDAfterFinalization adjusted to expect the
  belated emit in chunk 2 and confirm chunk 3's late id is absorbed
  silently (StartEmitted gates the inline path).

- All other tests unchanged and passing — the empty-name and null-name
  paths still drop entirely (Name == "" guard at cleanup), and
  out-of-order-but-arrives paths still emit exactly once with the real
  id.
@slicenferqin
Author

@luispater — status update on this thread.

The review settled on b99ae8e after working through 4 follow-up P1s from chatgpt-codex-connector and your 3 suggestions from Apr 25. All 11 streaming-tool tests pass locally; CI's pr-test-build should be green (the translator-path-guard failure is the same policy override as the original commit).

Scope did grow past the initial 1-file fix while addressing the bot review thread, so happy to squash to a single commit before merge if that reads better — or keep it granular for traceability, your call.

No urgency on my end, just leaving a marker for whenever this comes back into your queue.
