Skip to content

fix(chat-template): harden tool-call argument decoding#1327

Open
EazyReal wants to merge 5 commits into
radixark:mainfrom
EazyReal:upstream-pr/toolcall-args-decode-hardening
Open

fix(chat-template): harden tool-call argument decoding#1327
EazyReal wants to merge 5 commits into
radixark:mainfrom
EazyReal:upstream-pr/toolcall-args-decode-hardening

Conversation

@EazyReal

@EazyReal EazyReal commented Jun 12, 2026

Copy link
Copy Markdown

Problem

OpenAI wire responses carry function.arguments as a JSON string, while Qwen-family templates expect a mapping and iterate it as tool_call.arguments|items (miles/utils/chat_template_utils/templates/qwen3.5_fixed.jinja:117). At the render boundary, normalize_tool_arguments(messages, "dict") (miles/utils/chat_template_utils/template.py) decodes that echoed wire string back to a dict — but the decode was an unguarded json.loads(args) gated only on isinstance(args, str). The only try/except wrapping apply_chat_template catches TemplateError, not JSONDecodeError (template.py:124-132), so three inbound shapes broke render / data-load:

  • "" or malformed JSON → json.loads raises an uncaught json.JSONDecodeError, crashing render.
  • a string that decodes to a non-dict ("[1, 2, 3]", "42") or an explicit None → leaks a non-mapping into the template, where arguments|items raises TemplateError; the tool-format fallback then re-renders the same broken args and surfaces a ValueError.
  • a native (already-Python) non-dict arguments (a list / number) → never enters the isinstance(args, str) branch, so normalization is skipped and the raw non-dict hits arguments|items directly.

Before vs After

Same input, fed through "dict" normalization and then rendered with qwen3.5_fixed.jinja (the original crash site). Reproduces the parametrized cases in the test file.

from miles.utils.chat_template_utils import (
    TEMPLATE_DIR, apply_chat_template_from_str, normalize_tool_arguments,
)

def messages(arguments):
    return [
        {"role": "user", "content": "weather?"},
        {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "get_weather", "arguments": arguments}}]},
        {"role": "tool", "content": "sunny", "tool_call_id": "call_1", "name": "get_weather"},
    ]

qwen = (TEMPLATE_DIR / "qwen3.5_fixed.jinja").read_text()
for arg in ["", "{not json}", "[1, 2, 3]", "42", None, [1, 2, 3], 42]:
    apply_chat_template_from_str(qwen, messages(arg), tools=TOOLS)

Before — the very first case crashes:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

and the non-dict cases (e.g. "[1, 2, 3]", native 42) crash later in render:

ValueError: Chat template rendering failed (tool format fallback): ...
# arguments|items on a list/number raised TemplateError

After — every case renders to a str. The normalized arguments per input:

inbound arguments normalized to
'{"city": "London"}' (valid) {"city": "London"} (unchanged)
{"city": "Paris"} (already dict) {"city": "Paris"} (passthrough)
"" {}
None {}
"{not json}" {"_raw_arguments": "{not json}"}
"[1, 2, 3]" {"_raw_arguments": "[1, 2, 3]"}
"42" {"_raw_arguments": "42"}
[1, 2, 3] (native) {"_raw_arguments": [1, 2, 3]}
42 (native) {"_raw_arguments": 42}
0 (native, falsy) {"_raw_arguments": 0}
[] (native, falsy) {"_raw_arguments": []}

A malformed/non-dict payload is preserved verbatim under _raw_arguments rather than silently dropped to an empty call, so it still renders and stays inspectable.

Fix

Route the inbound "dict" normalization through a new _dict_arguments() that always returns a mapping:

  • dict → passthrough.
  • str → guarded json.loads(args or "{}"); a JSONDecodeError, or a value that decodes to a non-dict, is preserved as {"_raw_arguments": args}. An empty wire string → {} (the OpenAI wire encodes "no arguments" as ""/"{}").
  • None{} (no arguments).
  • any other native non-dict (list / number, falsy values like 0 / [] included) → preserved as {"_raw_arguments": args}, mirroring the stringified-non-dict branch. Only None means "no arguments": a falsy payload such as 0 or [] is still a payload the model emitted, so it is preserved rather than silently rewritten to an empty mapping.

_raw_arguments is a render-only sentinel with no programmatic consumer: it gives arguments|items a key to iterate so a malformed call surfaces its raw payload in the rendered text instead of being rewritten to an empty one. The call-site guard also changes from isinstance(args, str) to "arguments" in func, so native non-dict args now reach the coercion instead of being skipped.

Why this is the right fix

  • Default path is bit-identical. Valid dict-JSON ('{"city": "London"}'{"city": "London"}) and already-dict passthrough are unchanged and pinned by tests; only the previously-crashing branches gain coercion.
  • Lossless. Truthiness is the wrong gate for "no arguments" — gating the sentinel on is None means the normalization never drops a real (even falsy) payload.
  • Scoped to inbound render normalization. The outbound "json" direction (DeepSeek DSML encoders) and the OpenAI wire contract (arguments stays a JSON string) are untouched — a test asserts a non-dict wire string is left as-is on the "json" path.
  • No new abstraction, no mutation. One private helper; normalize_tool_arguments still deep-copies, so the input message is never mutated (pinned by test_does_not_mutate_input).
  • CI-verifiable, CPU-only. Adds tests/fast/utils/chat_template_utils/test_tool_args_hardening.py (pure-Python, no tokenizer / network). It pins that none of the adversarial inbound values raise, each yields a Mapping, the full qwen3.5_fixed.jinja render does not raise, malformed/native-non-dict payloads survive under _raw_arguments (falsy natives 0 / [] included), and the valid/passthrough/outbound paths are unchanged. Living under tests/fast/, the CPU suite (stage-a-cpu) auto-discovers it on every PR with no extra registration.

…arial args

OpenAI wire responses carry function.arguments as a JSON string, while
Qwen-family templates expect a mapping and iterate arguments|items (e.g.
qwen3.5_fixed.jinja). normalize_tool_arguments(..., "dict") decodes that wire
string back to a dict at the render boundary, but the decode was an unguarded
json.loads:

- "" / malformed JSON raised an uncaught json.JSONDecodeError (the only
  try/except around apply_chat_template catches TemplateError, not
  JSONDecodeError), crashing render / data-load;
- a value that decodes to a non-dict (list/number) or an explicit None leaked
  through to the template, where arguments|items raises and the tool-format
  fallback re-renders the same broken args -> ValueError;
- a native (already-Python) non-dict arguments (list/number) hit the same
  arguments|items crash without ever passing through json.loads.

Route the inbound "dict" normalization through a new _dict_arguments() that
always yields a mapping: dict passthrough; str -> guarded json.loads (empty ->
{}), with a JSONDecodeError or non-dict decode preserved under a render-only
_raw_arguments sentinel so the malformed payload still renders instead of being
silently dropped; anything else -> {}. The outbound "json" (DeepSeek DSML)
direction and the OpenAI wire contract are untouched.

Add tests/fast/utils/chat_template_utils/test_tool_args_hardening.py: pure-CPU
tests that pin none of the adversarial inbound values raise, each yields a
mapping, malformed payloads survive under _raw_arguments, and the valid
dict-JSON / dict-passthrough default paths are unchanged. It lives under
tests/fast/, so the CPU CI suite (stage-a-cpu) auto-discovers and runs it on
every PR; no tokenizer or network is required.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request hardens tool-call argument parsing for chat-template rendering by introducing a helper function _dict_arguments to safely coerce malformed, empty, or non-dict arguments into mappings, preventing template rendering crashes. A comprehensive test suite is also added to cover these adversarial cases. The reviewer suggests improving consistency by preserving native non-dict values under _raw_arguments (making them inspectable during rendering) rather than silently discarding them to empty dictionaries, and provides corresponding test updates.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +77 to +87
if isinstance(args, dict):
return args
if isinstance(args, str):
try:
decoded = json.loads(args or "{}")
except json.JSONDecodeError:
return {"_raw_arguments": args}
return decoded if isinstance(decoded, dict) else {"_raw_arguments": args}
# Non-str, non-dict (None, or a native list/number) -> empty mapping; the leading
# ``isinstance(args, dict)`` guard already returned every dict, so nothing to keep.
return {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is an inconsistency in how adversarial arguments are handled: stringified non-dict values (like "[1, 2, 3]" or "42") are preserved under _raw_arguments so they can be rendered and inspected, whereas native non-dict values (like [1, 2, 3] or 42) are silently discarded and replaced with {}.

To make the behavior consistent and ensure all malformed payloads remain inspectable during rendering, we should also preserve native non-dict values under _raw_arguments. Only None and empty strings "" should be coerced to {}.

Suggested change
if isinstance(args, dict):
return args
if isinstance(args, str):
try:
decoded = json.loads(args or "{}")
except json.JSONDecodeError:
return {"_raw_arguments": args}
return decoded if isinstance(decoded, dict) else {"_raw_arguments": args}
# Non-str, non-dict (None, or a native list/number) -> empty mapping; the leading
# ``isinstance(args, dict)`` guard already returned every dict, so nothing to keep.
return {}
if isinstance(args, dict):
return args
if args is None or args == "":
return {}
if isinstance(args, str):
try:
decoded = json.loads(args)
except json.JSONDecodeError:
return {"_raw_arguments": args}
return decoded if isinstance(decoded, dict) else {"_raw_arguments": args}
return {"_raw_arguments": args}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepted, applied in 693af03. You're right that the two adversarial branches were inconsistent: a stringified non-dict ('[1, 2, 3]' / '42') was preserved under _raw_arguments, but a native non-dict ([1, 2, 3] / 42) was silently discarded to {}. _dict_arguments now preserves native non-dict losslessly under _raw_arguments as well; only None/falsy still maps to {} (nothing to preserve). This also aligns miles with our slime reference tool_call_arguments (slime/agent/auto_sample_builder/messages.py:45-52), which already preserves a native non-mapping under _raw_arguments. Note this only touches the adversarial branch this PR exists to harden — the default path is unchanged: a dict still passes through untouched and a valid JSON-object string still decodes to the exact dict (verified against all cases, incl. {"city":"London"} -> {"city":"London"}).

Comment on lines +123 to +128
def test_native_non_dict_becomes_empty_mapping():
"""A native (already-Python) non-dict ``arguments`` — not a wire string — must also
be coerced to a mapping, otherwise it leaks to ``arguments|items`` and crashes."""
for arguments in ([1, 2, 3], 42):
normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict")
assert _tool_call_arguments(normalized) == {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Update this test to align with the improved behavior where native non-dict arguments are preserved under _raw_arguments instead of being silently discarded.

Suggested change
def test_native_non_dict_becomes_empty_mapping():
"""A native (already-Python) non-dict ``arguments`` not a wire string must also
be coerced to a mapping, otherwise it leaks to ``arguments|items`` and crashes."""
for arguments in ([1, 2, 3], 42):
normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict")
assert _tool_call_arguments(normalized) == {}
def test_native_non_dict_preserved_under_raw_arguments():
"""A native (already-Python) non-dict arguments - not a wire string - must also
be coerced to a mapping, and preserved under _raw_arguments so it remains inspectable."""
for arguments in ([1, 2, 3], 42):
normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict")
assert _tool_call_arguments(normalized) == {"_raw_arguments": arguments}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepted, applied in 693af03. The native-non-dict test now asserts [1, 2, 3] -> {"_raw_arguments": [1, 2, 3]} and 42 -> {"_raw_arguments": 42} (renamed to test_native_non_dict_preserved_under_raw_arguments), matching the code change. The existing render smoke test (test_render_does_not_raise) still covers native-list/native-number and continues to pass since {"_raw_arguments": ...} is a Mapping that arguments|items can iterate. The empty/None -> {} and valid-dict-JSON cases remain pinned to guard the unchanged paths.

EazyReal and others added 4 commits June 12, 2026 05:47
Native non-dict tool_call arguments (e.g. [1,2,3], 42 that never went over
the wire as a string) were coerced to {} (discarded), while stringified
non-dict args were preserved under _raw_arguments. Preserve native non-dict
under _raw_arguments too for branch consistency and to match the slime
reference (slime/agent/auto_sample_builder/messages.py:tool_call_arguments).
Only None/falsy still maps to {}. Default path (dict passthrough, valid
JSON-object string -> dict) is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove 3-line inline comment before the native non-dict return in
_dict_arguments: the docstring already covers this case with the same
rationale (preserve under _raw_arguments, mirror stringified branch,
|items would raise). No logic changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fold the three string-decodes-to-non-dict cases ({not json}, [1,2,3], 42)
into one loop in test_malformed_preserved_under_raw_arguments and add the
"42" case. Previously "42" was only checked for is-a-Mapping by the
parametrized test, never pinned to its exact preserved payload; under the
pre-fix bug it leaks the bare int 42 to the template, so the exact assert is
load-bearing. Closes the only adversarial input lacking an exact-value pin.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…guments")

The native non-dict branch of `_dict_arguments` gated the `_raw_arguments`
sentinel on truthiness (`if not args: return {}`), so falsy native values
(`0`, `False`, `[]`) were silently rewritten to an empty mapping instead of
being preserved. That contradicts the branch's own contract: lossless
preservation of non-dict argument payloads. Only `None` means "no arguments";
any other non-dict value is a real payload and now funnels through
`{"_raw_arguments": ...}`.

The string branch is unchanged: an empty wire string still decodes to `{}`,
since the OpenAI wire encodes "no arguments" as `""`/`"{}"`.

Extends `test_native_non_dict_preserved_under_raw_arguments` with `0` and
`[]`; both fail on the previous truthiness-gated code.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@EazyReal EazyReal changed the title fix(chat-template): harden tool-call argument decoding against adversarial args fix(chat-template): harden tool-call argument decoding Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant