fix(chat-template): harden tool-call argument decoding by EazyReal · Pull Request #1327 · radixark/miles

EazyReal · 2026-06-12T04:22:01Z

Problem

OpenAI wire responses carry function.arguments as a JSON string, while Qwen-family templates expect a mapping and iterate it as tool_call.arguments|items (miles/utils/chat_template_utils/templates/qwen3.5_fixed.jinja:117). At the render boundary, normalize_tool_arguments(messages, "dict") (miles/utils/chat_template_utils/template.py) decodes that echoed wire string back to a dict — but the decode was an unguarded json.loads(args) gated only on isinstance(args, str). The only try/except wrapping apply_chat_template catches TemplateError, not JSONDecodeError (template.py:124-132), so three inbound shapes broke render / data-load:

"" or malformed JSON → json.loads raises an uncaught json.JSONDecodeError, crashing render.
a string that decodes to a non-dict ("[1, 2, 3]", "42") or an explicit None → leaks a non-mapping into the template, where arguments|items raises TemplateError; the tool-format fallback then re-renders the same broken args and surfaces a ValueError.
a native (already-Python) non-dict arguments (a list / number) → never enters the isinstance(args, str) branch, so normalization is skipped and the raw non-dict hits arguments|items directly.

Before vs After

Same input, fed through "dict" normalization and then rendered with qwen3.5_fixed.jinja (the original crash site). Reproduces the parametrized cases in the test file.

from miles.utils.chat_template_utils import (
    TEMPLATE_DIR, apply_chat_template_from_str, normalize_tool_arguments,
)

def messages(arguments):
    return [
        {"role": "user", "content": "weather?"},
        {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "get_weather", "arguments": arguments}}]},
        {"role": "tool", "content": "sunny", "tool_call_id": "call_1", "name": "get_weather"},
    ]

qwen = (TEMPLATE_DIR / "qwen3.5_fixed.jinja").read_text()
for arg in ["", "{not json}", "[1, 2, 3]", "42", None, [1, 2, 3], 42]:
    apply_chat_template_from_str(qwen, messages(arg), tools=TOOLS)

Before — the very first case crashes:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

and the non-dict cases (e.g. "[1, 2, 3]", native 42) crash later in render:

ValueError: Chat template rendering failed (tool format fallback): ...
# arguments|items on a list/number raised TemplateError

After — every case renders to a str. The normalized arguments per input:

inbound `arguments`	normalized to
`'{"city": "London"}'` (valid)	`{"city": "London"}` (unchanged)
`{"city": "Paris"}` (already dict)	`{"city": "Paris"}` (passthrough)
`""`	`{}`
`None`	`{}`
`"{not json}"`	`{"_raw_arguments": "{not json}"}`
`"[1, 2, 3]"`	`{"_raw_arguments": "[1, 2, 3]"}`
`"42"`	`{"_raw_arguments": "42"}`
`[1, 2, 3]` (native)	`{"_raw_arguments": [1, 2, 3]}`
`42` (native)	`{"_raw_arguments": 42}`
`0` (native, falsy)	`{"_raw_arguments": 0}`
`[]` (native, falsy)	`{"_raw_arguments": []}`

A malformed/non-dict payload is preserved verbatim under _raw_arguments rather than silently dropped to an empty call, so it still renders and stays inspectable.

Fix

Route the inbound "dict" normalization through a new _dict_arguments() that always returns a mapping:

dict → passthrough.
str → guarded json.loads(args or "{}"); a JSONDecodeError, or a value that decodes to a non-dict, is preserved as {"_raw_arguments": args}. An empty wire string → {} (the OpenAI wire encodes "no arguments" as ""/"{}").
None → {} (no arguments).
any other native non-dict (list / number, falsy values like 0 / [] included) → preserved as {"_raw_arguments": args}, mirroring the stringified-non-dict branch. Only None means "no arguments": a falsy payload such as 0 or [] is still a payload the model emitted, so it is preserved rather than silently rewritten to an empty mapping.

_raw_arguments is a render-only sentinel with no programmatic consumer: it gives arguments|items a key to iterate so a malformed call surfaces its raw payload in the rendered text instead of being rewritten to an empty one. The call-site guard also changes from isinstance(args, str) to "arguments" in func, so native non-dict args now reach the coercion instead of being skipped.

Why this is the right fix

Default path is bit-identical. Valid dict-JSON ('{"city": "London"}' → {"city": "London"}) and already-dict passthrough are unchanged and pinned by tests; only the previously-crashing branches gain coercion.
Lossless. Truthiness is the wrong gate for "no arguments" — gating the sentinel on is None means the normalization never drops a real (even falsy) payload.
Scoped to inbound render normalization. The outbound "json" direction (DeepSeek DSML encoders) and the OpenAI wire contract (arguments stays a JSON string) are untouched — a test asserts a non-dict wire string is left as-is on the "json" path.
No new abstraction, no mutation. One private helper; normalize_tool_arguments still deep-copies, so the input message is never mutated (pinned by test_does_not_mutate_input).
CI-verifiable, CPU-only. Adds tests/fast/utils/chat_template_utils/test_tool_args_hardening.py (pure-Python, no tokenizer / network). It pins that none of the adversarial inbound values raise, each yields a Mapping, the full qwen3.5_fixed.jinja render does not raise, malformed/native-non-dict payloads survive under _raw_arguments (falsy natives 0 / [] included), and the valid/passthrough/outbound paths are unchanged. Living under tests/fast/, the CPU suite (stage-a-cpu) auto-discovers it on every PR with no extra registration.

…arial args OpenAI wire responses carry function.arguments as a JSON string, while Qwen-family templates expect a mapping and iterate arguments|items (e.g. qwen3.5_fixed.jinja). normalize_tool_arguments(..., "dict") decodes that wire string back to a dict at the render boundary, but the decode was an unguarded json.loads: - "" / malformed JSON raised an uncaught json.JSONDecodeError (the only try/except around apply_chat_template catches TemplateError, not JSONDecodeError), crashing render / data-load; - a value that decodes to a non-dict (list/number) or an explicit None leaked through to the template, where arguments|items raises and the tool-format fallback re-renders the same broken args -> ValueError; - a native (already-Python) non-dict arguments (list/number) hit the same arguments|items crash without ever passing through json.loads. Route the inbound "dict" normalization through a new _dict_arguments() that always yields a mapping: dict passthrough; str -> guarded json.loads (empty -> {}), with a JSONDecodeError or non-dict decode preserved under a render-only _raw_arguments sentinel so the malformed payload still renders instead of being silently dropped; anything else -> {}. The outbound "json" (DeepSeek DSML) direction and the OpenAI wire contract are untouched. Add tests/fast/utils/chat_template_utils/test_tool_args_hardening.py: pure-CPU tests that pin none of the adversarial inbound values raise, each yields a mapping, malformed payloads survive under _raw_arguments, and the valid dict-JSON / dict-passthrough default paths are unchanged. It lives under tests/fast/, so the CPU CI suite (stage-a-cpu) auto-discovers and runs it on every PR; no tokenizer or network is required.

gemini-code-assist

Code Review

This pull request hardens tool-call argument parsing for chat-template rendering by introducing a helper function _dict_arguments to safely coerce malformed, empty, or non-dict arguments into mappings, preventing template rendering crashes. A comprehensive test suite is also added to cover these adversarial cases. The reviewer suggests improving consistency by preserving native non-dict values under _raw_arguments (making them inspectable during rendering) rather than silently discarding them to empty dictionaries, and provides corresponding test updates.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T04:23:32Z

+    if isinstance(args, dict):
+        return args
+    if isinstance(args, str):
+        try:
+            decoded = json.loads(args or "{}")
+        except json.JSONDecodeError:
+            return {"_raw_arguments": args}
+        return decoded if isinstance(decoded, dict) else {"_raw_arguments": args}
+    # Non-str, non-dict (None, or a native list/number) -> empty mapping; the leading
+    # ``isinstance(args, dict)`` guard already returned every dict, so nothing to keep.
+    return {}


There is an inconsistency in how adversarial arguments are handled: stringified non-dict values (like "[1, 2, 3]" or "42") are preserved under _raw_arguments so they can be rendered and inspected, whereas native non-dict values (like [1, 2, 3] or 42) are silently discarded and replaced with {}.

To make the behavior consistent and ensure all malformed payloads remain inspectable during rendering, we should also preserve native non-dict values under _raw_arguments. Only None and empty strings "" should be coerced to {}.

Suggested change

if isinstance(args, dict):

return args

if isinstance(args, str):

try:

decoded = json.loads(args or "{}")

except json.JSONDecodeError:

return {"_raw_arguments": args}

return decoded if isinstance(decoded, dict) else {"_raw_arguments": args}

# Non-str, non-dict (None, or a native list/number) -> empty mapping; the leading

# ``isinstance(args, dict)`` guard already returned every dict, so nothing to keep.

return {}

if isinstance(args, dict):

return args

if args is None or args == "":

return {}

if isinstance(args, str):

try:

decoded = json.loads(args)

except json.JSONDecodeError:

return {"_raw_arguments": args}

return decoded if isinstance(decoded, dict) else {"_raw_arguments": args}

return {"_raw_arguments": args}

Accepted, applied in 693af03. You're right that the two adversarial branches were inconsistent: a stringified non-dict ('[1, 2, 3]' / '42') was preserved under _raw_arguments, but a native non-dict ([1, 2, 3] / 42) was silently discarded to {}. _dict_arguments now preserves native non-dict losslessly under _raw_arguments as well; only None/falsy still maps to {} (nothing to preserve). This also aligns miles with our slime reference tool_call_arguments (slime/agent/auto_sample_builder/messages.py:45-52), which already preserves a native non-mapping under _raw_arguments. Note this only touches the adversarial branch this PR exists to harden — the default path is unchanged: a dict still passes through untouched and a valid JSON-object string still decodes to the exact dict (verified against all cases, incl. {"city":"London"} -> {"city":"London"}).

gemini-code-assist · 2026-06-12T04:23:32Z

+def test_native_non_dict_becomes_empty_mapping():
+    """A native (already-Python) non-dict ``arguments`` — not a wire string — must also
+    be coerced to a mapping, otherwise it leaks to ``arguments|items`` and crashes."""
+    for arguments in ([1, 2, 3], 42):
+        normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict")
+        assert _tool_call_arguments(normalized) == {}


Update this test to align with the improved behavior where native non-dict arguments are preserved under _raw_arguments instead of being silently discarded.

Suggested change

def test_native_non_dict_becomes_empty_mapping():

"""A native (already-Python) non-dict ``arguments`` — not a wire string — must also

be coerced to a mapping, otherwise it leaks to ``arguments|items`` and crashes."""

for arguments in ([1, 2, 3], 42):

normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict")

assert _tool_call_arguments(normalized) == {}

def test_native_non_dict_preserved_under_raw_arguments():

"""A native (already-Python) non-dict arguments - not a wire string - must also

be coerced to a mapping, and preserved under _raw_arguments so it remains inspectable."""

for arguments in ([1, 2, 3], 42):

normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict")

assert _tool_call_arguments(normalized) == {"_raw_arguments": arguments}

Accepted, applied in 693af03. The native-non-dict test now asserts [1, 2, 3] -> {"_raw_arguments": [1, 2, 3]} and 42 -> {"_raw_arguments": 42} (renamed to test_native_non_dict_preserved_under_raw_arguments), matching the code change. The existing render smoke test (test_render_does_not_raise) still covers native-list/native-number and continues to pass since {"_raw_arguments": ...} is a Mapping that arguments|items can iterate. The empty/None -> {} and valid-dict-JSON cases remain pinned to guard the unchanged paths.

Native non-dict tool_call arguments (e.g. [1,2,3], 42 that never went over the wire as a string) were coerced to {} (discarded), while stringified non-dict args were preserved under _raw_arguments. Preserve native non-dict under _raw_arguments too for branch consistency and to match the slime reference (slime/agent/auto_sample_builder/messages.py:tool_call_arguments). Only None/falsy still maps to {}. Default path (dict passthrough, valid JSON-object string -> dict) is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Remove 3-line inline comment before the native non-dict return in _dict_arguments: the docstring already covers this case with the same rationale (preserve under _raw_arguments, mirror stringified branch, |items would raise). No logic changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fold the three string-decodes-to-non-dict cases ({not json}, [1,2,3], 42) into one loop in test_malformed_preserved_under_raw_arguments and add the "42" case. Previously "42" was only checked for is-a-Mapping by the parametrized test, never pinned to its exact preserved payload; under the pre-fix bug it leaks the bare int 42 to the template, so the exact assert is load-bearing. Closes the only adversarial input lacking an exact-value pin. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…guments") The native non-dict branch of `_dict_arguments` gated the `_raw_arguments` sentinel on truthiness (`if not args: return {}`), so falsy native values (`0`, `False`, `[]`) were silently rewritten to an empty mapping instead of being preserved. That contradicts the branch's own contract: lossless preservation of non-dict argument payloads. Only `None` means "no arguments"; any other non-dict value is a real payload and now funnels through `{"_raw_arguments": ...}`. The string branch is unchanged: an empty wire string still decodes to `{}`, since the OpenAI wire encodes "no arguments" as `""`/`"{}"`. Extends `test_native_non_dict_preserved_under_raw_arguments` with `0` and `[]`; both fail on the previous truthiness-gated code. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

EazyReal requested review from Shi-Dong, Zhichenzzz, fzyzcjy, guapisolo, jybsuper, maocheng23 and yueming-yuan as code owners June 12, 2026 04:22

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

EazyReal and others added 4 commits June 12, 2026 05:47

EazyReal changed the title ~~fix(chat-template): harden tool-call argument decoding against adversarial args~~ fix(chat-template): harden tool-call argument decoding Jun 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(chat-template): harden tool-call argument decoding#1327

fix(chat-template): harden tool-call argument decoding#1327
EazyReal wants to merge 5 commits into
radixark:mainfrom
EazyReal:upstream-pr/toolcall-args-decode-hardening

EazyReal commented Jun 12, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

EazyReal Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

EazyReal Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

EazyReal commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Before vs After

Fix

Why this is the right fix

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

EazyReal Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

EazyReal Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

EazyReal commented Jun 12, 2026 •

edited

Loading