fix(chat-template): harden tool-call argument decoding#1327
Conversation
…arial args
OpenAI wire responses carry function.arguments as a JSON string, while
Qwen-family templates expect a mapping and iterate arguments|items (e.g.
qwen3.5_fixed.jinja). normalize_tool_arguments(..., "dict") decodes that wire
string back to a dict at the render boundary, but the decode was an unguarded
json.loads:
- "" / malformed JSON raised an uncaught json.JSONDecodeError (the only
try/except around apply_chat_template catches TemplateError, not
JSONDecodeError), crashing render / data-load;
- a value that decodes to a non-dict (list/number) or an explicit None leaked
through to the template, where arguments|items raises and the tool-format
fallback re-renders the same broken args -> ValueError;
- a native (already-Python) non-dict arguments (list/number) hit the same
arguments|items crash without ever passing through json.loads.
Route the inbound "dict" normalization through a new _dict_arguments() that
always yields a mapping: dict passthrough; str -> guarded json.loads (empty ->
{}), with a JSONDecodeError or non-dict decode preserved under a render-only
_raw_arguments sentinel so the malformed payload still renders instead of being
silently dropped; anything else -> {}. The outbound "json" (DeepSeek DSML)
direction and the OpenAI wire contract are untouched.
Add tests/fast/utils/chat_template_utils/test_tool_args_hardening.py: pure-CPU
tests that pin none of the adversarial inbound values raise, each yields a
mapping, malformed payloads survive under _raw_arguments, and the valid
dict-JSON / dict-passthrough default paths are unchanged. It lives under
tests/fast/, so the CPU CI suite (stage-a-cpu) auto-discovers and runs it on
every PR; no tokenizer or network is required.
There was a problem hiding this comment.
Code Review
This pull request hardens tool-call argument parsing for chat-template rendering by introducing a helper function _dict_arguments to safely coerce malformed, empty, or non-dict arguments into mappings, preventing template rendering crashes. A comprehensive test suite is also added to cover these adversarial cases. The reviewer suggests improving consistency by preserving native non-dict values under _raw_arguments (making them inspectable during rendering) rather than silently discarding them to empty dictionaries, and provides corresponding test updates.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if isinstance(args, dict): | ||
| return args | ||
| if isinstance(args, str): | ||
| try: | ||
| decoded = json.loads(args or "{}") | ||
| except json.JSONDecodeError: | ||
| return {"_raw_arguments": args} | ||
| return decoded if isinstance(decoded, dict) else {"_raw_arguments": args} | ||
| # Non-str, non-dict (None, or a native list/number) -> empty mapping; the leading | ||
| # ``isinstance(args, dict)`` guard already returned every dict, so nothing to keep. | ||
| return {} |
There was a problem hiding this comment.
There is an inconsistency in how adversarial arguments are handled: stringified non-dict values (like "[1, 2, 3]" or "42") are preserved under _raw_arguments so they can be rendered and inspected, whereas native non-dict values (like [1, 2, 3] or 42) are silently discarded and replaced with {}.
To make the behavior consistent and ensure all malformed payloads remain inspectable during rendering, we should also preserve native non-dict values under _raw_arguments. Only None and empty strings "" should be coerced to {}.
| if isinstance(args, dict): | |
| return args | |
| if isinstance(args, str): | |
| try: | |
| decoded = json.loads(args or "{}") | |
| except json.JSONDecodeError: | |
| return {"_raw_arguments": args} | |
| return decoded if isinstance(decoded, dict) else {"_raw_arguments": args} | |
| # Non-str, non-dict (None, or a native list/number) -> empty mapping; the leading | |
| # ``isinstance(args, dict)`` guard already returned every dict, so nothing to keep. | |
| return {} | |
| if isinstance(args, dict): | |
| return args | |
| if args is None or args == "": | |
| return {} | |
| if isinstance(args, str): | |
| try: | |
| decoded = json.loads(args) | |
| except json.JSONDecodeError: | |
| return {"_raw_arguments": args} | |
| return decoded if isinstance(decoded, dict) else {"_raw_arguments": args} | |
| return {"_raw_arguments": args} |
There was a problem hiding this comment.
Accepted, applied in 693af03. You're right that the two adversarial branches were inconsistent: a stringified non-dict ('[1, 2, 3]' / '42') was preserved under _raw_arguments, but a native non-dict ([1, 2, 3] / 42) was silently discarded to {}. _dict_arguments now preserves native non-dict losslessly under _raw_arguments as well; only None/falsy still maps to {} (nothing to preserve). This also aligns miles with our slime reference tool_call_arguments (slime/agent/auto_sample_builder/messages.py:45-52), which already preserves a native non-mapping under _raw_arguments. Note this only touches the adversarial branch this PR exists to harden — the default path is unchanged: a dict still passes through untouched and a valid JSON-object string still decodes to the exact dict (verified against all cases, incl. {"city":"London"} -> {"city":"London"}).
| def test_native_non_dict_becomes_empty_mapping(): | ||
| """A native (already-Python) non-dict ``arguments`` — not a wire string — must also | ||
| be coerced to a mapping, otherwise it leaks to ``arguments|items`` and crashes.""" | ||
| for arguments in ([1, 2, 3], 42): | ||
| normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict") | ||
| assert _tool_call_arguments(normalized) == {} |
There was a problem hiding this comment.
Update this test to align with the improved behavior where native non-dict arguments are preserved under _raw_arguments instead of being silently discarded.
| def test_native_non_dict_becomes_empty_mapping(): | |
| """A native (already-Python) non-dict ``arguments`` — not a wire string — must also | |
| be coerced to a mapping, otherwise it leaks to ``arguments|items`` and crashes.""" | |
| for arguments in ([1, 2, 3], 42): | |
| normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict") | |
| assert _tool_call_arguments(normalized) == {} | |
| def test_native_non_dict_preserved_under_raw_arguments(): | |
| """A native (already-Python) non-dict arguments - not a wire string - must also | |
| be coerced to a mapping, and preserved under _raw_arguments so it remains inspectable.""" | |
| for arguments in ([1, 2, 3], 42): | |
| normalized = normalize_tool_arguments(_messages_with_args(arguments), "dict") | |
| assert _tool_call_arguments(normalized) == {"_raw_arguments": arguments} |
There was a problem hiding this comment.
Accepted, applied in 693af03. The native-non-dict test now asserts [1, 2, 3] -> {"_raw_arguments": [1, 2, 3]} and 42 -> {"_raw_arguments": 42} (renamed to test_native_non_dict_preserved_under_raw_arguments), matching the code change. The existing render smoke test (test_render_does_not_raise) still covers native-list/native-number and continues to pass since {"_raw_arguments": ...} is a Mapping that arguments|items can iterate. The empty/None -> {} and valid-dict-JSON cases remain pinned to guard the unchanged paths.
Native non-dict tool_call arguments (e.g. [1,2,3], 42 that never went over
the wire as a string) were coerced to {} (discarded), while stringified
non-dict args were preserved under _raw_arguments. Preserve native non-dict
under _raw_arguments too for branch consistency and to match the slime
reference (slime/agent/auto_sample_builder/messages.py:tool_call_arguments).
Only None/falsy still maps to {}. Default path (dict passthrough, valid
JSON-object string -> dict) is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove 3-line inline comment before the native non-dict return in _dict_arguments: the docstring already covers this case with the same rationale (preserve under _raw_arguments, mirror stringified branch, |items would raise). No logic changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fold the three string-decodes-to-non-dict cases ({not json}, [1,2,3], 42)
into one loop in test_malformed_preserved_under_raw_arguments and add the
"42" case. Previously "42" was only checked for is-a-Mapping by the
parametrized test, never pinned to its exact preserved payload; under the
pre-fix bug it leaks the bare int 42 to the template, so the exact assert is
load-bearing. Closes the only adversarial input lacking an exact-value pin.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…guments")
The native non-dict branch of `_dict_arguments` gated the `_raw_arguments`
sentinel on truthiness (`if not args: return {}`), so falsy native values
(`0`, `False`, `[]`) were silently rewritten to an empty mapping instead of
being preserved. That contradicts the branch's own contract: lossless
preservation of non-dict argument payloads. Only `None` means "no arguments";
any other non-dict value is a real payload and now funnels through
`{"_raw_arguments": ...}`.
The string branch is unchanged: an empty wire string still decodes to `{}`,
since the OpenAI wire encodes "no arguments" as `""`/`"{}"`.
Extends `test_native_non_dict_preserved_under_raw_arguments` with `0` and
`[]`; both fail on the previous truthiness-gated code.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
OpenAI wire responses carry
function.argumentsas a JSON string, while Qwen-family templates expect a mapping and iterate it astool_call.arguments|items(miles/utils/chat_template_utils/templates/qwen3.5_fixed.jinja:117). At the render boundary,normalize_tool_arguments(messages, "dict")(miles/utils/chat_template_utils/template.py) decodes that echoed wire string back to a dict — but the decode was an unguardedjson.loads(args)gated only onisinstance(args, str). The onlytry/exceptwrappingapply_chat_templatecatchesTemplateError, notJSONDecodeError(template.py:124-132), so three inbound shapes broke render / data-load:""or malformed JSON →json.loadsraises an uncaughtjson.JSONDecodeError, crashing render."[1, 2, 3]","42") or an explicitNone→ leaks a non-mapping into the template, wherearguments|itemsraisesTemplateError; the tool-format fallback then re-renders the same broken args and surfaces aValueError.arguments(a list / number) → never enters theisinstance(args, str)branch, so normalization is skipped and the raw non-dict hitsarguments|itemsdirectly.Before vs After
Same input, fed through
"dict"normalization and then rendered withqwen3.5_fixed.jinja(the original crash site). Reproduces the parametrized cases in the test file.Before — the very first case crashes:
and the non-dict cases (e.g.
"[1, 2, 3]", native42) crash later in render:After — every case renders to a
str. The normalizedargumentsper input:arguments'{"city": "London"}'(valid){"city": "London"}(unchanged){"city": "Paris"}(already dict){"city": "Paris"}(passthrough)""{}None{}"{not json}"{"_raw_arguments": "{not json}"}"[1, 2, 3]"{"_raw_arguments": "[1, 2, 3]"}"42"{"_raw_arguments": "42"}[1, 2, 3](native){"_raw_arguments": [1, 2, 3]}42(native){"_raw_arguments": 42}0(native, falsy){"_raw_arguments": 0}[](native, falsy){"_raw_arguments": []}A malformed/non-dict payload is preserved verbatim under
_raw_argumentsrather than silently dropped to an empty call, so it still renders and stays inspectable.Fix
Route the inbound
"dict"normalization through a new_dict_arguments()that always returns a mapping:dict→ passthrough.str→ guardedjson.loads(args or "{}"); aJSONDecodeError, or a value that decodes to a non-dict, is preserved as{"_raw_arguments": args}. An empty wire string →{}(the OpenAI wire encodes "no arguments" as""/"{}").None→{}(no arguments).0/[]included) → preserved as{"_raw_arguments": args}, mirroring the stringified-non-dict branch. OnlyNonemeans "no arguments": a falsy payload such as0or[]is still a payload the model emitted, so it is preserved rather than silently rewritten to an empty mapping._raw_argumentsis a render-only sentinel with no programmatic consumer: it givesarguments|itemsa key to iterate so a malformed call surfaces its raw payload in the rendered text instead of being rewritten to an empty one. The call-site guard also changes fromisinstance(args, str)to"arguments" in func, so native non-dict args now reach the coercion instead of being skipped.Why this is the right fix
'{"city": "London"}'→{"city": "London"}) and already-dict passthrough are unchanged and pinned by tests; only the previously-crashing branches gain coercion.is Nonemeans the normalization never drops a real (even falsy) payload."json"direction (DeepSeek DSML encoders) and the OpenAI wire contract (argumentsstays a JSON string) are untouched — a test asserts a non-dict wire string is left as-is on the"json"path.normalize_tool_argumentsstill deep-copies, so the input message is never mutated (pinned bytest_does_not_mutate_input).tests/fast/utils/chat_template_utils/test_tool_args_hardening.py(pure-Python, no tokenizer / network). It pins that none of the adversarial inbound values raise, each yields aMapping, the fullqwen3.5_fixed.jinjarender does not raise, malformed/native-non-dict payloads survive under_raw_arguments(falsy natives0/[]included), and the valid/passthrough/outbound paths are unchanged. Living undertests/fast/, the CPU suite (stage-a-cpu) auto-discovers it on every PR with no extra registration.