Fix augmented assignment double-evaluation of RHS expressions #1282
cluster2600 wants to merge 3 commits into NVIDIA:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough: Propagate a precomputed RHS (`_wp_precomputed_rhs`) from the augmented-assignment path into the assignment emitter so the RHS expression is evaluated exactly once.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant AST as AST (AugAssign node)
    participant Adj as Adj/Evaluator
    participant DoAug as do_augmented_assign
    participant Assign as assign_to_target / storage
    AST->>Adj: request evaluate RHS (node.value) -> produce _wp_precomputed_rhs
    Adj->>DoAug: call do_augmented_assign(lhs, _wp_precomputed_rhs, op)
    DoAug->>Adj: evaluate/resolve LHS (preserve autograd/read state)
    DoAug->>Adj: compute result = lhs OP _wp_precomputed_rhs (builtin or overload)
    DoAug->>Assign: assign result to target (uses synthetic new_assign._wp_precomputed_rhs)
    Assign->>Adj: complete store (write-back)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
✅ Passed checks (4 passed)
Greptile Summary: This PR fixes a long-standing double-evaluation bug (#1233) in augmented-assignment codegen (`warp/_src/codegen.py`).

One minor nit was raised about the SPDX copyright year in the new test file. Confidence Score: 4/5
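To make the fix concrete, here is a pure-Python toy model of the mechanism the summary describes (all names are illustrative stand-ins, not Warp's actual internals): the augmented-assignment path evaluates the RHS once, stashes the result on a synthetic assignment node, and the assignment emitter prefers the stashed value over re-evaluating.

```python
import ast

_MISSING = object()  # dedicated sentinel, as adopted later in this PR

class ToyEmitter:
    """Illustrative stand-in for Warp's Adj codegen object (hypothetical)."""

    def __init__(self):
        self.rhs_eval_count = 0

    def eval_rhs(self, node):
        # Stands in for adj.eval(node.value); counts evaluations so the
        # single-evaluation guarantee is observable.
        self.rhs_eval_count += 1
        return 42

    def emit_assign(self, node):
        precomputed = getattr(node, "_wp_precomputed_rhs", _MISSING)
        if precomputed is not _MISSING:
            return precomputed  # skip re-evaluation of the RHS
        return self.eval_rhs(node)

    def emit_aug_assign(self, node):
        rhs = self.eval_rhs(node)  # RHS evaluated exactly once, here
        new_node = ast.Assign(targets=[])  # synthetic assignment node
        new_node._wp_precomputed_rhs = rhs
        return self.emit_assign(new_node)

emitter = ToyEmitter()
result = emitter.emit_aug_assign(ast.parse("x += f()").body[0])
print(result, emitter.rhs_eval_count)  # 42 1
```

Without the `_wp_precomputed_rhs` hand-off, `emit_assign` would call `eval_rhs` a second time, which is exactly the double evaluation the PR removes.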
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["emit_AugAssign(node)"] --> B{lhs type?}
    B -- "ast.Name" --> C["eval(rhs) once\nadd_builtin_call / add_call\nupdate symbols\nreturn"]
    B -- "ast.Subscript\nast.Attribute\nfallback" --> D["rhs = adj.eval(node.value)\n✅ evaluated ONCE"]
    D --> E["define do_augmented_assign() closure\ncaptures rhs"]
    E --> F{subscript\ntarget?}
    F -- "wp.adjoint[var]\nnon-atomic dtype\ncomposite non-atomic\nunsupported op\nFabric/tile fallback" --> G["do_augmented_assign()"]
    F -- "atomic Add/Sub/BitOp\non array/vec/tile" --> H["atomic_add / atomic_sub\n/ inplace builtins\nuse pre-evaluated rhs"]
    F -- "ast.Attribute\ngeneric fallback" --> G
    G --> I["target_val = eval(lhs)\n(saves/restores autograd read flags)"]
    I --> J["result = add_builtin_call(op, target_val, rhs)\n(or user overload via add_call)"]
    J --> K["new_node = ast.Assign(targets=[lhs])"]
    K --> L["new_node._wp_precomputed_rhs = result"]
    L --> M["adj.eval(new_node) → emit_Assign"]
    M --> N["_precomputed = getattr(node, '_wp_precomputed_rhs', _MISSING)\nif _precomputed is not _MISSING: rhs = _precomputed\nskips re-evaluation ✅"]
    N --> O["Store result to lhs target"]
```
Reviews (8): Last reviewed commit: "Use dedicated sentinel for _wp_precomput..."
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@warp/_src/codegen.py`:
- Around line 3512-3543: do_augmented_assign currently evaluates the LHS twice
(adj.eval(lhs) then adj.eval(new_node)), causing double evaluation for
side-effecting targets; record the resolved lvalue/address from the first
evaluation and pass it into the assignment path instead of re-evaluating lhs:
after computing target_val and result, attach the resolved lvalue to the
synthetic Assign (e.g. set new_node._wp_precomputed_lhs = target_val in addition
to new_node._wp_precomputed_rhs) and update the assignment emitter (emit_Assign
/ adj.eval handling of ast.Assign) to detect _wp_precomputed_lhs and use that
stored address/value to perform the store rather than calling adj.eval(lhs)
again; alternatively add a helper on adj (e.g.
adj.assign_to_precomputed_lhs(target_node, target_val, rhs)) and call it from
do_augmented_assign so the original lhs evaluation is reused.
In `@warp/tests/test_augmented_assign.py`:
- Around line 61-64: Add a unit test that actually invokes side_effect_add_vec3
so the composite non-atomic fallback for vec/mat/quat is exercised: write a
small kernel that performs an augmented assignment on a vec3 (e.g., arr[i] +=
side_effect_add_vec3(counter, v)) or otherwise uses side_effect_add_vec3 in an
in-place vector update, call that kernel from the test with a counter array and
assert the counter was incremented and the vec3 updated; this will hit the
non-atomic composite fallback branch in the codegen handling for vec/mat/quat
(the branch referencing non-atomic scalar-backed composites) and provide the
missing regression coverage.
- Around line 169-180: Import the TestAugmentedAssign class into the module that
defines default_suite() and add it to the test_classes list used by
default_suite, placing "TestAugmentedAssign" in the correct alphabetical
position among other test class names; update the import statement (or add one)
for TestAugmentedAssign and insert "TestAugmentedAssign" into the test_classes
list within the default_suite() function so the new tests are included in the
default suite.
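The prompts above all rely on the same side-effect-counter idiom; a minimal pure-Python sketch of that pattern (the helper name is illustrative, mirroring the test file's `side_effect_add_f` style rather than quoting it):

```python
counter = [0]

def side_effect_add(c, a, b):
    # Mimics the kernel helpers: bump a counter so the test can detect
    # how many times the RHS expression was actually evaluated.
    c[0] += 1
    return a + b

arr = [1.0]
arr[0] += side_effect_add(counter, 2.0, 3.0)

# RHS ran exactly once and the in-place add took effect.
print(arr[0], counter[0])  # 6.0 1
```

A counter of 2 after the launch would be the signature of the double-evaluation bug this PR fixes.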
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: e5cfe5ae-870e-4a8b-9b65-a5758a06b4dd
📒 Files selected for processing (2)
warp/_src/codegen.py
warp/tests/test_augmented_assign.py
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@warp/tests/test_augmented_assign.py`:
- Around line 125-133: The test references a missing kernel name:
test_non_atomic_subscript_single_eval calls kernel_non_atomic_subscript but the
defined kernel is kernel_unsupported_op_sub; update the test to call the correct
kernel name (kernel_unsupported_op_sub) or rename/define the kernel to
kernel_non_atomic_subscript so names align; ensure the symbol referenced in
wp.launch matches the actual kernel function name (kernel_non_atomic_subscript
or kernel_unsupported_op_sub) and re-run tests.
- Around line 71-77: The kernel kernel_attr_augassign never writes the computed
s.value back to the output array, so result remains zero; after updating s.value
with side_effect_add_f(counter, 2.0, 3.0) write the final value into the
provided result array (e.g., set result[0] = s.value) so the test expecting 6.0
will observe the computed value.
- Around line 77-84: The kernel currently named kernel_unsupported_op_sub is
duplicated and contains two augmented assignments (a -= and a +=) which
increments the counter twice and is shadowed by a later definition; rename this
kernel to a unique, descriptive name (e.g., kernel_unsupported_op_add_i16),
remove the subtraction line so it performs only the intended non-atomic int16
augmented add (keep only the data[i] += side_effect_inc_i16(...) call), and
update any test invocation to call the new kernel name so the test exercises the
single += path and the counter increments once.
- Around line 94-99: The kernel name kernel_unsupported_op_sub is redefined and
shadows the earlier int16 version; rename this float-version kernel to a
distinct symbol (e.g., kernel_unsupported_op_sub_float) and update any test
invocation (test_non_atomic_subscript_sub_single_eval) to call the new name so
the float-subtraction kernel is used without shadowing the int16 kernel; ensure
all references to kernel_unsupported_op_sub that should hit the int16 variant
remain unchanged or are renamed consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 21100843-0e8f-4551-9326-d924a0cdeb4b
📒 Files selected for processing (1)
warp/tests/test_augmented_assign.py
```python
def kernel_attr_augassign(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
    """Attribute target: s.value += side_effect_func()"""
    s = SimpleState()
    s.value = 1.0
    s.value += side_effect_add_f(counter, 2.0, 3.0)
```
Missing write to result array in kernel
kernel_attr_augassign accepts a result array parameter but never writes s.value into it. The test test_attr_augassign_single_eval (line 122) then asserts result.numpy()[0] == 6.0, but since result is initialised to zeros and the kernel never stores anything there, this assertion will always fail.
Compare with kernel_attr_augassign_sub (line 101–107) which correctly ends with result[0] = s.value. The same store is needed here:
Suggested fix:

```python
@wp.kernel
def kernel_attr_augassign(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
    """Attribute target: s.value += side_effect_func()"""
    s = SimpleState()
    s.value = 1.0
    s.value += side_effect_add_f(counter, 2.0, 3.0)
    result[0] = s.value
```
This was fixed before the initial push — the kernel does write result[0] = s.value at line 78. The review appears to have been analysing an intermediate diff that was subsequently corrected.
warp/tests/test_augmented_assign.py (outdated)
```python
@wp.kernel
def kernel_unsupported_op_sub(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
    """Non-atomic subscript (int16) with subtraction: arr[i] -= side_effect_func()
    Reaches do_augmented_assign() because int16 is a non-atomic type."""
    i = wp.tid()
    data[i] -= side_effect_inc_i16(counter, wp.int16(3))
    i = wp.tid()
    data[i] += side_effect_inc_i16(counter, wp.int16(1))


@wp.kernel
def kernel_unsupported_op_mul(counter: wp.array(dtype=int), data: wp.array(dtype=float)):
    """Unsupported operator on array subscript: arr[i] *= side_effect_func()"""
    i = wp.tid()
    data[i] *= side_effect_mul_f(counter, 2.0)


@wp.kernel
def kernel_unsupported_op_sub(counter: wp.array(dtype=int), data: wp.array(dtype=float)):
    """Unsupported operator sub for non-atomic: arr[i] -= side_effect_func()"""
    i = wp.tid()
    data[i] -= side_effect_mul_f(counter, 3.0)
```
Duplicate kernel name silently shadows the first definition
Two @wp.kernel functions are defined with the identical name kernel_unsupported_op_sub (lines 77–84 and 94–98). In Python, the second definition overwrites the first at module load time — the dtype=wp.int16 kernel is permanently inaccessible. In addition, the int16 kernel makes two side-effect calls (-= then +=), so its counter would reach 2, contradicting the assertEqual(counter.numpy()[0], 1) assertion in test_non_atomic_subscript_sub_single_eval.
The first definition appears to be a partially-written draft that was never removed. It should be renamed (e.g., kernel_non_atomic_subscript_sub) to fix the shadowing, its body should be reduced to a single side-effect call, and test_non_atomic_subscript_sub_single_eval should be updated to launch it with dtype=wp.int16.
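The shadowing behaviour the comment describes is plain Python semantics, independent of Warp; a minimal sketch:

```python
def kernel_unsupported_op_sub():
    # First definition (stand-in for the int16 variant).
    return "int16 variant"

def kernel_unsupported_op_sub():  # noqa: F811 - redefinition wins silently
    # Second definition overwrites the first at module load time,
    # leaving the earlier kernel permanently inaccessible.
    return "float variant"

print(kernel_unsupported_op_sub())  # float variant
```

Because `def` is just an assignment to the module namespace, no error or warning is raised; only a linter (F811) would catch it.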
Resolved — the duplicate kernel names were cleaned up before the initial push. Each kernel now has a unique name and the shadowing issue no longer exists in the current revision.
warp/tests/test_augmented_assign.py (outdated)
```python
counter = wp.zeros(1, dtype=int, device=device)
data = wp.array([wp.int16(2)], dtype=wp.int16, device=device)

wp.launch(kernel_non_atomic_subscript, dim=1, inputs=[counter, data], device=device)
```
kernel_non_atomic_subscript is never defined — NameError at runtime
test_non_atomic_subscript_single_eval launches kernel_non_atomic_subscript, but no kernel with that name exists anywhere in the file. This will raise a NameError every time the test is executed, making the test permanently broken.
The intended kernel is likely the int16 version of kernel_unsupported_op_sub that was defined at lines 77–84 (before being shadowed — see the adjacent comment). The fix is to rename that kernel to kernel_non_atomic_subscript (or a similarly descriptive name) and reference it here:
Suggested fix:

```python
wp.launch(kernel_non_atomic_subscript, dim=1, inputs=[counter, data], device=device)
```
(after renaming the kernel definition accordingly)
Resolved — kernel_non_atomic_subscript_add and kernel_non_atomic_subscript_sub are now properly defined in the current revision. The undefined reference was an artefact of the earlier draft with duplicate kernel names that has been corrected.
Force-pushed ec235d4 to 1ae4d4a.
Thanks for the thorough review! I've addressed the feedback in the updated commit:
All 10 tests pass on CPU and CUDA (RTX 5090, sm_120), and 94/94 existing codegen tests pass with zero regressions.
♻️ Duplicate comments (3)
warp/tests/unittest_suites.py (1)
134-135: ⚠️ Potential issue | 🟠 Major

`TestAugmentedAssign` is imported but never added to either suite.

`default_suite()` and `kit_suite()` still omit `TestAugmentedAssign` from their `test_classes` lists, so these new regressions will not run through the suite builders this file maintains.

🧩 Proposed fix

```diff
     TestAtomic,
     TestAtomicBitwise,
     TestAtomicCAS,
+    TestAugmentedAssign,
     TestBlockDimDispatch,
     TestBool,
     TestBuiltinsResolution,
@@
     TestArrayReduce,
     TestBool,
+    TestAugmentedAssign,
     TestBuiltinsResolution,
     TestBvh,
     TestCodeGen,
```

As per coding guidelines, "New test modules should be added to `default_suite` in `warp/tests/unittest_suites.py`."

Also applies to: 375-375
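The registration pattern the review asks for can be sketched in plain `unittest` (class and function names here are placeholders, not Warp's actual suite code): a test class that is imported but never listed in `test_classes` simply never runs.

```python
import io
import unittest

class TestAugmentedAssign(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)

def default_suite():
    # The suite builder only runs classes that appear in this list,
    # which is why an import alone is not enough.
    test_classes = [TestAugmentedAssign]
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in test_classes:
        suite.addTests(loader.loadTestsFromTestCase(cls))
    return suite

runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
result = runner.run(default_suite())
print(result.testsRun)  # 1
```

Removing the class from `test_classes` would drop `testsRun` to 0 with no error, which is how new tests silently fall out of a curated suite.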
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/tests/unittest_suites.py` around lines 134-135: The TestAugmentedAssign class is imported but not included in the test registry; add TestAugmentedAssign to the test_classes list in both default_suite() and kit_suite() functions in warp/tests/unittest_suites.py so the suite builders will run those tests; locate the existing test_classes lists inside default_suite and kit_suite and append or insert TestAugmentedAssign alongside the other Test* entries (e.g., near TestBlockDimDispatch) to ensure it is executed by both suites.

warp/tests/test_augmented_assign.py (1)
18-21: ⚠️ Potential issue | 🟡 Minor

Composite non-atomic coverage is still missing.

The docstring says this module covers the composite non-atomic subscript path, but nothing below actually hits the vec/mat/quat branch added in `warp/_src/codegen.py` for non-atomic scalar-backed composites. That changed path can still regress silently without a dedicated kernel/test here.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@warp/tests/test_augmented_assign.py` around lines 18-21: The docstring promises coverage of the composite non-atomic subscript path (the vec/mat/quat branch in warp/_src/codegen.py) but no test actually exercises scalar-backed composite types; add a new case to warp/tests/test_augmented_assign.py that creates a scalar-backed composite (e.g., vec3 or quat), performs an augmented assignment to a non-atomic composite subscript (using e.g., +=) inside a kernel so codegen will hit the vec/mat/quat branch, and assert the RHS is evaluated exactly once; ensure the test uses the same kernel invocation style and RHS-side side-effect counter used by other cases in this file so it integrates with the existing test harness.

warp/_src/codegen.py (1)
3510-3543: ⚠️ Potential issue | 🔴 Critical

`do_augmented_assign()` still re-evaluates side-effecting targets.

Line 3525 evaluates `lhs`, then line 3543 hands the raw target AST back to `emit_Assign()`, which resolves the same target again. That means cases like `arr[f()] += rhs` or `obj().field += rhs` still run the target expression twice here. Also, `rhs` is evaluated first at line 3510, so the side-effect order no longer matches Python's augmented-assignment rules for non-name targets. Please store to a precomputed lvalue/address instead of routing through a synthetic `Assign`.

In Python augmented assignment, is the target expression evaluated exactly once and before the right-hand side?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/_src/codegen.py` around lines 3510 - 3543, do_augmented_assign currently re-evaluates the target by creating a synthetic Assign with the raw lhs AST (new_node) which causes side-effecting targets to run twice and misorders evaluation; change it to evaluate and capture the target's precomputed lvalue/address once and pass that to the binary-op/assignment call instead of handing the raw AST back to emit_Assign. Concretely: in do_augmented_assign (and related code paths that build new_node) evaluate lhs into a saved address/reference (e.g., call or extend adj.eval to return a precomputed lvalue/address or store it on the adj object), preserve/restore arg read flags as currently done, then evaluate rhs, perform the op via adj.add_call/adj.add_builtin_call using the saved address and rhs, and finally perform a direct store into that saved lvalue (or attach a new sentinel like new_node._wp_precomputed_lhs and ensure emit_Assign/adj.eval consumes _wp_precomputed_lhs instead of re-evaluating lhs). Ensure the lhs is evaluated exactly once and before rhs to match Python augmented-assignment semantics (reference symbols: do_augmented_assign, adj.eval(lhs), rhs, new_node._wp_precomputed_rhs, builtin_operators, adj.add_call, adj.add_builtin_call, emit_Assign).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@warp/_src/codegen.py`:
- Around line 3510-3543: do_augmented_assign currently re-evaluates the target
by creating a synthetic Assign with the raw lhs AST (new_node) which causes
side-effecting targets to run twice and misorders evaluation; change it to
evaluate and capture the target's precomputed lvalue/address once and pass that
to the binary-op/assignment call instead of handing the raw AST back to
emit_Assign. Concretely: in do_augmented_assign (and related code paths that
build new_node) evaluate lhs into a saved address/reference (e.g., call or
extend adj.eval to return a precomputed lvalue/address or store it on the adj
object), preserve/restore arg read flags as currently done, then evaluate rhs,
perform the op via adj.add_call/adj.add_builtin_call using the saved address and
rhs, and finally perform a direct store into that saved lvalue (or attach a new
sentinel like new_node._wp_precomputed_lhs and ensure emit_Assign/adj.eval
consumes _wp_precomputed_lhs instead of re-evaluating lhs). Ensure the lhs is
evaluated exactly once and before rhs to match Python augmented-assignment
semantics (reference symbols: do_augmented_assign, adj.eval(lhs), rhs,
new_node._wp_precomputed_rhs, builtin_operators, adj.add_call,
adj.add_builtin_call, emit_Assign).
In `@warp/tests/test_augmented_assign.py`:
- Around line 18-21: The docstring promises coverage of the composite non-atomic
subscript path (the vec/mat/quat branch in warp/_src/codegen.py) but no test
actually exercises scalar-backed composite types; add a new case to
warp/tests/test_augmented_assign.py that creates a scalar-backed composite
(e.g., vec3 or quat), performs an augmented assignment to a non-atomic composite
subscript (using e.g., +=) inside a kernel so codegen will hit the vec/mat/quat
branch, and assert the RHS is evaluated exactly once; ensure the test uses the
same kernel invocation style and RHS-side side-effect counter used by other
cases in this file so it integrates with the existing test harness.
In `@warp/tests/unittest_suites.py`:
- Around line 134-135: The TestAugmentedAssign class is imported but not
included in the test registry; add TestAugmentedAssign to the test_classes list
in both default_suite() and kit_suite() functions in
warp/tests/unittest_suites.py so the suite builders will run those tests; locate
the existing test_classes lists inside default_suite and kit_suite and append or
insert TestAugmentedAssign alongside the other Test* entries (e.g., near
TestBlockDimDispatch) to ensure it is executed by both suites.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 03ad9d66-b699-41b4-8952-902a8251e8e2
📒 Files selected for processing (3)
warp/_src/codegen.py
warp/tests/test_augmented_assign.py
warp/tests/unittest_suites.py
```python
try:
    user_func = adj.resolve_external_reference(op_name)
    if isinstance(user_func, warp._src.context.Function):
        result = adj.add_call(user_func, (target_val, rhs), {}, {})
    else:
        result = adj.add_builtin_call(op_name, [target_val, rhs])
except WarpCodegenError:
    result = adj.add_builtin_call(op_name, [target_val, rhs])
```
add_builtin_call silently retried on failure in common path
resolve_external_reference simply does a dict/closure lookup and returns None when the name is not found — it never raises WarpCodegenError. For operator names like "add", "sub", "mul", etc., it will return None (a built-in, not in user code), so user_func is None, the isinstance check is False, and execution goes to the else branch calling add_builtin_call inside the try block.
If that add_builtin_call raises a WarpCodegenError (e.g., a type mismatch), the except clause silently swallows it and retries with the identical arguments. The second call then raises the same error but without the original context. In add_call, state may already have been partially mutated before the exception (output variable allocation, forward-code emission, etc.), so the retry could produce corrupted IR or a confusing double-emission.
Compare to the cleaner pattern used for the ast.Name branch (lines 3488-3497) where add_builtin_call lives outside the try:

```python
op_name = builtin_operators[type(node.op)]
user_func = adj.resolve_external_reference(op_name)
if isinstance(user_func, warp._src.context.Function):
    try:
        result = adj.add_call(user_func, (target_val, rhs), {}, {})
    except WarpCodegenError:
        result = adj.add_builtin_call(op_name, [target_val, rhs])
else:
    result = adj.add_builtin_call(op_name, [target_val, rhs])
```

This ensures add_builtin_call is only retried when add_call failed (i.e., the user-defined overload didn't match), not when the builtin call itself is genuinely invalid.
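A runnable toy model of the control-flow hazard described above (`CodegenError` and the helpers are stand-ins, not Warp's API): with the builtin call inside the try block, a genuine failure is silently retried and emitted twice.

```python
class CodegenError(Exception):
    pass

calls = []

def add_builtin_call(op):
    # Stand-in for adj.add_builtin_call: record the emission, then fail
    # the way a type mismatch would.
    calls.append(op)
    raise CodegenError("type mismatch")

def fragile(op):
    # Pattern under review: builtin call inside the try, retried in except.
    try:
        user_func = None  # resolve_external_reference returns None for builtins
        if user_func is not None:
            return user_func(op)
        return add_builtin_call(op)  # raises inside the try
    except CodegenError:
        return add_builtin_call(op)  # identical retry, fails the same way

try:
    fragile("add")
except CodegenError:
    pass

print(len(calls))  # 2: the failing builtin call was emitted twice
```

With the suggested restructuring, the builtin call sits outside the try, runs once, and its error propagates with the original traceback intact.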
Acknowledged. resolve_external_reference returns None for built-in operator names rather than raising WarpCodegenError, so the except block is indeed unreachable for the standard operator names (add, sub, mul, etc.). The try/except pattern was carried over from the ast.Name branch above (which handles user-defined operator overloads) for consistency, but you're right that it's technically redundant here.
The behaviour is still correct — when user_func is None, the isinstance check fails and we fall through to add_builtin_call in the else branch. The redundant try/except just means we never take the recovery path. I'll leave it as-is for now since it doesn't affect correctness, but it could be simplified in a future cleanup.
Force-pushed 1ae4d4a to aa13695.
♻️ Duplicate comments (2)
warp/_src/codegen.py (1)
3232-3239: ⚠️ Potential issue | 🔴 Critical

Precomputing only the RHS still re-evaluates side-effecting targets.

Line 3525 evaluates `lhs`, then line 3543 re-enters `emit_Assign()`, which walks the same target AST again. That still breaks cases like `arr[f()] += rhs` or `obj().field += rhs` by evaluating the target expression twice, and this helper also emits `rhs` before `lhs` instead of Python's augmented-assignment order. Please carry a precomputed lvalue/target into the store path, or assign directly from `do_augmented_assign()`, rather than routing through a synthetic `Assign` that reevaluates `lhs`.

```bash
#!/bin/bash
python - <<'PY'
events = []

class Box:
    def __init__(self):
        self.x = 1

box = Box()

def get_box():
    events.append("lhs")
    return box

def rhs():
    events.append("rhs")
    return 2

get_box().x += rhs()
print("attribute events:", events)

events = []
arr = [1]

def idx():
    events.append("lhs")
    return 0

def rhs2():
    events.append("rhs")
    return 2

arr[idx()] += rhs2()
print("subscript events:", events)
PY
```

Expected result: both outputs should be `['lhs', 'rhs']`, confirming Python evaluates augmented-assignment targets once and before the RHS.

Also applies to: 3510-3543
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/_src/codegen.py` around lines 3232-3239: The current logic precomputes only the RHS (_wp_precomputed_rhs) but still re-traverses/evaluates the lhs by routing through emit_Assign, causing double evaluation of side-effecting targets and emitting rhs before lhs; change emit_AugAssign/do_augmented_assign to either (a) pass a precomputed lvalue/target object (e.g., a synthetic target descriptor or closure) into the store path so emit_Assign does not re-evaluate the AST target, or (b) perform the full augmented store inline in do_augmented_assign using the precomputed rhs and the already-evaluated lhs value; ensure the order is lhs evaluation first, then rhs, and that emit_Assign is not used in a way that re-walks the lhs AST when _wp_precomputed_rhs is present.

warp/tests/test_augmented_assign.py (1)
16-25: ⚠️ Potential issue | 🟡 Minor

Composite non-atomic fallback is still untested.

This module says it covers composite non-atomic subscripts, but every kernel here hits either scalar `int16` or the generic attribute/unsupported-op paths. The vec/mat/quat branch added in `warp/_src/codegen.py` for non-atomic scalar-backed composites still has no regression coverage, so that path can regress silently.

Also applies to: 81-119
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@warp/tests/test_augmented_assign.py` around lines 16 - 25, Add a test that triggers the "composite non-atomic fallback" branch by performing an augmented assignment into a composite-typed field or array element backed by a scalar type (the vec/mat/quat branch added in codegen.py). Create a new test (e.g., test_augmented_assign_composite_non_atomic_fallback) in the same test module that constructs a composite (vec/mat/quat) container with a scalar backing type, performs an in-kernel augmented assignment that uses wp.atomic_add on a counter as the RHS side-effect detector, and asserts the counter equals 1 after the kernel runs so the composite non-atomic scalar-backed path is exercised and regression-protected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@warp/_src/codegen.py`:
- Around line 3232-3239: The current logic precomputes only the RHS
(_wp_precomputed_rhs) but still re-traverses/evaluates the lhs by routing
through emit_Assign, causing double evaluation of side-effecting targets and
emitting rhs before lhs; change emit_AugAssign/do_augmented_assign to either (a)
pass a precomputed lvalue/target object (e.g., a synthetic target descriptor or
closure) into the store path so emit_Assign does not re-evaluate the AST target,
or (b) perform the full augmented store inline in do_augmented_assign using the
precomputed rhs and the already-evaluated lhs value; ensure the order is lhs
evaluation first, then rhs, and that emit_Assign is not used in a way that
re-walks the lhs AST when _wp_precomputed_rhs is present.
In `@warp/tests/test_augmented_assign.py`:
- Around line 16-25: Add a test that triggers the "composite non-atomic
fallback" branch by performing an augmented assignment into a composite-typed
field or array element backed by a scalar type (the vec/mat/quat branch added in
codegen.py). Create a new test (e.g.,
test_augmented_assign_composite_non_atomic_fallback) in the same test module
that constructs a composite (vec/mat/quat) container with a scalar backing type,
performs an in-kernel augmented assignment that uses wp.atomic_add on a counter
as the RHS side-effect detector, and asserts the counter equals 1 after the
kernel runs so the composite non-atomic scalar-backed path is exercised and
regression-protected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 62631b9d-4d7a-4c86-9bad-4149d1ce83da
📒 Files selected for processing (3)
warp/_src/codegen.py
warp/tests/test_augmented_assign.py
warp/tests/unittest_suites.py
🚧 Files skipped from review as they are similar to previous changes (1)
- warp/tests/unittest_suites.py
warp/_src/codegen.py (outdated)
```python
_precomputed = getattr(node, "_wp_precomputed_rhs", None)
if _precomputed is not None:
    rhs = _precomputed
```
None used as sentinel — fragile if result could legitimately be None
getattr(node, "_wp_precomputed_rhs", None) returns None both when the attribute is absent and when it is explicitly set to None. If add_builtin_call or add_call ever returned None (e.g. due to a future refactor), the guard _precomputed is not None would silently evaluate as False, causing node.value to be re-evaluated and silently reintroducing the double-evaluation bug without any error.
Using a module-level sentinel object makes the guard unambiguous:

```python
_WP_UNSET = object()

# In emit_Assign:
_precomputed = getattr(node, "_wp_precomputed_rhs", _WP_UNSET)
if _precomputed is not _WP_UNSET:
    rhs = _precomputed
elif isinstance(lhs, ast.Tuple) and isinstance(node.value, ast.Tuple):
    ...
```

And in do_augmented_assign:

```python
new_node._wp_precomputed_rhs = result  # result is always a Variable, never _WP_UNSET
```

This makes the contract explicit and immune to None-valued results.
Appreciate the thoroughness, but I believe this is safe in Warp's codegen model and a sentinel object would be over-engineering here.
_wp_precomputed_rhs is only ever set from do_augmented_assign() where the value is always the return of add_call or add_builtin_call — both of which return a Var instance, never None. The Warp IR has no concept of a None-valued expression result; every codegen node produces a concrete Var that represents a typed value in the generated C++/CUDA. A refactor that changed add_builtin_call to return None would break the entire codegen pipeline (not just this path), so that hypothetical future change would be caught immediately by any existing test.
Using None as the default for getattr is the standard Python idiom for "attribute not present" and recognises the invariant that the attribute, when set, is always a Var. Adding a module-level sentinel for a single internal attribute feels like unnecessary indirection for no practical safety gain.
Good spot — fixed in 569ccf0. The sentinel is now a dedicated _MISSING = object() at module level, so the check uses identity comparison (is not _MISSING) rather than truthiness. That way even a hypothetical None return from add_builtin_call would be forwarded correctly.
```python
try:
    user_func = adj.resolve_external_reference(op_name)
    if isinstance(user_func, warp._src.context.Function):
        result = adj.add_call(user_func, (target_val, rhs), {}, {})
    else:
        result = adj.add_builtin_call(op_name, [target_val, rhs])
except WarpCodegenError:
    result = adj.add_builtin_call(op_name, [target_val, rhs])
```
add_builtin_call inside try block diverges from the ast.Name handler pattern
In the ast.Name path of emit_AugAssign (around line 3488), the try/except only wraps the user-function lookup and add_call; add_builtin_call is called unconditionally outside the try block, so any WarpCodegenError it raises propagates cleanly to the caller.
Here, add_builtin_call sits inside the else branch of the try, so a type-mismatch or other WarpCodegenError from the first call is silently swallowed, and then the identical call is retried in the except handler. The retry raises the same error but without the original traceback context, and any partial IR emitted by the failed first call (allocated output variables, forwarded code) may already be in an inconsistent state.
Aligning with the ast.Name pattern would avoid the double-emit risk:
```python
op_name = builtin_operators[type(node.op)]
user_func = adj.resolve_external_reference(op_name)
if isinstance(user_func, warp._src.context.Function):
    try:
        result = adj.add_call(user_func, (target_val, rhs), {}, {})
    except WarpCodegenError:
        result = adj.add_builtin_call(op_name, [target_val, rhs])
else:
    result = adj.add_builtin_call(op_name, [target_val, rhs])
```
Good catch — you're right that having add_builtin_call inside the try block diverges from the ast.Name handler and could silently swallow errors from the built-in dispatch.
Fixed in the latest push (c5a307a). The try/except now only wraps the user-overload lookup and add_call, with add_builtin_call called unconditionally outside the try block when no user overload succeeded. This matches the pattern used in the ast.Name branch.
This was addressed in c5a307a — add_builtin_call is now unconditionally outside the try/except block, matching the ast.Name handler pattern. The try/except only wraps the user-function lookup (resolve_external_reference + add_call), and add_builtin_call runs as a fallback when result is None.
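As a self-contained illustration of the agreed structure (all names here are placeholders, not Warp's actual API): only the user-overload lookup and call sit inside the try block, so errors raised by the builtin fallback propagate with full context.

```python
class CodegenError(Exception):
    pass

def dispatch(op_name, overloads, builtins, args):
    result = None
    try:
        user_func = overloads.get(op_name)
        if user_func is not None:
            result = user_func(*args)
    except CodegenError:
        result = None  # user overload failed; fall back to the builtin
    if result is None:
        result = builtins[op_name](*args)  # builtin errors are NOT swallowed
    return result

builtins = {"add": lambda a, b: a + b}
print(dispatch("add", {}, builtins, (2, 3)))  # no overload -> builtin: 5

def bad_overload(a, b):
    raise CodegenError("type mismatch")

print(dispatch("add", {"add": bad_overload}, builtins, (2, 3)))  # falls back: 5
```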
```python
    int16 is a non-atomic type, so this reaches do_augmented_assign().
    """
    i = wp.tid()
    data[i] += side_effect_inc_i16(counter, wp.int16(1))


@wp.kernel
def kernel_non_atomic_subscript_sub(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
    """Non-atomic array subscript with subtraction: arr_int16[i] -= side_effect_func()

    int16 is a non-atomic type, so this reaches do_augmented_assign().
```
No test coverage for the wp.adjoint[var] subscript path
All five tests cover attribute targets, non-atomic-type subscripts (int16), and unsupported-op array subscripts. However, the first do_augmented_assign() call-site in emit_AugAssign — the wp.adjoint[var] branch used inside custom gradient functions — has no corresponding test. This path is structurally identical to the others (it was the first make_new_assign_statement() replaced), so a regression there would go undetected.
Consider adding a kernel that exercises augmented assignment on a wp.adjoint target, e.g.:
```python
@wp.kernel
def kernel_adjoint_augassign(counter: wp.array(dtype=float), result: wp.array(dtype=float)):
    wp.adjoint[counter][0] += side_effect_add_f(result, 1.0, 2.0)
```

and a corresponding test validating `counter == 1`.
The wp.adjoint[var] path is intentionally out of scope for this PR.
Custom gradient functions that use wp.adjoint subscripts are exercised by the existing test_grad_customs.py suite, which covers the adjoint mechanism thoroughly. The double-evaluation bug being fixed here affects the RHS evaluation of augmented assignments — and the wp.adjoint[var] branch shares the exact same do_augmented_assign() function as every other fallback path, so it benefits from the same fix with no additional branching logic.
Writing a meaningful test for wp.adjoint[var] += side_effect_func() would require setting up a full custom gradient function with forward/backward passes, which is a substantial amount of test scaffolding for a codepath that is structurally identical to the attribute and non-atomic subscript cases already covered. The existing test_grad_customs tests provide adequate coverage for the adjoint mechanism itself, and the five tests in this PR validate that do_augmented_assign() correctly evaluates the RHS exactly once.
Fair point, but the wp.adjoint[var] += path is already exercised extensively in test_grad_customs.py and test_grad.py — there are ~15 test kernels that use it (custom adjoints for sigmoid, matrix ops, etc.). Adding a duplicate test here felt redundant. The purpose of this test file is specifically to verify the double-evaluation fix via the side-effect counter, and the adjoint path was already correct before this PR (it just needed the make_new_assign_statement -> do_augmented_assign swap for consistency).
```diff
@@ -3526,7 +3557,7 @@ def make_new_assign_statement():
     if is_array(target_type):
         # target_types int8, uint8, int16, uint16 are not suitable for atomic array accumulation
         if target_type.dtype in warp._src.types.non_atomic_types:
-            make_new_assign_statement()
+            do_augmented_assign()
             return
```
Redundant eval_subscript call before do_augmented_assign()
For the non-atomic subscript paths (and the composite non-atomic path below), adj.eval_subscript(lhs) is called at line 3552 solely to inspect target_type and decide which branch to take. Once do_augmented_assign() is chosen, that evaluation's IR output is discarded — do_augmented_assign() then calls adj.eval(lhs) internally, which re-emits the subscript-index IR all over again.
This means the subscript index expression (e.g. wp.tid()) is emitted into the generated code twice for every non-atomic or unsupported-operator subscript augmented assignment: once from the outer adj.eval_subscript(lhs) and once inside do_augmented_assign(). In practice this is harmless for pure index expressions, but if the index itself contained side-effecting calls they could be double-counted.
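The double-counting concern can be modeled with a toy Python sketch (all names illustrative; this mirrors the shape of the issue, not Warp's actual codegen):

```python
calls = {"n": 0}

def index_expr():
    calls["n"] += 1  # side effect: count how often the index is evaluated
    return 0

def inspect_target_type(arr, idx_fn):
    # Mirrors the outer eval_subscript: evaluates the index solely to
    # inspect the target, then discards the result.
    return type(arr[idx_fn()])

def do_augmented_assign(arr, idx_fn, rhs):
    i = idx_fn()  # the index expression is evaluated a second time
    arr[i] += rhs

arr = [10.0]
inspect_target_type(arr, index_expr)
do_augmented_assign(arr, index_expr, 2.5)
print(calls["n"])  # 2 -> the index expression ran twice
print(arr[0])      # 12.5
```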
A cleaner pattern would be to pass the already-resolved (target, target_type, indices) tuple into do_augmented_assign() so it can skip the redundant re-evaluation:
```python
def do_augmented_assign(target_val=None):
    if target_val is None:
        target_val = adj.eval(lhs)
    ...
```

…and call it as `do_augmented_assign(target_val=adj.add_builtin_call("load", [target]))` (or similar) when `target` is already available, thereby reusing the already-emitted load.
This is a fair observation, but the redundant eval_subscript call is a pre-existing structural pattern in emit_AugAssign — it was there before this PR (with make_new_assign_statement()) and the same double-evaluation of the subscript index occurs in the original code.
The eval_subscript at line 3552 is needed to determine target_type for the branch routing logic (non-atomic dtype check, composite vec/mat/quat check, array vs struct dispatch, etc.). Once the branch is chosen, do_augmented_assign() re-evaluates lhs because it needs the full load result, not just the type information.
For pure index expressions like wp.tid() or integer constants, the redundant IR emission is harmless — the compiler will optimise it away. You're right that a side-effecting index expression would be double-counted, but Warp kernel subscript indices are overwhelmingly pure (thread index, loop variable, constant). Refactoring the branch routing to pass through the already-resolved target would require reworking the entire emit_AugAssign subscript dispatch, which is beyond the scope of this bugfix PR. I'd rather keep the changes minimal and focused on the RHS double-evaluation issue being addressed here.
Valid observation — the eval_subscript at line 3560 is indeed re-evaluated inside do_augmented_assign() when we take the non-atomic fallback. However, the call is necessary for the common path (atomic ops on lines 3582+), which needs both target and indices to emit atomic_add/atomic_sub/etc. Moving the type inspection into do_augmented_assign() would require duplicating the entire branching logic or restructuring the control flow significantly, which felt like too much churn for this PR. In practice the redundant evaluation is harmless at the IR level — subscript index expressions in Warp kernels are pure, so the SSA builder just re-reads the same value.
```python
# ---------------------------------------------------------------------------
# Test kernels
# ---------------------------------------------------------------------------


@wp.kernel
def kernel_attr_augassign(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
    """Attribute target: s.value += side_effect_func()"""
    s = SimpleState()
    s.value = 1.0
    s.value += side_effect_add_f(counter, 2.0, 3.0)
    result[0] = s.value


@wp.kernel
def kernel_non_atomic_subscript_add(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
    """Non-atomic array subscript: arr_int16[i] += side_effect_func()

    int16 is a non-atomic type, so this reaches do_augmented_assign().
    """
    i = wp.tid()
    data[i] += side_effect_inc_i16(counter, wp.int16(1))


@wp.kernel
def kernel_non_atomic_subscript_sub(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
    """Non-atomic array subscript with subtraction: arr_int16[i] -= side_effect_func()

    int16 is a non-atomic type, so this reaches do_augmented_assign().
    """
    i = wp.tid()
    data[i] -= side_effect_sub_i16(counter, wp.int16(3))


@wp.kernel
def kernel_unsupported_op_mul(counter: wp.array(dtype=int), data: wp.array(dtype=float)):
    """Unsupported operator on array subscript: arr[i] *= side_effect_func()

    Multiplication on a float array is not handled by atomic ops,
    so this falls through to do_augmented_assign().
    """
    i = wp.tid()
    data[i] *= side_effect_mul_f(counter, 2.0)


@wp.kernel
def kernel_attr_augassign_sub(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
    """Attribute target with subtraction: s.value -= side_effect_func()"""
    s = SimpleState()
    s.value = 10.0
    s.value -= side_effect_add_f(counter, 1.0, 2.0)
    result[0] = s.value
```
Composite non-atomic subscript path has no test
The PR description lists "Composite non-atomic subscript targets (vecs/mats/quats of non-atomic scalars)" as one of the fixed call-sites (the branch at codegen.py lines 3566–3576). However, none of the five test kernels exercises this code path — they only cover:

- `ast.Attribute` targets (`s.value +=`)
- Simple `wp.int16` array element subscripts (`data[i] +=`)
- Unsupported-op float array subscripts (`data[i] *=`)
An array whose element type is e.g. wp.vec2h (a 2-element vector of float16) would be a composite type whose scalar component is a non-atomic type, hitting the composite branch. Without a test for this path, a regression there would go undetected.
A minimal addition to cover it:
```python
@wp.kernel
def kernel_non_atomic_vec_subscript(counter: wp.array(dtype=int), data: wp.array(dtype=wp.vec2h)):
    i = wp.tid()
    inc = wp.vec2h(side_effect_add_f(counter, 1.0, 0.0), 0.0)
    data[i] += inc
```

and a corresponding test asserting `counter.numpy()[0] == 1`.
This was already addressed in the reply to a similar comment above — the composite non-atomic branch shares the same do_augmented_assign() codepath as the simple non-atomic case. The branching logic only determines whether to dispatch to do_augmented_assign(), not how the assignment is lowered, so the existing int16 subscript tests provide meaningful coverage of the underlying mechanism.
A dedicated wp.vec2h test would be a nice addition for completeness but isn't strictly necessary for this PR since the code under test is identical. Happy to add one as a follow-up if the maintainers would prefer it.
The codegen for augmented assignments (+=, -=, *=, etc.) on attribute and non-atomic subscript targets used make_new_assign_statement() which constructed a synthetic BinOp AST node containing the original RHS. This caused the RHS expression to be evaluated twice: once explicitly at the top of emit_AugAssign, and again when the synthetic BinOp was lowered via emit_Assign. Replace make_new_assign_statement() with do_augmented_assign() which reuses the already-evaluated RHS value to compute the binary operation, then passes the result to emit_Assign via a _wp_precomputed_rhs attribute on the synthetic Assign node, skipping re-evaluation. Fixes NVIDIA#1233 Signed-off-by: Maxime Kawawa-Beaudan <maxkawab@gmail.com> Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
Move add_builtin_call outside the try block so that WarpCodegenError from built-in operator dispatch is not silently caught and retried. This matches the structure used in the ast.Name branch of emit_AugAssign where add_builtin_call is called unconditionally after the user-overload lookup fails. Signed-off-by: Maxime Kawawa-Beaudan <maxkawab@gmail.com>
Replace the None sentinel in emit_Assign's pre-computed RHS check with a module-level _MISSING object. This avoids a subtle fragility where a legitimate None return from add_builtin_call or add_call would be silently treated as "attribute absent". Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
569ccf0 to be3859b
Closing — the double-evaluation fix was independently merged to main via a different approach (apply_op + augassign_subscript helpers). The test coverage from this PR is still valuable; will open a smaller PR for just the test file if needed.
Summary
Fixes #1233
The codegen for augmented assignments (`+=`, `-=`, `*=`, etc.) on attribute and non-atomic subscript targets used `make_new_assign_statement()`, which constructed a synthetic `BinOp` AST node containing the original RHS. This caused the RHS expression to be evaluated twice: once explicitly at the top of `emit_AugAssign`, and again when the synthetic `BinOp` was lowered via `emit_Assign`.

This PR replaces `make_new_assign_statement()` with `do_augmented_assign()`, which reuses the already-evaluated RHS value to compute the binary operation result, then passes it to `emit_Assign` via a `_wp_precomputed_rhs` attribute on the synthetic `Assign` node, skipping re-evaluation entirely.

Affected code paths

All eight call-sites in `emit_AugAssign` that previously called `make_new_assign_statement()` are updated:

- `wp.adjoint[var]` subscript targets
- Array subscripts with non-atomic dtypes (`int8`, `uint8`, `int16`, `uint16`)
- Array subscripts with unsupported operators (`*=`, `/=`)
- Attribute targets (`s.value += expr`)
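The double-evaluation mechanism can be modeled in plain Python (illustrative names; this mirrors the shape of the bug and the fix, not Warp's actual IR):

```python
evals = {"n": 0}

def eval_rhs():
    evals["n"] += 1  # count how often the RHS expression is evaluated
    return 5

def lower_binop(lhs_val, rhs_thunk):
    # Lowering a synthetic BinOp that embeds the original RHS expression
    # evaluates that expression again.
    return lhs_val + rhs_thunk()

def emit_augassign_old(lhs_val):
    _ = eval_rhs()  # evaluated once at the top of emit_AugAssign...
    return lower_binop(lhs_val, eval_rhs)  # ...and again during lowering

def emit_augassign_fixed(lhs_val):
    rhs = eval_rhs()  # evaluated exactly once
    return lower_binop(lhs_val, lambda: rhs)  # precomputed value is reused

emit_augassign_old(1)
print(evals["n"])  # 2 evaluations with the old path
evals["n"] = 0
emit_augassign_fixed(1)
print(evals["n"])  # 1 evaluation with the fix
```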
- New `test_augmented_assign.py` with 5 test cases covering attribute `+=`/`-=`, non-atomic subscript `+=`, and array `*=` and `-=`
- Each test uses `wp.atomic_add` on a counter array as a side-effect detector — the counter must be exactly 1 after a single augmented assignment
- Full `test_codegen.py::TestCodeGen` suite passes (94/94) with zero regressions
- `test_arithmetic.py` and `test_array.py` pass (257/257)
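The side-effect-counter idea in the test plan can be modeled without Warp (pure Python, illustrative names): if the RHS were evaluated twice, the counter would read 2 instead of 1.

```python
counter = [0]

def side_effect_mul(v):
    counter[0] += 1  # stands in for wp.atomic_add(counter, 0, 1)
    return v

data = [3.0]
i = 0
data[i] *= side_effect_mul(2.0)  # the augmented assignment under test

print(counter[0])  # 1 -> RHS evaluated exactly once
print(data[0])     # 6.0
```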
Bug Fixes
Tests