
Fix augmented assignment double-evaluation of RHS expressions#1282

Closed
cluster2600 wants to merge 3 commits into NVIDIA:main from cluster2600:fix/augassign-double-eval

Conversation

@cluster2600

@cluster2600 cluster2600 commented Mar 10, 2026

Summary

Fixes #1233

The codegen for augmented assignments (+=, -=, *=, etc.) on attribute and non-atomic subscript targets used make_new_assign_statement() which constructed a synthetic BinOp AST node containing the original RHS. This caused the RHS expression to be evaluated twice: once explicitly at the top of emit_AugAssign, and again when the synthetic BinOp was lowered via emit_Assign.

This PR replaces make_new_assign_statement() with do_augmented_assign(), which reuses the already-evaluated RHS value to compute the binary-operation result, then passes it to emit_Assign via a _wp_precomputed_rhs attribute on the synthetic Assign node, skipping re-evaluation entirely.
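The mechanism can be sketched in plain Python. Everything below is a hypothetical, simplified stand-in for the codegen flow described above (the names mirror the PR, but this is not Warp's actual implementation):

```python
# Hypothetical model of the _wp_precomputed_rhs mechanism; not Warp's real code.
import ast

_MISSING = object()  # sentinel so falsy precomputed values (0, 0.0) still count
eval_count = 0

def eval_expr(node, env):
    """Stand-in for adj.eval(): evaluates an expression node, counting calls."""
    global eval_count
    eval_count += 1
    return eval(compile(ast.Expression(body=node), "<expr>", "eval"), env)

def emit_assign(node, env):
    """Stand-in for emit_Assign(): reuses a precomputed RHS when one is attached."""
    rhs = getattr(node, "_wp_precomputed_rhs", _MISSING)
    if rhs is _MISSING:
        rhs = eval_expr(node.value, env)  # the old path would re-evaluate here
    env[node.targets[0].id] = rhs

def emit_aug_assign(node, env):
    """Stand-in for emit_AugAssign() using the do_augmented_assign() pattern."""
    rhs = eval_expr(node.value, env)           # RHS evaluated exactly once
    result = env[node.target.id] + rhs         # lhs OP rhs ('+' only, for brevity)
    new_node = ast.Assign(targets=[node.target], value=node.value)
    new_node._wp_precomputed_rhs = result      # hand the result to emit_assign
    emit_assign(new_node, env)

env = {"x": 1, "f": lambda: 2}
node = ast.parse("x += f()").body[0]
emit_aug_assign(node, env)
assert env["x"] == 3 and eval_count == 1  # the RHS side effect ran once
```

With the old synthetic-BinOp approach, emit_assign would have taken the `_MISSING` branch and called eval_expr a second time, doubling the RHS side effect.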

Affected code paths

All eight call-sites in emit_AugAssign that previously called make_new_assign_statement() are updated:

  • wp.adjoint[var] subscript targets
  • Non-atomic array subscript targets (int8, uint8, int16, uint16)
  • Composite non-atomic subscript targets (vecs/mats/quats of non-atomic scalars)
  • Unsupported operators on array subscripts (e.g. *=, /=)
  • Unsupported operators on FabricArray / indexedarray subscripts
  • Unsupported operators on tile subscripts
  • Attribute targets (e.g. s.value += expr)
  • Generic fallback path

Test plan

  • Added test_augmented_assign.py with 5 test cases covering attribute +=/-=, non-atomic subscript +=, array *= and -=
  • Each test uses wp.atomic_add on a counter array as a side-effect detector — counter must be exactly 1 after a single augmented assignment
  • All tests pass on both CPU and CUDA (RTX 5090, sm_120)
  • Existing test_codegen.py::TestCodeGen suite passes (94/94) with zero regressions
  • test_arithmetic.py and test_array.py pass (257/257)
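The side-effect-detector idea in the test plan has a plain-Python analog (the real tests use wp.atomic_add on a device array inside a kernel; the snippet below is illustrative only, not Warp code):

```python
# Plain-Python analog of the side-effect-detector pattern from the test plan.
counter = [0]

def side_effect_add(a, b):
    counter[0] += 1  # detector: must be exactly 1 after one augmented assignment
    return a + b

values = [1.0]
values[0] += side_effect_add(2.0, 3.0)

assert values[0] == 6.0
assert counter[0] == 1  # RHS evaluated exactly once
```

If the RHS were evaluated twice, the counter would read 2, which is exactly the regression the Warp tests guard against.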

Summary by CodeRabbit

  • Bug Fixes

    • Augmented-assignment now guarantees RHS expressions are evaluated exactly once for attribute, subscript, in-place and related targets, avoiding duplicate side effects while preserving autograd/read-state.
  • Tests

    • Added comprehensive tests that validate single-evaluation semantics for augmented assignments across multiple target kinds and side-effecting expressions.

@copy-pr-bot

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Propagate a precomputed RHS (_wp_precomputed_rhs) and centralize augmented-assignment logic in do_augmented_assign to ensure RHS is evaluated exactly once across name/attribute/subscript/tile/vector/matrix/quat and generic targets; add unit tests validating single-evaluation semantics and register them in test suites.

Changes

  • Codegen augmented-assignment fix (warp/_src/codegen.py): Add _wp_precomputed_rhs handling and a new do_augmented_assign() to compute augmented-assignment results from an already-evaluated RHS; replace prior make_new_assign_statement() call sites so LHS evaluation no longer causes double-evaluation of RHS across name/attribute/subscript/tile/vector/matrix/quat and generic targets, preserving autograd/read-state semantics.
  • Augmented-assignment unit tests (warp/tests/test_augmented_assign.py): New public test module adding SimpleState, side-effect helper funcs, kernels for attribute/non-atomic/unsupported-op subscript paths, and tests asserting the RHS is evaluated exactly once for multiple augmented-assignment scenarios; registers tests across devices.
  • Test suite registration (warp/tests/unittest_suites.py): Import and include TestAugmentedAssign in default_suite and kit_suite so the new tests run with the existing unittest suites.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant AST as AST (AugAssign node)
    participant Adj as Adj/Evaluator
    participant DoAug as do_augmented_assign
    participant Assign as assign_to_target / storage

    AST->>Adj: request evaluate RHS (node.value) -> produce _wp_precomputed_rhs
    Adj->>DoAug: call do_augmented_assign(lhs, _wp_precomputed_rhs, op)
    DoAug->>Adj: evaluate/resolve LHS (preserve autograd/read state)
    DoAug->>Adj: compute result = lhs OP _wp_precomputed_rhs (builtin or overload)
    DoAug->>Assign: assign result to target (uses synthetic new_assign._wp_precomputed_rhs)
    Assign->>Adj: complete store (write-back)
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 65.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title precisely summarizes the main change: fixing double-evaluation of RHS expressions in augmented assignments, which is the core issue resolved by this PR.
  • Linked Issues check: ✅ Passed. The PR fully addresses issue #1233 by replacing make_new_assign_statement() with do_augmented_assign(), ensuring the RHS is evaluated only once across all target types (attribute, subscript, adjoint) via the _wp_precomputed_rhs mechanism.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to fixing double-evaluation: codegen.py updates implement the fix, test_augmented_assign.py validates the fix, and unittest_suites.py registers the tests appropriately.


Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR fixes a long-standing double-evaluation bug (#1233) in augmented-assignment codegen (+=, -=, *=, etc.) for attribute and non-ast.Name subscript targets. The old make_new_assign_statement() helper built a synthetic BinOp containing the raw RHS AST node and passed it to emit_Assign, which caused adj.eval(node.value) to run a second time — doubling any side effects in the RHS expression. The fix introduces do_augmented_assign(), a closure that captures the already-evaluated rhs variable, computes the binary-op result directly, then forwards it to emit_Assign via the _wp_precomputed_rhs attribute on the synthetic Assign node. A module-level _MISSING sentinel is used so the check in emit_Assign is an identity comparison rather than a truthiness check, correctly handling any future case where the precomputed value might be falsy.

Key changes:

  • _MISSING = object() sentinel added at module level; emit_Assign checks is not _MISSING before using _wp_precomputed_rhs.
  • do_augmented_assign() replaces all eight make_new_assign_statement() call-sites in emit_AugAssign, covering: wp.adjoint[var], non-atomic array subscripts, composite non-atomic subscripts, unsupported-op array/FabricArray/tile subscripts, attribute targets, and the generic fallback.
  • Autograd read-state flags are saved/restored around the LHS read inside do_augmented_assign(), matching the pattern used in emit_BinOp.
  • The try/except that guards user-overload lookup correctly wraps only resolve_external_reference + add_call; add_builtin_call is unconditionally outside the block, consistent with the ast.Name handler pattern.
  • Five new tests validate single-evaluation semantics using wp.atomic_add side-effect counters, covering attribute +=/-=, non-atomic int16 subscript +=/-=, and unsupported-op float array *=.
  • TestAugmentedAssign is correctly registered in both default_suite and debug_suite.

One minor nit: the SPDX copyright year in the new test file reads 2022 instead of the current year.
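The value of the _MISSING identity check called out above, as opposed to a truthiness test with a None default, can be shown in a short sketch (hypothetical; not Warp's actual code):

```python
# Why an identity check against a dedicated sentinel matters: a falsy but valid
# precomputed value (like 0.0) must not be mistaken for "absent".
import types

_MISSING = object()

node = types.SimpleNamespace()
node._wp_precomputed_rhs = 0.0  # valid but falsy precomputed result

# Buggy pattern: a truthiness check discards the 0.0
buggy_hit = bool(getattr(node, "_wp_precomputed_rhs", None))

# Correct pattern: identity comparison against the sentinel
value = getattr(node, "_wp_precomputed_rhs", _MISSING)
correct_hit = value is not _MISSING

assert buggy_hit is False   # truthiness misses the precomputed 0.0
assert correct_hit is True  # the sentinel check finds it
```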

Confidence Score: 4/5

  • PR is safe to merge; the core fix is correct and all previously raised blocking concerns have been resolved.
  • The double-evaluation fix is mechanically sound: RHS is evaluated once before the closure is defined, the _MISSING sentinel prevents false-positive None matches, and add_builtin_call correctly sits outside the try/except. Tests pass on both CPU and CUDA. Score held at 4 rather than 5 because composite non-atomic subscript targets (e.g. wp.vec2h arrays) and the wp.adjoint[var] path remain untested in this PR — acknowledged by the author as intentionally deferred follow-ups, but they are structurally distinct code paths that a regression could silently affect.
  • No files require special attention; all changes are contained in warp/_src/codegen.py and the new warp/tests/test_augmented_assign.py.

Important Files Changed

  • warp/_src/codegen.py: Introduces the do_augmented_assign() closure and the _wp_precomputed_rhs mechanism to prevent RHS double-evaluation. The _MISSING sentinel is correctly defined at module level. The try/except now correctly wraps only the user-overload lookup, with add_builtin_call outside. Autograd read-state save/restore is appropriately scoped around the LHS read.
  • warp/tests/test_augmented_assign.py: New test file covering attribute +=/-=, non-atomic int16 subscript +=/-=, and unsupported-op float array *= via atomic side-effect counters. Tests are logically sound with correct expected values. The copyright year (2022) in the header is stale for a file created in this PR.
  • warp/tests/unittest_suites.py: Registers TestAugmentedAssign in both default_suite and debug_suite; alphabetical insertion is correct.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["emit_AugAssign(node)"] --> B{lhs type?}
    B -- "ast.Name" --> C["eval(rhs) once\nadd_builtin_call / add_call\nupdate symbols\nreturn"]
    B -- "ast.Subscript\nast.Attribute\nfallback" --> D["rhs = adj.eval(node.value)\n✅ evaluated ONCE"]
    D --> E["define do_augmented_assign() closure\ncaptures rhs"]
    E --> F{subscript\ntarget?}
    F -- "wp.adjoint[var]\nnon-atomic dtype\ncomposite non-atomic\nunsupported op\nFabric/tile fallback" --> G["do_augmented_assign()"]
    F -- "atomic Add/Sub/BitOp\non array/vec/tile" --> H["atomic_add / atomic_sub\n/ inplace builtins\nuse pre-evaluated rhs"]
    F -- "ast.Attribute\ngeneric fallback" --> G
    G --> I["target_val = eval(lhs)\n(saves/restores autograd read flags)"]
    I --> J["result = add_builtin_call(op, target_val, rhs)\n(or user overload via add_call)"]
    J --> K["new_node = ast.Assign(targets=[lhs])"]
    K --> L["new_node._wp_precomputed_rhs = result"]
    L --> M["adj.eval(new_node) → emit_Assign"]
    M --> N["_precomputed = getattr(node, '_wp_precomputed_rhs', _MISSING)\nif _precomputed is not _MISSING: rhs = _precomputed\nskips re-evaluation ✅"]
    N --> O["Store result to lhs target"]
```

Reviews (8): Last reviewed commit: "Use dedicated sentinel for _wp_precomput..." | Re-trigger Greptile


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@warp/_src/codegen.py`:
- Around line 3512-3543: do_augmented_assign currently evaluates the LHS twice
(adj.eval(lhs) then adj.eval(new_node)), causing double evaluation for
side-effecting targets; record the resolved lvalue/address from the first
evaluation and pass it into the assignment path instead of re-evaluating lhs:
after computing target_val and result, attach the resolved lvalue to the
synthetic Assign (e.g. set new_node._wp_precomputed_lhs = target_val in addition
to new_node._wp_precomputed_rhs) and update the assignment emitter (emit_Assign
/ adj.eval handling of ast.Assign) to detect _wp_precomputed_lhs and use that
stored address/value to perform the store rather than calling adj.eval(lhs)
again; alternatively add a helper on adj (e.g.
adj.assign_to_precomputed_lhs(target_node, target_val, rhs)) and call it from
do_augmented_assign so the original lhs evaluation is reused.

In `@warp/tests/test_augmented_assign.py`:
- Around line 61-64: Add a unit test that actually invokes side_effect_add_vec3
so the composite non-atomic fallback for vec/mat/quat is exercised: write a
small kernel that performs an augmented assignment on a vec3 (e.g., arr[i] +=
side_effect_add_vec3(counter, v)) or otherwise uses side_effect_add_vec3 in an
in-place vector update, call that kernel from the test with a counter array and
assert the counter was incremented and the vec3 updated; this will hit the
non-atomic composite fallback branch in the codegen handling for vec/mat/quat
(the branch referencing non-atomic scalar-backed composites) and provide the
missing regression coverage.
- Around line 169-180: Import the TestAugmentedAssign class into the module that
defines default_suite() and add it to the test_classes list used by
default_suite, placing "TestAugmentedAssign" in the correct alphabetical
position among other test class names; update the import statement (or add one)
for TestAugmentedAssign and insert "TestAugmentedAssign" into the test_classes
list within the default_suite() function so the new tests are included in the
default suite.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: e5cfe5ae-870e-4a8b-9b65-a5758a06b4dd

📥 Commits

Reviewing files that changed from the base of the PR and between 3af8dfa and 2f32f32.

📒 Files selected for processing (2)
  • warp/_src/codegen.py
  • warp/tests/test_augmented_assign.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@warp/tests/test_augmented_assign.py`:
- Around line 125-133: The test references a missing kernel name:
test_non_atomic_subscript_single_eval calls kernel_non_atomic_subscript but the
defined kernel is kernel_unsupported_op_sub; update the test to call the correct
kernel name (kernel_unsupported_op_sub) or rename/define the kernel to
kernel_non_atomic_subscript so names align; ensure the symbol referenced in
wp.launch matches the actual kernel function name (kernel_non_atomic_subscript
or kernel_unsupported_op_sub) and re-run tests.
- Around line 71-77: The kernel kernel_attr_augassign never writes the computed
s.value back to the output array, so result remains zero; after updating s.value
with side_effect_add_f(counter, 2.0, 3.0) write the final value into the
provided result array (e.g., set result[0] = s.value) so the test expecting 6.0
will observe the computed value.
- Around line 77-84: The kernel currently named kernel_unsupported_op_sub is
duplicated and contains two augmented assignments (a -= and a +=) which
increments the counter twice and is shadowed by a later definition; rename this
kernel to a unique, descriptive name (e.g., kernel_unsupported_op_add_i16),
remove the subtraction line so it performs only the intended non-atomic int16
augmented add (keep only the data[i] += side_effect_inc_i16(...) call), and
update any test invocation to call the new kernel name so the test exercises the
single += path and the counter increments once.
- Around line 94-99: The kernel name kernel_unsupported_op_sub is redefined and
shadows the earlier int16 version; rename this float-version kernel to a
distinct symbol (e.g., kernel_unsupported_op_sub_float) and update any test
invocation (test_non_atomic_subscript_sub_single_eval) to call the new name so
the float-subtraction kernel is used without shadowing the int16 kernel; ensure
all references to kernel_unsupported_op_sub that should hit the int16 variant
remain unchanged or are renamed consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 21100843-0e8f-4551-9326-d924a0cdeb4b

📥 Commits

Reviewing files that changed from the base of the PR and between 2f32f32 and ec235d4.

📒 Files selected for processing (1)
  • warp/tests/test_augmented_assign.py

Comment on lines +72 to +76

```python
def kernel_attr_augassign(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
    """Attribute target: s.value += side_effect_func()"""
    s = SimpleState()
    s.value = 1.0
    s.value += side_effect_add_f(counter, 2.0, 3.0)
```

Missing write to result array in kernel

kernel_attr_augassign accepts a result array parameter but never writes s.value into it. The test test_attr_augassign_single_eval (line 122) then asserts result.numpy()[0] == 6.0, but since result is initialised to zeros and the kernel never stores anything there, this assertion will always fail.

Compare with kernel_attr_augassign_sub (line 101–107) which correctly ends with result[0] = s.value. The same store is needed here:

Suggested change:

```diff
+@wp.kernel
 def kernel_attr_augassign(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
     """Attribute target: s.value += side_effect_func()"""
     s = SimpleState()
     s.value = 1.0
     s.value += side_effect_add_f(counter, 2.0, 3.0)
+    result[0] = s.value
```

Author

This was fixed before the initial push — the kernel does write result[0] = s.value at line 78. The review appears to have been analysing an intermediate diff that was subsequently corrected.

Comment on lines +77 to +98

```python
@wp.kernel
def kernel_unsupported_op_sub(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
    """Non-atomic subscript (int16) with subtraction: arr[i] -= side_effect_func()
    Reaches do_augmented_assign() because int16 is a non-atomic type."""
    i = wp.tid()
    data[i] -= side_effect_inc_i16(counter, wp.int16(3))
    i = wp.tid()
    data[i] += side_effect_inc_i16(counter, wp.int16(1))


@wp.kernel
def kernel_unsupported_op_mul(counter: wp.array(dtype=int), data: wp.array(dtype=float)):
    """Unsupported operator on array subscript: arr[i] *= side_effect_func()"""
    i = wp.tid()
    data[i] *= side_effect_mul_f(counter, 2.0)


@wp.kernel
def kernel_unsupported_op_sub(counter: wp.array(dtype=int), data: wp.array(dtype=float)):
    """Unsupported operator sub for non-atomic: arr[i] -= side_effect_func()"""
    i = wp.tid()
    data[i] -= side_effect_mul_f(counter, 3.0)
```

Duplicate kernel name silently shadows the first definition

Two @wp.kernel functions are defined with the identical name kernel_unsupported_op_sub (lines 77–84 and 94–98). In Python, the second definition overwrites the first at module load time — the dtype=wp.int16 kernel is permanently inaccessible. In addition, the int16 kernel makes two side-effect calls (-= then +=), so its counter would reach 2, contradicting the assertEqual(counter.numpy()[0], 1) assertion in test_non_atomic_subscript_sub_single_eval.

The first definition appears to be a partially-written draft that was never removed. It should be renamed (e.g., kernel_non_atomic_subscript_sub) to fix the shadowing, its body should be reduced to a single side-effect call, and test_non_atomic_subscript_sub_single_eval should be updated to launch it with dtype=wp.int16.
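The shadowing behaviour described above is plain Python semantics and is easy to demonstrate in isolation (illustrative names only, no Warp involved):

```python
# A second def with the same name simply rebinds it at module load time;
# the first definition becomes unreachable, with no warning or error.
def kernel_unsupported_op_sub():  # stand-in for the int16 kernel
    return "int16 variant"

def kernel_unsupported_op_sub():  # silently replaces the first definition
    return "float variant"

assert kernel_unsupported_op_sub() == "float variant"  # int16 variant is gone
```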

Author

Resolved — the duplicate kernel names were cleaned up before the initial push. Each kernel now has a unique name and the shadowing issue no longer exists in the current revision.

```python
counter = wp.zeros(1, dtype=int, device=device)
data = wp.array([wp.int16(2)], dtype=wp.int16, device=device)

wp.launch(kernel_non_atomic_subscript, dim=1, inputs=[counter, data], device=device)
```

kernel_non_atomic_subscript is never defined — NameError at runtime

test_non_atomic_subscript_single_eval launches kernel_non_atomic_subscript, but no kernel with that name exists anywhere in the file. This will raise a NameError every time the test is executed, making the test permanently broken.

The intended kernel is likely the int16 version of kernel_unsupported_op_sub that was defined at lines 77–84 (before being shadowed — see the adjacent comment). The fix is to rename that kernel to kernel_non_atomic_subscript (or a similarly descriptive name) and reference it here:

Suggested change:

```python
wp.launch(kernel_non_atomic_subscript, dim=1, inputs=[counter, data], device=device)
```

(after renaming the kernel definition accordingly)

Author

Resolved — kernel_non_atomic_subscript_add and kernel_non_atomic_subscript_sub are now properly defined in the current revision. The undefined reference was an artefact of the earlier draft with duplicate kernel names that has been corrected.

@cluster2600 cluster2600 force-pushed the fix/augassign-double-eval branch from ec235d4 to 1ae4d4a on March 10, 2026 at 21:27
@cluster2600
Author

Thanks for the thorough review! I've addressed the feedback in the updated commit:

  • Fixed DCO sign-off: squashed the web-UI commit back into the original (sorry about that)
  • Renamed kernel_unsupported_op_sub: the previous float -= x test was hitting the atomic_sub path rather than do_augmented_assign(). Replaced it with kernel_non_atomic_subscript_sub, which uses int16 -= x; int16 is a non-atomic type, so it properly exercises the new code path
  • Removed unused side_effect_add_vec3: replaced with side_effect_sub_i16 which is actually used
  • Registered in unittest_suites.py: added TestAugmentedAssign to both the default and kit test suites

All 10 tests pass on CPU and CUDA (RTX 5090, sm_120), and 94/94 existing codegen tests pass with zero regressions.


@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
warp/tests/unittest_suites.py (1)

134-135: ⚠️ Potential issue | 🟠 Major

TestAugmentedAssign is imported but never added to either suite.

default_suite() and kit_suite() still omit TestAugmentedAssign from their test_classes lists, so these new regressions will not run through the suite builders this file maintains.

🧩 Proposed fix
```diff
     TestAtomic,
     TestAtomicBitwise,
     TestAtomicCAS,
+    TestAugmentedAssign,
     TestBlockDimDispatch,
     TestBool,
     TestBuiltinsResolution,
@@
     TestArrayReduce,
     TestBool,
+    TestAugmentedAssign,
     TestBuiltinsResolution,
     TestBvh,
     TestCodeGen,
```

As per coding guidelines, "New test modules should be added to default_suite in warp/tests/unittest_suites.py."

Also applies to: 375-375

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/tests/unittest_suites.py` around lines 134 - 135, The
TestAugmentedAssign class is imported but not included in the test registry; add
TestAugmentedAssign to the test_classes list in both default_suite() and
kit_suite() functions in warp/tests/unittest_suites.py so the suite builders
will run those tests; locate the existing test_classes lists inside
default_suite and kit_suite and append or insert TestAugmentedAssign alongside
the other Test* entries (e.g., near TestBlockDimDispatch) to ensure it is
executed by both suites.
warp/tests/test_augmented_assign.py (1)

18-21: ⚠️ Potential issue | 🟡 Minor

Composite non-atomic coverage is still missing.

The docstring says this module covers the composite non-atomic subscript path, but nothing below actually hits the vec/mat/quat branch added in warp/_src/codegen.py for non-atomic scalar-backed composites. That changed path can still regress silently without a dedicated kernel/test here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/tests/test_augmented_assign.py` around lines 18 - 21, The docstring
promises coverage of the composite non-atomic subscript path (the vec/mat/quat
branch in warp/_src/codegen.py) but no test actually exercises scalar-backed
composite types; add a new case to warp/tests/test_augmented_assign.py that
creates a scalar-backed composite (e.g., vec3 or quat), performs an augmented
assignment to a non-atomic composite subscript (using e.g., +=) inside a kernel
so codegen will hit the vec/mat/quat branch, and assert the RHS is evaluated
exactly once; ensure the test uses the same kernel invocation style and RHS-side
side-effect counter used by other cases in this file so it integrates with the
existing test harness.
warp/_src/codegen.py (1)

3510-3543: ⚠️ Potential issue | 🔴 Critical

do_augmented_assign() still re-evaluates side-effecting targets.

Line 3525 evaluates lhs, then Line 3543 hands the raw target AST back to emit_Assign(), which resolves the same target again. That means cases like arr[f()] += rhs or obj().field += rhs still run the target expression twice here. Also, rhs is evaluated first at Line 3510, so the side-effect order no longer matches Python’s augmented-assignment rules for non-name targets. Please store to a precomputed lvalue/address instead of routing through a synthetic Assign.

In Python augmented assignment, is the target expression evaluated exactly once and before the right-hand side?
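Python's reference semantics can be checked directly: in `a[idx()] += rhs()`, CPython evaluates the target's subexpressions exactly once, before the RHS, and reuses them for the store (illustrative snippet, not Warp code):

```python
# Record the order in which the index and RHS expressions run during an
# augmented assignment on a subscript target.
order = []

def idx():
    order.append("lhs-index")
    return 0

def rhs():
    order.append("rhs")
    return 1

a = [10]
a[idx()] += rhs()

assert order == ["lhs-index", "rhs"]  # index evaluated once, before the RHS
assert a == [11]
```

This is the ordering the review asks the codegen to match for non-name targets.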
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/codegen.py` around lines 3510 - 3543, do_augmented_assign currently
re-evaluates the target by creating a synthetic Assign with the raw lhs AST
(new_node) which causes side-effecting targets to run twice and misorders
evaluation; change it to evaluate and capture the target's precomputed
lvalue/address once and pass that to the binary-op/assignment call instead of
handing the raw AST back to emit_Assign. Concretely: in do_augmented_assign (and
related code paths that build new_node) evaluate lhs into a saved
address/reference (e.g., call or extend adj.eval to return a precomputed
lvalue/address or store it on the adj object), preserve/restore arg read flags
as currently done, then evaluate rhs, perform the op via
adj.add_call/adj.add_builtin_call using the saved address and rhs, and finally
perform a direct store into that saved lvalue (or attach a new sentinel like
new_node._wp_precomputed_lhs and ensure emit_Assign/adj.eval consumes
_wp_precomputed_lhs instead of re-evaluating lhs). Ensure the lhs is evaluated
exactly once and before rhs to match Python augmented-assignment semantics
(reference symbols: do_augmented_assign, adj.eval(lhs), rhs,
new_node._wp_precomputed_rhs, builtin_operators, adj.add_call,
adj.add_builtin_call, emit_Assign).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@warp/_src/codegen.py`:
- Around line 3510-3543: do_augmented_assign currently re-evaluates the target
by creating a synthetic Assign with the raw lhs AST (new_node) which causes
side-effecting targets to run twice and misorders evaluation; change it to
evaluate and capture the target's precomputed lvalue/address once and pass that
to the binary-op/assignment call instead of handing the raw AST back to
emit_Assign. Concretely: in do_augmented_assign (and related code paths that
build new_node) evaluate lhs into a saved address/reference (e.g., call or
extend adj.eval to return a precomputed lvalue/address or store it on the adj
object), preserve/restore arg read flags as currently done, then evaluate rhs,
perform the op via adj.add_call/adj.add_builtin_call using the saved address and
rhs, and finally perform a direct store into that saved lvalue (or attach a new
sentinel like new_node._wp_precomputed_lhs and ensure emit_Assign/adj.eval
consumes _wp_precomputed_lhs instead of re-evaluating lhs). Ensure the lhs is
evaluated exactly once and before rhs to match Python augmented-assignment
semantics (reference symbols: do_augmented_assign, adj.eval(lhs), rhs,
new_node._wp_precomputed_rhs, builtin_operators, adj.add_call,
adj.add_builtin_call, emit_Assign).

In `@warp/tests/test_augmented_assign.py`:
- Around line 18-21: The docstring promises coverage of the composite non-atomic
subscript path (the vec/mat/quat branch in warp/_src/codegen.py) but no test
actually exercises scalar-backed composite types; add a new case to
warp/tests/test_augmented_assign.py that creates a scalar-backed composite
(e.g., vec3 or quat), performs an augmented assignment to a non-atomic composite
subscript (using e.g., +=) inside a kernel so codegen will hit the vec/mat/quat
branch, and assert the RHS is evaluated exactly once; ensure the test uses the
same kernel invocation style and RHS-side side-effect counter used by other
cases in this file so it integrates with the existing test harness.

In `@warp/tests/unittest_suites.py`:
- Around line 134-135: The TestAugmentedAssign class is imported but not
included in the test registry; add TestAugmentedAssign to the test_classes list
in both default_suite() and kit_suite() functions in
warp/tests/unittest_suites.py so the suite builders will run those tests; locate
the existing test_classes lists inside default_suite and kit_suite and append or
insert TestAugmentedAssign alongside the other Test* entries (e.g., near
TestBlockDimDispatch) to ensure it is executed by both suites.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 03ad9d66-b699-41b4-8952-902a8251e8e2

📥 Commits

Reviewing files that changed from the base of the PR and between ec235d4 and 1ae4d4a.

📒 Files selected for processing (3)
  • warp/_src/codegen.py
  • warp/tests/test_augmented_assign.py
  • warp/tests/unittest_suites.py

Comment on lines +3532 to +3539
try:
user_func = adj.resolve_external_reference(op_name)
if isinstance(user_func, warp._src.context.Function):
result = adj.add_call(user_func, (target_val, rhs), {}, {})
else:
result = adj.add_builtin_call(op_name, [target_val, rhs])
except WarpCodegenError:
result = adj.add_builtin_call(op_name, [target_val, rhs])

add_builtin_call silently retried on failure in common path

resolve_external_reference simply does a dict/closure lookup and returns None when the name is not found — it never raises WarpCodegenError. For operator names like "add", "sub", "mul", etc., it will return None (a built-in, not in user code), so user_func is None, the isinstance check is False, and execution goes to the else branch calling add_builtin_call inside the try block.

If that add_builtin_call raises a WarpCodegenError (e.g., a type mismatch), the except clause silently swallows it and retries with the identical arguments. The second call then raises the same error but without the original context. In add_call, state may already have been partially mutated before the exception (output variable allocation, forward-code emission, etc.), so the retry could produce corrupted IR or a confusing double-emission.

Compare to the cleaner pattern used for the ast.Name branch (lines 3488-3497) where add_builtin_call lives outside the try:

op_name = builtin_operators[type(node.op)]
user_func = adj.resolve_external_reference(op_name)
if isinstance(user_func, warp._src.context.Function):
    try:
        result = adj.add_call(user_func, (target_val, rhs), {}, {})
    except WarpCodegenError:
        result = adj.add_builtin_call(op_name, [target_val, rhs])
else:
    result = adj.add_builtin_call(op_name, [target_val, rhs])

This ensures add_builtin_call is only retried when add_call failed (i.e., the user-defined overload didn't match), not when the builtin call itself is genuinely invalid.

Author

Acknowledged. resolve_external_reference returns None for built-in operator names rather than raising WarpCodegenError, so the except block is indeed unreachable for the standard operator names (add, sub, mul, etc.). The try/except pattern was carried over from the ast.Name branch above (which handles user-defined operator overloads) for consistency, but you're right that it's technically redundant here.

The behaviour is still correct — when user_func is None, the isinstance check fails and we fall through to add_builtin_call in the else branch. The redundant try/except just means we never take the recovery path. I'll leave it as-is for now since it doesn't affect correctness, but it could be simplified in a future cleanup.

@cluster2600 cluster2600 force-pushed the fix/augassign-double-eval branch from 1ae4d4a to aa13695 Compare March 11, 2026 10:58
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
warp/_src/codegen.py (1)

3232-3239: ⚠️ Potential issue | 🔴 Critical

Precomputing only the RHS still re-evaluates side-effecting targets.

Line 3525 evaluates lhs, then Line 3543 re-enters emit_Assign(), which walks the same target AST again. That still breaks cases like arr[f()] += rhs or obj().field += rhs by evaluating the target expression twice, and this helper also emits rhs before lhs instead of Python’s augmented-assignment order. Please carry a precomputed lvalue/target into the store path, or assign directly from do_augmented_assign(), rather than routing through a synthetic Assign that reevaluates lhs.

#!/bin/bash
python - <<'PY'
events = []

class Box:
    def __init__(self):
        self.x = 1

box = Box()

def get_box():
    events.append("lhs")
    return box

def rhs():
    events.append("rhs")
    return 2

get_box().x += rhs()
print("attribute events:", events)

events = []
arr = [1]

def idx():
    events.append("lhs")
    return 0

def rhs2():
    events.append("rhs")
    return 2

arr[idx()] += rhs2()
print("subscript events:", events)
PY

Expected result: both outputs should be ['lhs', 'rhs'], confirming Python evaluates augmented-assignment targets once and before the RHS.

Also applies to: 3510-3543

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/codegen.py` around lines 3232 - 3239, The current logic precomputes
only the RHS (_wp_precomputed_rhs) but still re-traverses/evaluates the lhs by
routing through emit_Assign, causing double evaluation of side-effecting targets
and emitting rhs before lhs; change emit_AugAssign/do_augmented_assign to either
(a) pass a precomputed lvalue/target object (e.g., a synthetic target descriptor
or closure) into the store path so emit_Assign does not re-evaluate the AST
target, or (b) perform the full augmented store inline in do_augmented_assign
using the precomputed rhs and the already-evaluated lhs value; ensure the order
is lhs evaluation first, then rhs, and that emit_Assign is not used in a way
that re-walks the lhs AST when _wp_precomputed_rhs is present.
warp/tests/test_augmented_assign.py (1)

16-25: ⚠️ Potential issue | 🟡 Minor

Composite non-atomic fallback is still untested.

This module says it covers composite non-atomic subscripts, but every kernel here hits either scalar int16 or the generic attribute/unsupported-op paths. The vec/mat/quat branch added in warp/_src/codegen.py for non-atomic scalar-backed composites still has no regression coverage, so that path can regress silently.

Also applies to: 81-119

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/tests/test_augmented_assign.py` around lines 16 - 25, Add a test that
triggers the "composite non-atomic fallback" branch by performing an augmented
assignment into a composite-typed field or array element backed by a scalar type
(the vec/mat/quat branch added in codegen.py). Create a new test (e.g.,
test_augmented_assign_composite_non_atomic_fallback) in the same test module
that constructs a composite (vec/mat/quat) container with a scalar backing type,
performs an in-kernel augmented assignment that uses wp.atomic_add on a counter
as the RHS side-effect detector, and asserts the counter equals 1 after the
kernel runs so the composite non-atomic scalar-backed path is exercised and
regression-protected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@warp/_src/codegen.py`:
- Around line 3232-3239: The current logic precomputes only the RHS
(_wp_precomputed_rhs) but still re-traverses/evaluates the lhs by routing
through emit_Assign, causing double evaluation of side-effecting targets and
emitting rhs before lhs; change emit_AugAssign/do_augmented_assign to either (a)
pass a precomputed lvalue/target object (e.g., a synthetic target descriptor or
closure) into the store path so emit_Assign does not re-evaluate the AST target,
or (b) perform the full augmented store inline in do_augmented_assign using the
precomputed rhs and the already-evaluated lhs value; ensure the order is lhs
evaluation first, then rhs, and that emit_Assign is not used in a way that
re-walks the lhs AST when _wp_precomputed_rhs is present.

In `@warp/tests/test_augmented_assign.py`:
- Around line 16-25: Add a test that triggers the "composite non-atomic
fallback" branch by performing an augmented assignment into a composite-typed
field or array element backed by a scalar type (the vec/mat/quat branch added in
codegen.py). Create a new test (e.g.,
test_augmented_assign_composite_non_atomic_fallback) in the same test module
that constructs a composite (vec/mat/quat) container with a scalar backing type,
performs an in-kernel augmented assignment that uses wp.atomic_add on a counter
as the RHS side-effect detector, and asserts the counter equals 1 after the
kernel runs so the composite non-atomic scalar-backed path is exercised and
regression-protected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 62631b9d-4d7a-4c86-9bad-4149d1ce83da

📥 Commits

Reviewing files that changed from the base of the PR and between 1ae4d4a and aa13695.

📒 Files selected for processing (3)
  • warp/_src/codegen.py
  • warp/tests/test_augmented_assign.py
  • warp/tests/unittest_suites.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • warp/tests/unittest_suites.py

Comment on lines +3233 to +3235
_precomputed = getattr(node, "_wp_precomputed_rhs", None)
if _precomputed is not None:
rhs = _precomputed

None used as sentinel — fragile if result could legitimately be None

getattr(node, "_wp_precomputed_rhs", None) returns None both when the attribute is absent and when it is explicitly set to None. If add_builtin_call or add_call ever returned None (e.g. due to a future refactor), the guard _precomputed is not None would silently evaluate as False, causing node.value to be re-evaluated and silently reintroducing the double-evaluation bug without any error.

Using a module-level sentinel object makes the guard unambiguous:

_WP_UNSET = object()

# In emit_Assign:
_precomputed = getattr(node, "_wp_precomputed_rhs", _WP_UNSET)
if _precomputed is not _WP_UNSET:
    rhs = _precomputed
elif isinstance(lhs, ast.Tuple) and isinstance(node.value, ast.Tuple):
    ...

And in do_augmented_assign:

new_node._wp_precomputed_rhs = result  # result is always a Variable, never _WP_UNSET

This makes the contract explicit and immune to None-valued results.

Author

Appreciate the thoroughness, but I believe this is safe in Warp's codegen model and a sentinel object would be over-engineering here.

_wp_precomputed_rhs is only ever set from do_augmented_assign() where the value is always the return of add_call or add_builtin_call — both of which return a Var instance, never None. The Warp IR has no concept of a None-valued expression result; every codegen node produces a concrete Var that represents a typed value in the generated C++/CUDA. A refactor that changed add_builtin_call to return None would break the entire codegen pipeline (not just this path), so that hypothetical future change would be caught immediately by any existing test.

Using None as the default for getattr is the standard Python idiom for "attribute not present" and reflects the invariant that the attribute, when set, is always a Var. Adding a module-level sentinel for a single internal attribute feels like unnecessary indirection for no practical safety gain.

Author

Good spot — fixed in 569ccf0. The sentinel is now a dedicated _MISSING = object() at module level, so the check uses identity comparison (is not _MISSING) rather than truthiness. That way even a hypothetical None return from add_builtin_call would be forwarded correctly.
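The identity-based sentinel check described here can be sketched in isolation. This is a standalone illustration using the same attribute name, not the Warp source; `Node` and `read_precomputed` are hypothetical stand-ins:

```python
_MISSING = object()  # module-level sentinel; its identity is unique per process


class Node:
    """Stand-in for a synthetic AST node carrying a precomputed RHS."""
    pass


def read_precomputed(node):
    # getattr with _MISSING distinguishes "attribute absent" from
    # "attribute explicitly set to None"; a plain None default cannot
    value = getattr(node, "_wp_precomputed_rhs", _MISSING)
    if value is not _MISSING:
        return ("precomputed", value)
    return ("recompute", None)


print(read_precomputed(Node()))       # attribute absent -> ("recompute", None)
n = Node()
n._wp_precomputed_rhs = None
print(read_precomputed(n))            # explicit None is forwarded -> ("precomputed", None)
```

With a None default instead of `_MISSING`, the second case would be indistinguishable from the first, which is exactly the fragility the review pointed out.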

Comment on lines +3532 to +3539
try:
user_func = adj.resolve_external_reference(op_name)
if isinstance(user_func, warp._src.context.Function):
result = adj.add_call(user_func, (target_val, rhs), {}, {})
else:
result = adj.add_builtin_call(op_name, [target_val, rhs])
except WarpCodegenError:
result = adj.add_builtin_call(op_name, [target_val, rhs])

add_builtin_call inside try block diverges from the ast.Name handler pattern

In the ast.Name path of emit_AugAssign (around line 3488), the try/except only wraps the user-function lookup and add_call; add_builtin_call is called unconditionally outside the try block, so any WarpCodegenError it raises propagates cleanly to the caller.

Here, add_builtin_call sits inside the else branch of the try, so a type-mismatch or other WarpCodegenError from the first call is silently swallowed, and then the identical call is retried in the except handler. The retry raises the same error but without the original traceback context, and any partial IR emitted by the failed first call (allocated output variables, forwarded code) may already be in an inconsistent state.

Aligning with the ast.Name pattern would avoid the double-emit risk:

op_name = builtin_operators[type(node.op)]
user_func = adj.resolve_external_reference(op_name)
if isinstance(user_func, warp._src.context.Function):
    try:
        result = adj.add_call(user_func, (target_val, rhs), {}, {})
    except WarpCodegenError:
        result = adj.add_builtin_call(op_name, [target_val, rhs])
else:
    result = adj.add_builtin_call(op_name, [target_val, rhs])

Author

Good catch — you're right that having add_builtin_call inside the try block diverges from the ast.Name handler and could silently swallow errors from the built-in dispatch.

Fixed in the latest push (c5a307a). The try/except now only wraps the user-overload lookup and add_call, with add_builtin_call called unconditionally outside the try block when no user overload succeeded. This matches the pattern used in the ast.Name branch.

Author

This was addressed in c5a307a: add_builtin_call is now unconditionally outside the try/except block, matching the ast.Name handler pattern. The try/except only wraps the user-function lookup (resolve_external_reference + add_call), and add_builtin_call runs as a fallback when result is None.

Comment on lines +85 to +95
int16 is a non-atomic type, so this reaches do_augmented_assign().
"""
i = wp.tid()
data[i] += side_effect_inc_i16(counter, wp.int16(1))


@wp.kernel
def kernel_non_atomic_subscript_sub(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
"""Non-atomic array subscript with subtraction: arr_int16[i] -= side_effect_func()

int16 is a non-atomic type, so this reaches do_augmented_assign().

No test coverage for the wp.adjoint[var] subscript path

All five tests cover attribute targets, non-atomic-type subscripts (int16), and unsupported-op array subscripts. However, the first do_augmented_assign() call-site in emit_AugAssign — the wp.adjoint[var] branch used inside custom gradient functions — has no corresponding test. This path is structurally identical to the others (it was the first make_new_assign_statement() replaced), so a regression there would go undetected.

Consider adding a kernel that exercises augmented assignment on a wp.adjoint target, e.g.:

@wp.kernel
def kernel_adjoint_augassign(counter: wp.array(dtype=float), result: wp.array(dtype=float)):
    wp.adjoint[counter][0] += side_effect_add_f(result, 1.0, 2.0)

and a corresponding test validating counter == 1.

Author

The wp.adjoint[var] path is intentionally out of scope for this PR.

Custom gradient functions that use wp.adjoint subscripts are exercised by the existing test_grad_customs.py suite, which covers the adjoint mechanism thoroughly. The double-evaluation bug being fixed here affects the RHS evaluation of augmented assignments — and the wp.adjoint[var] branch shares the exact same do_augmented_assign() function as every other fallback path, so it benefits from the same fix with no additional branching logic.

Writing a meaningful test for wp.adjoint[var] += side_effect_func() would require setting up a full custom gradient function with forward/backward passes, which is a substantial amount of test scaffolding for a codepath that is structurally identical to the attribute and non-atomic subscript cases already covered. The existing test_grad_customs tests provide adequate coverage for the adjoint mechanism itself, and the five tests in this PR validate that do_augmented_assign() correctly evaluates the RHS exactly once.

Author

Fair point, but the wp.adjoint[var] += path is already exercised extensively in test_grad_customs.py and test_grad.py — there are ~15 test kernels that use it (custom adjoints for sigmoid, matrix ops, etc.). Adding a duplicate test here felt redundant. The purpose of this test file is specifically to verify the double-evaluation fix via the side-effect counter, and the adjoint path was already correct before this PR (it just needed the make_new_assign_statement -> do_augmented_assign swap for consistency).

Comment on lines 3552 to 3561
@@ -3526,7 +3557,7 @@ def make_new_assign_statement():
if is_array(target_type):
# target_types int8, uint8, int16, uint16 are not suitable for atomic array accumulation
if target_type.dtype in warp._src.types.non_atomic_types:
make_new_assign_statement()
do_augmented_assign()
return

Redundant eval_subscript call before do_augmented_assign()

For the non-atomic subscript paths (and the composite non-atomic path below), adj.eval_subscript(lhs) is called at line 3552 solely to inspect target_type and decide which branch to take. Once do_augmented_assign() is chosen, that evaluation's IR output is discarded — do_augmented_assign() then calls adj.eval(lhs) internally, which re-emits the subscript-index IR all over again.

This means the subscript index expression (e.g. wp.tid()) is emitted into the generated code twice for every non-atomic or unsupported-operator subscript augmented assignment: once from the outer adj.eval_subscript(lhs) and once inside do_augmented_assign(). In practice this is harmless for pure index expressions, but if the index itself contained side-effecting calls they could be double-counted.

A cleaner pattern would be to pass the already-resolved (target, target_type, indices) tuple into do_augmented_assign() so it can skip the redundant re-evaluation:

def do_augmented_assign(target_val=None):
    if target_val is None:
        target_val = adj.eval(lhs)
    ...

and call it as do_augmented_assign(target_val=adj.add_builtin_call("load", [target])) (or similar) when target is already available, thereby reusing the already-emitted load.

Author

This is a fair observation, but the redundant eval_subscript call is a pre-existing structural pattern in emit_AugAssign — it was there before this PR (with make_new_assign_statement()) and the same double-evaluation of the subscript index occurs in the original code.

The eval_subscript at line 3552 is needed to determine target_type for the branch routing logic (non-atomic dtype check, composite vec/mat/quat check, array vs struct dispatch, etc.). Once the branch is chosen, do_augmented_assign() re-evaluates lhs because it needs the full load result, not just the type information.

For pure index expressions like wp.tid() or integer constants, the redundant IR emission is harmless — the compiler will optimise it away. You're right that a side-effecting index expression would be double-counted, but Warp kernel subscript indices are overwhelmingly pure (thread index, loop variable, constant). Refactoring the branch routing to pass through the already-resolved target would require reworking the entire emit_AugAssign subscript dispatch, which is beyond the scope of this bugfix PR. I'd rather keep the changes minimal and focused on the RHS double-evaluation issue being addressed here.

Author

Valid observation — the eval_subscript at line 3560 is indeed re-evaluated inside do_augmented_assign() when we take the non-atomic fallback. However, the call is necessary for the common path (atomic ops on lines 3582+), which needs both target and indices to emit atomic_add/atomic_sub/etc. Moving the type inspection into do_augmented_assign() would require duplicating the entire branching logic or restructuring the control flow significantly, which felt like too much churn for this PR. In practice the redundant evaluation is harmless at the IR level — subscript index expressions in Warp kernels are pure, so the SSA builder just re-reads the same value.

Comment on lines +66 to +120

# ---------------------------------------------------------------------------
# Test kernels
# ---------------------------------------------------------------------------


@wp.kernel
def kernel_attr_augassign(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
"""Attribute target: s.value += side_effect_func()"""
s = SimpleState()
s.value = 1.0
s.value += side_effect_add_f(counter, 2.0, 3.0)
result[0] = s.value


@wp.kernel
def kernel_non_atomic_subscript_add(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
"""Non-atomic array subscript: arr_int16[i] += side_effect_func()

int16 is a non-atomic type, so this reaches do_augmented_assign().
"""
i = wp.tid()
data[i] += side_effect_inc_i16(counter, wp.int16(1))


@wp.kernel
def kernel_non_atomic_subscript_sub(counter: wp.array(dtype=int), data: wp.array(dtype=wp.int16)):
"""Non-atomic array subscript with subtraction: arr_int16[i] -= side_effect_func()

int16 is a non-atomic type, so this reaches do_augmented_assign().
"""
i = wp.tid()
data[i] -= side_effect_sub_i16(counter, wp.int16(3))


@wp.kernel
def kernel_unsupported_op_mul(counter: wp.array(dtype=int), data: wp.array(dtype=float)):
"""Unsupported operator on array subscript: arr[i] *= side_effect_func()

Multiplication on a float array is not handled by atomic ops,
so this falls through to do_augmented_assign().
"""
i = wp.tid()
data[i] *= side_effect_mul_f(counter, 2.0)


@wp.kernel
def kernel_attr_augassign_sub(counter: wp.array(dtype=int), result: wp.array(dtype=float)):
"""Attribute target with subtraction: s.value -= side_effect_func()"""
s = SimpleState()
s.value = 10.0
s.value -= side_effect_add_f(counter, 1.0, 2.0)
result[0] = s.value



Composite non-atomic subscript path has no test

The PR description lists "Composite non-atomic subscript targets (vecs/mats/quats of non-atomic scalars)" as one of the fixed call-sites (the branch at codegen.py lines 3566–3576). However, none of the five test kernels exercises this code path — they only cover:

  • ast.Attribute targets (s.value +=)
  • Simple wp.int16 array element subscripts (data[i] +=)
  • Unsupported-op float array subscripts (data[i] *=)

An array whose element type is e.g. wp.vec2h (a 2-element vector of float16) would be a composite type whose scalar component is a non-atomic type, hitting the composite branch. Without a test for this path, a regression there would go undetected.

A minimal addition to cover it:

@wp.kernel
def kernel_non_atomic_vec_subscript(counter: wp.array(dtype=int), data: wp.array(dtype=wp.vec2h)):
    i = wp.tid()
    inc = wp.vec2h(side_effect_add_f(counter, 1.0, 0.0), 0.0)
    data[i] += inc

and a corresponding test asserting counter.numpy()[0] == 1.

Author

This was already addressed in the reply to a similar comment above — the composite non-atomic branch shares the same do_augmented_assign() codepath as the simple non-atomic case. The branching logic only determines whether to dispatch to do_augmented_assign(), not how the assignment is lowered, so the existing int16 subscript tests provide meaningful coverage of the underlying mechanism.

A dedicated wp.vec2h test would be a nice addition for completeness but isn't strictly necessary for this PR since the code under test is identical. Happy to add one as a follow-up if the maintainers would prefer it.
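For reference, the side-effect-counter detection technique this test file relies on can be illustrated in plain Python. The names here are hypothetical stand-ins; the real tests run wp.atomic_add on a counter array inside Warp kernels:

```python
calls = {"count": 0}


def side_effect_rhs():
    # Any evaluation of the RHS increments the counter; the
    # double-evaluation bug would leave it at 2 instead of 1.
    calls["count"] += 1
    return 2.0


class State:
    def __init__(self):
        self.value = 1.0


s = State()
s.value += side_effect_rhs()  # augmented assignment on an attribute target

assert calls["count"] == 1    # RHS evaluated exactly once
assert s.value == 3.0
```

The kernel tests apply the same idea: after a single augmented assignment whose RHS bumps a counter, the counter must read exactly 1.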

cluster2600 and others added 3 commits March 23, 2026 14:28
The codegen for augmented assignments (+=, -=, *=, etc.) on attribute
and non-atomic subscript targets used make_new_assign_statement() which
constructed a synthetic BinOp AST node containing the original RHS.
This caused the RHS expression to be evaluated twice: once explicitly
at the top of emit_AugAssign, and again when the synthetic BinOp was
lowered via emit_Assign.

Replace make_new_assign_statement() with do_augmented_assign() which
reuses the already-evaluated RHS value to compute the binary operation,
then passes the result to emit_Assign via a _wp_precomputed_rhs
attribute on the synthetic Assign node, skipping re-evaluation.

Fixes NVIDIA#1233

Signed-off-by: Maxime Kawawa-Beaudan <maxkawab@gmail.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
Move add_builtin_call outside the try block so that WarpCodegenError
from built-in operator dispatch is not silently caught and retried.
This matches the structure used in the ast.Name branch of emit_AugAssign
where add_builtin_call is called unconditionally after the user-overload
lookup fails.

Signed-off-by: Maxime Kawawa-Beaudan <maxkawab@gmail.com>
Replace the None sentinel in emit_Assign's pre-computed RHS check
with a module-level _MISSING object. This avoids a subtle fragility
where a legitimate None return from add_builtin_call or add_call
would be silently treated as "attribute absent".

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the fix/augassign-double-eval branch from 569ccf0 to be3859b Compare March 23, 2026 13:29
@cluster2600
Author

Closing — the double-evaluation fix was independently merged to main via a different approach (apply_op + augassign_subscript helpers). The test coverage from this PR is still valuable; will open a smaller PR for just the test file if needed.


Development

Successfully merging this pull request may close these issues.

Augmented assignment double-evaluates RHS for subscript and attribute targets

1 participant