Optimize the performance of circular buffer #4275
base: main
Conversation
…n the hot path, avoiding unnecessary kernel and synchronization
Greptile Summary

Optimized circular buffer performance by eliminating GPU-CPU synchronization in the hot path, addressing the performance issue identified in #4274. Key optimizations:

Performance impact:
Code correctness:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Actuator as DelayedPDActuator
    participant DelayBuf as DelayBuffer
    participant CircBuf as CircularBuffer
    Note over Actuator,CircBuf: Hot Path (called every physics step)
    Actuator->>DelayBuf: compute(control_action.joint_positions)
    DelayBuf->>CircBuf: append(data)
    alt First time after reset
        CircBuf->>CircBuf: Check _all_initialized flag (false)
        CircBuf->>CircBuf: Check is_first_push = (_num_pushes == 0)
        CircBuf->>CircBuf: Call .any().item() (GPU sync)
        CircBuf->>CircBuf: Initialize buffer if needed
        CircBuf->>CircBuf: Set _all_initialized = true
    else All batches initialized (optimized path)
        CircBuf->>CircBuf: Skip initialization check
        Note over CircBuf: No GPU-CPU sync needed!
    end
    CircBuf->>CircBuf: Increment _num_pushes
    DelayBuf->>CircBuf: __getitem__(time_lags)
    CircBuf-->>DelayBuf: Return delayed data (view)
    DelayBuf-->>Actuator: Return delayed data (no clone)
    Actuator->>Actuator: In-place assign with [:]
    Note over Actuator,CircBuf: Optimizations Applied:<br/>1. Cached max_length as int (avoid .item())<br/>2. Skip initialization check after warmup<br/>3. Removed unnecessary .clone()<br/>4. In-place assignment in actuator
```
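The warmup pattern in the diagram can be sketched in a few lines. This is a hypothetical NumPy toy (the real IsaacLab `CircularBuffer` is a torch class and differs in detail); the point is that `_all_initialized` is a plain host-side flag, so once every batch entry has pushed, the per-step hot path never reaches the `.any()`/`.item()` check that would force a GPU-CPU sync:

```python
import numpy as np

class CircularBuffer:
    """Toy sketch of the optimized append path (names assumed from the diagram)."""

    def __init__(self, max_length: int, num_envs: int):
        self._max_length = int(max_length)   # cached once as a Python int (no .item() per step)
        self._buffer = np.zeros((self._max_length, num_envs))
        self._num_pushes = np.zeros(num_envs, dtype=np.int64)
        self._step = 0                       # scalar write pointer
        self._all_initialized = False        # host-side flag, read without any device sync

    def reset(self, env_ids) -> None:
        self._num_pushes[env_ids] = 0
        self._all_initialized = False        # re-arm the slow path for the reset envs

    def append(self, data: np.ndarray) -> None:
        if not self._all_initialized:
            # Slow path: taken only until every env has pushed once after a reset.
            # With torch on GPU, `first_push.any().item()` is the costly sync
            # that the flag keeps off the steady-state hot path.
            first_push = self._num_pushes == 0
            if first_push.any():
                # backfill the whole history so delayed reads return valid data
                self._buffer[:, first_push] = data[first_push]
            self._all_initialized = True     # append touches every env, so all are now initialized
        self._buffer[self._step % self._max_length] = data
        self._step += 1
        self._num_pushes += 1
```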
Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".
Hi, this is a cool optimization, I'm excited to try it, thank you! Two quick qs:
```diff
- control_action.joint_positions = self.positions_delay_buffer.compute(control_action.joint_positions)
- control_action.joint_velocities = self.velocities_delay_buffer.compute(control_action.joint_velocities)
- control_action.joint_efforts = self.efforts_delay_buffer.compute(control_action.joint_efforts)
+ control_action.joint_positions[:] = self.positions_delay_buffer.compute(control_action.joint_positions)
```
Is the assignment operation here needed? 🤔
This uses slice assignment to keep the original tensor storage (control_action.joint_positions etc.) and overwrite its contents in place. Without this, compute() may return a new tensor, causing buffer replacement and additional allocation or copy overhead.
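The difference between slice assignment and rebinding can be shown with plain Python lists (the same semantics apply to tensors); the names here are illustrative stand-ins for the actuator code:

```python
def compute(x):
    # stand-in for positions_delay_buffer.compute(): returns a NEW object
    return [v + 1 for v in x]

action = [1, 2, 3]
alias = action                 # e.g. another component holding control_action.joint_positions

action[:] = compute(action)    # slice assignment: same storage, contents overwritten in place
assert alias is action and alias == [2, 3, 4]

action = compute(action)       # plain rebinding: `alias` now refers to stale data
assert alias is not action and alias == [2, 3, 4]
```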
The delay buffer already returns a copied tensor, since the time-lag indexing of the internal torch tensor produces a copy. The in-place assignment here makes another copy of that tensor, which I don't think is needed. It will also overwrite the initial command set into the environment (after action processing), which may consequently affect the next decimation step.
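The copy-on-index point can be illustrated with NumPy, whose advanced-indexing semantics match torch's here (a sketch with made-up shapes, not the actual `DelayBuffer` code):

```python
import numpy as np

history = np.arange(12.0).reshape(4, 3)  # (max_length, num_envs)-style layout
time_lags = np.array([0, 2, 1])          # per-env lag, as in the delay buffer
env_ids = np.arange(3)

delayed = history[time_lags, env_ids]    # advanced indexing -> a new array (a copy)
assert delayed.base is None              # owns its data; not a view of `history`

history[:] = -1.0                        # mutating the buffer afterwards...
assert delayed[0] == 0.0                 # ...does not affect the returned copy
```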
@iansseijelly and @T-K-233 : Let me know your thoughts on the above. Thanks again!
Awesome, thank you for answering my questions, I really appreciate it!
…verting all boolean conditions

Description
This PR addresses issue #4274.
Type of change
Screenshots
Training throughput before and after the patch, running training on the task [Isaac-Velocity-Flat-Spot-v0].

Checklist

- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to the `CONTRIBUTORS.md` or my name already exists there