Optimize the performance of circular buffer #4275
base: main
Conversation
…n the hot path, avoiding unnecessary kernel and synchronization
Greptile Summary

Optimized circular buffer performance by eliminating GPU-CPU synchronization in the hot path, addressing the performance issue identified in #4274. Key optimizations:

Performance impact:
Code correctness:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Actuator as DelayedPDActuator
    participant DelayBuf as DelayBuffer
    participant CircBuf as CircularBuffer
    Note over Actuator,CircBuf: Hot Path (called every physics step)
    Actuator->>DelayBuf: compute(control_action.joint_positions)
    DelayBuf->>CircBuf: append(data)
    alt First time after reset
        CircBuf->>CircBuf: Check _all_initialized flag (false)
        CircBuf->>CircBuf: Check is_first_push = (_num_pushes == 0)
        CircBuf->>CircBuf: Call .any().item() (GPU sync)
        CircBuf->>CircBuf: Initialize buffer if needed
        CircBuf->>CircBuf: Set _all_initialized = true
    else All batches initialized (optimized path)
        CircBuf->>CircBuf: Skip initialization check
        Note over CircBuf: No GPU-CPU sync needed!
    end
    CircBuf->>CircBuf: Increment _num_pushes
    DelayBuf->>CircBuf: __getitem__(time_lags)
    CircBuf-->>DelayBuf: Return delayed data (view)
    DelayBuf-->>Actuator: Return delayed data (no clone)
    Actuator->>Actuator: In-place assign with [:]
    Note over Actuator,CircBuf: Optimizations Applied:<br/>1. Cached max_length as int (avoid .item())<br/>2. Skip initialization check after warmup<br/>3. Removed unnecessary .clone()<br/>4. In-place assignment in actuator
```
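The warmup pattern in the diagram can be sketched in a few lines. This is a hypothetical NumPy toy (the real IsaacLab `CircularBuffer` is a torch class and differs in detail); the point is that `_all_initialized` is a plain host-side flag, so once every batch entry has pushed, the per-step hot path never reaches the `.any()`/`.item()` check that would force a GPU-CPU sync:

```python
import numpy as np

class CircularBuffer:
    """Toy sketch of the optimized append path (names assumed from the diagram)."""

    def __init__(self, max_length: int, num_envs: int):
        self._max_length = int(max_length)   # cached once as a Python int (no .item() per step)
        self._buffer = np.zeros((self._max_length, num_envs))
        self._num_pushes = np.zeros(num_envs, dtype=np.int64)
        self._step = 0                       # scalar write pointer
        self._all_initialized = False        # host-side flag, read without any device sync

    def reset(self, env_ids) -> None:
        self._num_pushes[env_ids] = 0
        self._all_initialized = False        # re-arm the slow path for the reset envs

    def append(self, data: np.ndarray) -> None:
        if not self._all_initialized:
            # Slow path: taken only until every env has pushed once after a reset.
            # With torch on GPU, `first_push.any().item()` is the costly sync
            # that the flag keeps off the steady-state hot path.
            first_push = self._num_pushes == 0
            if first_push.any():
                # backfill the whole history so delayed reads return valid data
                self._buffer[:, first_push] = data[first_push]
            self._all_initialized = True     # append touches every env, so all are now initialized
        self._buffer[self._step % self._max_length] = data
        self._step += 1
        self._num_pushes += 1
```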
Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".
Hi, this is a cool optimization, I'm excited to try it, thank you! Two quick qs:
```diff
- control_action.joint_positions = self.positions_delay_buffer.compute(control_action.joint_positions)
- control_action.joint_velocities = self.velocities_delay_buffer.compute(control_action.joint_velocities)
- control_action.joint_efforts = self.efforts_delay_buffer.compute(control_action.joint_efforts)
+ control_action.joint_positions[:] = self.positions_delay_buffer.compute(control_action.joint_positions)
```
Is the assignment operation here needed? 🤔
This uses slice assignment to keep the original tensor storage (control_action.joint_positions etc.) and overwrite its contents in place. Without this, compute() may return a new tensor, causing buffer replacement and additional allocation or copy overhead.
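The difference between slice assignment and rebinding can be shown with plain Python lists (the same semantics apply to tensors); the names here are illustrative stand-ins for the actuator code:

```python
def compute(x):
    # stand-in for positions_delay_buffer.compute(): returns a NEW object
    return [v + 1 for v in x]

action = [1, 2, 3]
alias = action                 # e.g. another component holding control_action.joint_positions

action[:] = compute(action)    # slice assignment: same storage, contents overwritten in place
assert alias is action and alias == [2, 3, 4]

action = compute(action)       # plain rebinding: `alias` now refers to stale data
assert alias is not action and alias == [2, 3, 4]
```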
The delay buffer already returns a copied tensor, since the time-lag indexing of the internal torch tensor produces a copy. The in-place assignment here makes another copy of that tensor, which I don't think is needed. It will also overwrite the initial command set into the environment (after action processing), which may consequently affect the next decimation step.
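The copy-on-index point can be illustrated with NumPy, whose advanced-indexing semantics match torch's here (a sketch with made-up shapes, not the actual `DelayBuffer` code):

```python
import numpy as np

history = np.arange(12.0).reshape(4, 3)  # (max_length, num_envs)-style layout
time_lags = np.array([0, 2, 1])          # per-env lag, as in the delay buffer
env_ids = np.arange(3)

delayed = history[time_lags, env_ids]    # advanced indexing -> a new array (a copy)
assert delayed.base is None              # owns its data; not a view of `history`

history[:] = -1.0                        # mutating the buffer afterwards...
assert delayed[0] == 0.0                 # ...does not affect the returned copy
```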
@iansseijelly and @T-K-233 : Let me know your thoughts on the above. Thanks again!
Awesome, thank you for answering my questions, I really appreciate it!
…verting all boolean conditions

Description
This PR addresses issue #4274.
Type of change
Screenshots
Training throughput before and after the patch, running training on the task [Isaac-Velocity-Flat-Spot-v0].

Checklist

- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to the `CONTRIBUTORS.md` or my name already exists there