-
Notifications
You must be signed in to change notification settings - Fork 294
Extract environment boilerplate code from within the device interfaces to a separate header #6622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| // Initial value | ||
| OutputExtremumT initial_value{::cuda::std::numeric_limits<InputValueT>::max()}; | ||
| OutputExtremumT initial_value{::cuda::std::numeric_limits<InputValueT>::lowest()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to figure out why this bug was not caught from the tests
This comment has been minimized.
This comment has been minimized.
c72dd09 to
03c1ee9
Compare
😬 CI Workflow Results🟥 Finished in 3h 00m: Pass: 28%/81 | Total: 2d 04h | Max: 2h 59m | Hits: 81%/22346See results here. |
|
|
||
| return deallocate_error; | ||
| // Lambda that calls reduce_impl with the right overload based on determinism | ||
| auto reduce_callable = [&](auto tuning, void* storage, size_t& bytes, auto... args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: it seems that we are not using anything from the outside scope, so we don't need to capture by reference [&]. Comment applies throughout PR
|
|
||
| return deallocate_error; | ||
| // Lambda that calls reduce_impl with the right overload based on determinism | ||
| auto reduce_callable = [&](auto tuning, void* storage, size_t& bytes, auto... args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: do we need a lambda to make this work? Can we just pass reduce_impl<tuning_t> directly?
fixes #5606
Boilerplate code for extracting types information (
stream,mr,tuning_tetc.) is too big and repetitive across the new device environment based interfaces we introduced. This PR extracts the code into a separate function and re-uses it in the existing environment based device APIs that we have (DeviceScanandDeviceReduce).Some consideration about the design for the reviewers:
deterministm_tis supported. For exampleDeviceReduce::Reducecan support bothgpu_to_gpuandrun_to_rundeterminism, whileDeviceReduce::ArgMax/MinorDeviceScanonly supportrun_to_runat the moment. That means the determinism heuristics cannot be incorporated into the boilerplate code. Future environment-based APIs must individually evaluate each algorithm to determine and support the appropriate deterministic types.I need some feedback on whether this interface on the
dispatch_with_env()looks sane.