-
Notifications
You must be signed in to change notification settings - Fork 798
[SYCL] Handler-less kernel submit path (range based) #20741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
This can/should go to a separate NFC PR. |
|
|
||
| device get_device() const; | ||
|
|
||
| #ifndef __INTEL_PREVIEW_BREAKING_CHANGES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're in ABI breaking window, just drop it.
|
|
||
| bool eventNeeded() const; | ||
|
|
||
| device get_device() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an extra std::shared_ptr copy. https://github.com/intel/llvm/pull/20698/files#r2561353949 is related. Can you collaborate with @lbushi25 for a better fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I am planning to create a separate PR with a refactor to avoid this.
| template <int Dims, typename LambdaArgType> struct TransformUserItemType { | ||
| using type = std::conditional_t< | ||
| std::is_convertible_v<nd_item<Dims>, LambdaArgType>, nd_item<Dims>, | ||
| std::conditional_t<std::is_convertible_v<item<Dims>, LambdaArgType>, | ||
| item<Dims>, LambdaArgType>>; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm/sycl/include/sycl/detail/type_traits.hpp
Lines 385 to 397 in 259c433
| // Example usage: | |
| // using mapped = map_type<type_to_map, from0, /*->*/ to0, | |
| // from1, /*->*/ to1, | |
| // ...> | |
| template <typename...> struct map_type { | |
| using type = void; | |
| }; | |
| template <typename T, typename From, typename To, typename... Rest> | |
| struct map_type<T, From, To, Rest...> { | |
| using type = std::conditional_t<std::is_same_v<From, T>, To, | |
| typename map_type<T, Rest...>::type>; | |
| }; |
| friend class ext::oneapi::experimental::detail::dynamic_parameter_impl; | ||
| friend class ext::oneapi::experimental::detail::dynamic_command_group_impl; | ||
|
|
||
| #ifndef __INTEL_PREVIEW_BREAKING_CHANGES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just drop it.
| using KernelType = std::remove_const_t< | ||
| std::remove_reference_t<std::tuple_element_t<0, std::tuple<RestT...>>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in #20756 (review) - use detail::nth_type_t, it's much better for compile time.
But also, this doesn't make sense because of line 3755 (AreAllButLastReductions check). Did you mean to access sizeof...(RestT) - 1's element?
| detail::tls_code_loc_t TlsCodeLocCapture(CodeLoc); | ||
|
|
||
| using KernelType = | ||
| std::remove_const_t<std::remove_reference_t<KernelTypeUniversalRef>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would std::decay_t suffice?
| auto MergedProps = | ||
| sycl::ext::oneapi::experimental::detail::merge_properties( | ||
| ExtraProps, | ||
| KernelFunc.get(ext::oneapi::experimental::properties_tag{})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it use-after-move (inside the call on line 3950)?
Thanks for the review. Yes, the PR can be split into two (or more) PRs. |
| detail::CG::StorageInitHelper CGData; | ||
| std::unique_lock<std::mutex> Lock(MMutex); | ||
|
|
||
| NestedCallsTracker tracker; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is a direct submit, is nesting actually possible?
As I understand, nesting is possible when the user submits the CGFO object, and inside CGFO object, the submit is called again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the case:
| q.parallel_for<class zero>(sycl::range<1>{n}, |
| ?HasAssociatedAccessor@handler@_V1@sycl@@AEBA_NPEAVAccessorImplHost@detail@23@W4target@access@23@@Z | ||
| ?PushBack@exception_list@_V1@sycl@@AEAAX$$QEAVexception_ptr@std@@@Z | ||
| ?PushBack@exception_list@_V1@sycl@@AEAAXAEBVexception_ptr@std@@@Z | ||
| ?RangeRoundingTrace@detail@_V1@sycl@@YA_NXZ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to export both: detail::RangeRoundingTrace() and handler::RangeRoundingTrace()?
|
|
||
| using namespace sycl::detail; | ||
|
|
||
| #ifndef __INTEL_PREVIEW_BREAKING_CHANGES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are in the ABI breaking window. Please clean up ABI related to this PR.
Extend the handler-less kernel submission path to sycl::range based functions. This includes: