Skip to content

Conversation

@slawekptak
Copy link
Contributor

@slawekptak slawekptak commented Nov 24, 2025

Extend the handler-less kernel submission path to sycl::range based functions. This includes:

  1. A refactor of range rounding utility functions, so they can be shared between a handler and a queue
  2. The actual sycl::range based submission, with range rounding logic
  3. Modifications of range rounding e2e tests, which can now test both handler-based and handler-less functions.

@slawekptak slawekptak marked this pull request as ready for review November 25, 2025 11:32
@slawekptak slawekptak requested a review from a team as a code owner November 25, 2025 11:32
@aelovikov-intel
Copy link
Contributor

A refactor of range rounding utility functions, so they can be shared between a handler and a queue

This can/should go to a separate NFC PR.


device get_device() const;

#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're in ABI breaking window, just drop it.


bool eventNeeded() const;

device get_device() const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an extra std::shared_ptr copy. https://github.com/intel/llvm/pull/20698/files#r2561353949 is related. Can you collaborate with @lbushi25 for a better fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I am planning to create a separate PR with a refactor to avoid this.

Comment on lines +288 to +293
template <int Dims, typename LambdaArgType> struct TransformUserItemType {
using type = std::conditional_t<
std::is_convertible_v<nd_item<Dims>, LambdaArgType>, nd_item<Dims>,
std::conditional_t<std::is_convertible_v<item<Dims>, LambdaArgType>,
item<Dims>, LambdaArgType>>;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Example usage:
// using mapped = map_type<type_to_map, from0, /*->*/ to0,
// from1, /*->*/ to1,
// ...>
template <typename...> struct map_type {
using type = void;
};
template <typename T, typename From, typename To, typename... Rest>
struct map_type<T, From, To, Rest...> {
using type = std::conditional_t<std::is_same_v<From, T>, To,
typename map_type<T, Rest...>::type>;
};
might be helpful.

friend class ext::oneapi::experimental::detail::dynamic_parameter_impl;
friend class ext::oneapi::experimental::detail::dynamic_command_group_impl;

#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just drop it.

Comment on lines +3762 to +3763
using KernelType = std::remove_const_t<
std::remove_reference_t<std::tuple_element_t<0, std::tuple<RestT...>>>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in #20756 (review) - use detail::nth_type_t, it's much better for compile time.

But also, this doesn't make sense because of line 3755 (AreAllButLastReductions check). Did you mean to access sizeof...(RestT) - 1's element?

detail::tls_code_loc_t TlsCodeLocCapture(CodeLoc);

using KernelType =
std::remove_const_t<std::remove_reference_t<KernelTypeUniversalRef>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would std::decay_t suffice?

auto MergedProps =
sycl::ext::oneapi::experimental::detail::merge_properties(
ExtraProps,
KernelFunc.get(ext::oneapi::experimental::properties_tag{}));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it use-after-move (inside the call on line 3950)?

@slawekptak
Copy link
Contributor Author

A refactor of range rounding utility functions, so they can be shared between a handler and a queue

This can/should go to a separate NFC PR.

Thanks for the review. Yes, the PR can be split into two (or more) PRs.

detail::CG::StorageInitHelper CGData;
std::unique_lock<std::mutex> Lock(MMutex);

NestedCallsTracker tracker;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is a direct submit, is nesting actually possible?
As I understand, nesting is possible when the user submits the CGFO object, and inside CGFO object, the submit is called again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the case:

q.parallel_for<class zero>(sycl::range<1>{n},

?HasAssociatedAccessor@handler@_V1@sycl@@AEBA_NPEAVAccessorImplHost@detail@23@W4target@access@23@@Z
?PushBack@exception_list@_V1@sycl@@AEAAX$$QEAVexception_ptr@std@@@Z
?PushBack@exception_list@_V1@sycl@@AEAAXAEBVexception_ptr@std@@@Z
?RangeRoundingTrace@detail@_V1@sycl@@YA_NXZ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to export both: detail::RangeRoundingTrace() and handler::RangeRoundingTrace()?


using namespace sycl::detail;

#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in the ABI breaking window. Please clean up ABI related to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants