-
Notifications
You must be signed in to change notification settings - Fork 294
Refactor driver APIs #6632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor driver APIs #6632
Conversation
5285c51 to
f19d5e6
Compare
| { | ||
| static auto __driver_fn = _CCCLRT_GET_DRIVER_FUNCTION(cuDeviceGetCount); | ||
| int __result; | ||
| ::cuda::__driver::__call_driver_fn(__driver_fn, "Failed to get device count", &__result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this implementation, we are loosing the API name and error messages. I am wondering whether we could just add a __api_ and __error_msg_ members to the __call_result structure and use it in __check_and_extract_result function, it should be really easy for the compiler to optimize it out
|
ok, now I'm confused. In another PR you suggested me to avoid |
|
Non-blocking comments:
|
... returns
I don't think the file to auto generate the header would contain less code, because we still want to adapt the API to our needs.
We store the function addresses in |
It is about avoiding hand-writing Hand-writing bindings are error prone and not scalable as we expand the project scope and coverage.
Function-local |
This comment has been minimized.
This comment has been minimized.
| const auto __needs_load = ::cuda::__driver::__version_at_least(12, 4); | ||
| if (__needs_load.__error_ != ::CUDA_SUCCESS) | ||
| { | ||
| return static_cast<::cudaError_t>(__needs_load.__error_); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these not _CCCL_TRY_DRIVER_API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a question for @pciolkosz, all of the functions applying the launch options to a given kernel launch are marked as noexcept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a leftover consequence of an early design, where I had only the top level interface throwing and internal function always noexcept in case we wanted to implement a exception-free alternative top level interface. But we decided its not a goal anymore, so we could just throw here
libcudacxx/test/libcudacxx/cuda/ccclrt/device/device_smoke.c2h.cu
Outdated
Show resolved
Hide resolved
|
|
||
| template <class _Tp> | ||
| [[nodiscard]] _CCCL_HOST_API constexpr _Tp __check_and_extract_result( | ||
| __call_result<_Tp> __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question should this take by
| __call_result<_Tp> __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current()) | |
| __call_result<_Tp>&& __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current()) |
and forward_like below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the types returned by the driver APIs are trivial, so I don't think it makes any difference
| { | ||
| ::CUresult __error_; | ||
| _Tp __value_; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is missing at least one feature
| }; | |
| _CCCL_API constexpr operator bool() const noexcept { | |
| return __error_ == ::CUDA_SUCCESS; | |
| } | |
| }; |
We might also want to think about at least one monadic interface:
| }; | |
| _CCCL_API constexpr _Tp or_else(_Tp __alternative) const noexcept { | |
| if (__error_ == ::CUDA_SUCCESS) { | |
| return __value_; | |
| } | |
| else | |
| { | |
| return __alternative; | |
| } | |
| } | |
| }; |
This would be handy if there is a fallible API with a sensible default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree. I don't think the operator bool does make things more clean. Examples from the code:
- Not ideally named variables
const auto __needs_load = _CCCL_TRY_DRIVER_API(__version_at_least(12, 4));
if (!__needs_load)
{
return __needs_load.__error_;
}- What's actually being checked here?
const auto __mem_type = ::cuda::__driver::__pointerGetAttribute<::CU_POINTER_ATTRIBUTE_MEMORY_TYPE>(__p1);
return __mem_type || __mem_type.__value_ == ::CU_MEMORYTYPE_HOST;I expect us to use the macros in 99% of the time. I would suggest just using the result.__error_ and result.__value_ members directly for other cases.
| : stream_ref(::cuda::__invalid_stream()) | ||
| { | ||
| [[maybe_unused]] __ensure_current_context __ctx_setter(__dev); | ||
| __stream = ::cuda::__driver::__streamCreateWithPriority(cudaStreamNonBlocking, __priority); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor concern. This changes the original behavior of the driver call wrt. exceptions. A developer could still think that direct driver calls throw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. On the other a developer using cuda::__driver::__meow(...) call in a noexcept function thinking it just does what cuMeow(...) would do is even worse in my opinion
42369e5 to
c3bffd4
Compare
This comment has been minimized.
This comment has been minimized.
c3bffd4 to
1fd6dfc
Compare
This comment has been minimized.
This comment has been minimized.
6e30500 to
68b855e
Compare
This comment has been minimized.
This comment has been minimized.
🥳 CI Workflow Results🟩 Finished in 2h 10m: Pass: 100%/120 | Total: 1d 15h | Max: 1h 25m | Hits: 91%/231199See results here. |
Currently, we have 2 types of CUDA Driver calls:
__meow(...)and__meowNoThrow(...)I don't like the design, it forces us to repeat the implementations and forces us to have in/out parameters.
This PR refactors the driver APIs and keeps only 1 definition for each API. Each function is
noexceptand returnsstd::expected-like type that returns the error + return values.In case that throwing behaviour is demanded,
_CCCL_TRY_DRIVER_APImacro is provided, which checks the return value of the CUDA call and returns the output parameters. Alternatively,_CCCL_ASSERT_DRIVER_APIcan be used to assert that a call succeeded.