Refactor driver APIs #6632

davebayer · 2025-11-14T16:04:44Z

Currently, we have 2 types of CUDA Driver calls:

__meow(...) and
__meowNoThrow(...)

I don't like the design, it forces us to repeat the implementations and forces us to have in/out parameters.

This PR refactors the driver APIs and keeps only 1 definition for each API. Each function is noexcept and returns std::expected-like type that returns the error + return values.

In case that throwing behaviour is demanded, _CCCL_TRY_DRIVER_API macro is provided, which checks the return value of the CUDA call and returns the output parameters. Alternatively, _CCCL_ASSERT_DRIVER_API can be used to assert that a call succeeded.

davebayer · 2025-11-14T16:26:45Z

libcudacxx/include/cuda/__driver/driver_api.h

 {
  static auto __driver_fn = _CCCLRT_GET_DRIVER_FUNCTION(cuDeviceGetCount);
-  int __result;
-  ::cuda::__driver::__call_driver_fn(__driver_fn, "Failed to get device count", &__result);


With this implementation, we are loosing the API name and error messages. I am wondering whether we could just add a __api_ and __error_msg_ members to the __call_result structure and use it in __check_and_extract_result function, it should be really easy for the compiler to optimize it out

fbusato · 2025-11-14T17:05:41Z

ok, now I'm confused. In another PR you suggested me to avoid std::span and use raw arrays to keep the implementation minimal. In this PR, we are introducing std::expected which is way more complex.

leofang · 2025-11-14T17:21:50Z

Non-blocking comments:

I discussed with @pciolkosz a while back about this, my opinion is that the entire driver_api.h should just be auto-generated. We can already do this same thing for cuda-bindings (here), so this is trivial, and we have a plan to expand it further
it has occurred to me that _CCCLRT_GET_DRIVER_FUNCTION does not cache the retrieved function pointers. There will be a small performance penalty. (It's the reason that we cached them in cuda-bindings.)

davebayer · 2025-11-14T17:43:10Z

ok, now I'm confused. In another PR you suggested me to avoid std::span and use raw arrays to keep the implementation minimal. In this PR, we are introducing std::expected which is way more complex.

... returns std::expected-like type ...

I discussed with @pciolkosz a while back about this, my opinion is that the entire driver_api.h should just be auto-generated.

I don't think the file to auto generate the header would contain less code, because we still want to adapt the API to our needs.

it has occurred to me that _CCCLRT_GET_DRIVER_FUNCTION does not cache the retrieved function pointers

We store the function addresses in static members inside the functions, _CCCLRT_GET_DRIVER_FUNCTION is called only once per program launch.

leofang · 2025-11-14T18:07:09Z

I don't think the file to auto generate the header would contain less code, because we still want to adapt the API to our needs.

It is about avoiding hand-writing driver_api.h in a future-proof, safer fashion. The call sites still needs to be hand-written.

Hand-writing bindings are error prone and not scalable as we expand the project scope and coverage.

We store the function addresses in static members inside the functions, _CCCLRT_GET_DRIVER_FUNCTION is called only once per program launch.

Function-local static variables are what I missed, thanks!

miscco · 2025-11-17T08:30:56Z

cudax/include/cuda/experimental/__launch/configuration.cuh

+  const auto __needs_load = ::cuda::__driver::__version_at_least(12, 4);
+  if (__needs_load.__error_ != ::CUDA_SUCCESS)
+  {
+    return static_cast<::cudaError_t>(__needs_load.__error_);
+  }


Why are these not _CCCL_TRY_DRIVER_API?

That's a question for @pciolkosz, all of the functions applying the launch options to a given kernel launch are marked as noexcept

This is a leftover consequence of an early design, where I had only the top level interface throwing and internal function always noexcept in case we wanted to implement a exception-free alternative top level interface. But we decided its not a goal anymore, so we could just throw here

libcudacxx/test/libcudacxx/cuda/ccclrt/device/device_smoke.c2h.cu

libcudacxx/include/cuda/__driver/driver_api.h

miscco · 2025-11-17T08:35:25Z

libcudacxx/include/cuda/__driver/driver_api.h

+
+template <class _Tp>
+[[nodiscard]] _CCCL_HOST_API constexpr _Tp __check_and_extract_result(
+  __call_result<_Tp> __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current())


Question should this take by

Suggested change

__call_result<_Tp> __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current())

__call_result<_Tp>&& __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current())

and forward_like below?

All of the types returned by the driver APIs are trivial, so I don't think it makes any difference

miscco · 2025-11-17T08:37:16Z

libcudacxx/include/cuda/__driver/driver_api.h

+{
+  ::CUresult __error_;
+  _Tp __value_;
+};


I believe this is missing at least one feature

Suggested change

};

_CCCL_API constexpr operator bool() const noexcept {

return __error_ == ::CUDA_SUCCESS;

}

};

We might also want to think about at least one monadic interface:

Suggested change

};

_CCCL_API constexpr _Tp or_else(_Tp __alternative) const noexcept {

if (__error_ == ::CUDA_SUCCESS) {

return __value_;

}

else

{

return __alternative;

}

}

};

This would be handy if there is a fallible API with a sensible default value

I don't agree. I don't think the operator bool does make things more clean. Examples from the code:

Not ideally named variables

const auto __needs_load = _CCCL_TRY_DRIVER_API(__version_at_least(12, 4)); if (!__needs_load) { return __needs_load.__error_; }

What's actually being checked here?

const auto __mem_type = ::cuda::__driver::__pointerGetAttribute<::CU_POINTER_ATTRIBUTE_MEMORY_TYPE>(__p1); return __mem_type || __mem_type.__value_ == ::CU_MEMORYTYPE_HOST;

I expect us to use the macros in 99% of the time. I would suggest just using the result.__error_ and result.__value_ members directly for other cases.

libcudacxx/include/cuda/__driver/driver_api.h

fbusato · 2025-11-18T00:27:10Z

libcudacxx/include/cuda/__stream/stream.h

      : stream_ref(::cuda::__invalid_stream())
  {
    [[maybe_unused]] __ensure_current_context __ctx_setter(__dev);
-    __stream = ::cuda::__driver::__streamCreateWithPriority(cudaStreamNonBlocking, __priority);


minor concern. This changes the original behavior of the driver call wrt. exceptions. A developer could still think that direct driver calls throw

That's true. On the other a developer using cuda::__driver::__meow(...) call in a noexcept function thinking it just does what cuMeow(...) would do is even worse in my opinion

github-actions · 2025-11-18T19:40:04Z

🥳 CI Workflow Results

🟩 Finished in 2h 10m: Pass: 100%/120 | Total: 1d 15h | Max: 1h 25m | Hits: 91%/231199

See results here.

davebayer requested review from a team as code owners November 14, 2025 16:04

github-project-automation bot added this to CCCL Nov 14, 2025

davebayer requested a review from ericniebler November 14, 2025 16:04

github-project-automation bot moved this to Todo in CCCL Nov 14, 2025

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Nov 14, 2025

davebayer force-pushed the refactor_driver_api branch 2 times, most recently from 5285c51 to f19d5e6 Compare November 14, 2025 16:18

davebayer commented Nov 14, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

miscco requested changes Nov 17, 2025

View reviewed changes

github-project-automation bot moved this from In Review to In Progress in CCCL Nov 17, 2025

fbusato reviewed Nov 18, 2025

View reviewed changes

davebayer force-pushed the refactor_driver_api branch from 42369e5 to c3bffd4 Compare November 18, 2025 11:45

This comment has been minimized.

Sign in to view

davebayer force-pushed the refactor_driver_api branch from c3bffd4 to 1fd6dfc Compare November 18, 2025 14:29

This comment has been minimized.

Sign in to view

Refactor driver APIs

68b855e

davebayer force-pushed the refactor_driver_api branch from 6e30500 to 68b855e Compare November 18, 2025 17:25

This comment has been minimized.

Sign in to view

	__call_result<_Tp> __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current())
	__call_result<_Tp>&& __result, ::cuda::std::source_location __loc = ::cuda::std::source_location::current())

-};
+  _CCCL_API constexpr operator bool() const noexcept {
+    return __error_ == ::CUDA_SUCCESS;
+  }
+};

-};
+  _CCCL_API constexpr _Tp or_else(_Tp __alternative) const noexcept {
+    if (__error_ == ::CUDA_SUCCESS) {
+      return __value_;
+    }
+    else
+    {
+      return __alternative;
+    }
+  }
+};

Refactor driver APIs #6632

Are you sure you want to change the base?

Refactor driver APIs #6632

Conversation

davebayer commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fbusato commented Nov 14, 2025

Uh oh!

leofang commented Nov 14, 2025

Uh oh!

davebayer commented Nov 14, 2025

Uh oh!

leofang commented Nov 14, 2025

Uh oh!

This comment has been minimized.

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Nov 18, 2025

🥳 CI Workflow Results

🟩 Finished in 2h 10m: Pass: 100%/120 | Total: 1d 15h | Max: 1h 25m | Hits: 91%/231199

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

davebayer commented Nov 14, 2025 •

edited

Loading