Fix nvptx Kernel return type lowering by BI71317 · Pull Request #771 · exaloop/codon

BI71317 · 2026-03-09T01:01:49Z

Fixes #770

Changes

detect NVPTX kernel entry functions during lowering
emit an explicit void return type for final kernel entry functions
preserve existing higher-level semantics before final kernel lowering

Validation

verified that NVPTX kernel entry functions are now emitted with explicit void return types
verified that the resulting kernels still compile and execute successfully

Observed IR

define dso_local void @hello_0_0_std_internal_types_array_List_0_int__std_internal_types_array_List_0_int__std_internal_types_array_List_0_int__(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr nocapture readonly %2) local_unnamed_addr #1 {
...
  ret void
}
...

cla-bot · 2026-03-09T01:01:52Z

Thank you for your pull request. We require contributors to agree to our Contributor License Agreement (https://exaloop.io/legal/cla), and we don't have @BI71317 on file. In order for us to review and merge your code, please email info@exaloop.io to get yourself added.

BI71317 · 2026-03-09T02:11:08Z

I’ve already signed the CLA and just sent a quick note to info@exaloop.io with my GitHub username and PR link 🙂

arshajii · 2026-03-09T17:27:59Z

@cla-bot recheck

arshajii · 2026-03-09T17:33:51Z

Thanks for the PR! I think the better way to do this would be at the LLVM level during NVPTX codegen (see gpu.cpp). Specifically, we can have an LLVM transformation that converts kernel return types to void and updates ret instructions appropriately.

Currently the Codon IR void type is basically unused and might be phased out in the near future, so best to avoid it if possible. Happy to suggest specific changes / help out with this change as needed.

BI71317 · 2026-03-10T02:54:48Z

Got it, thanks for the guidance. I’ve reworked the change: instead of converting kernel return types to void at the high-level Codon IR stage, the LLVM IR is now rewritten in gpu.cpp during applyGPUTransformations, before it is passed to NVPTX codegen.
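To make the intended rewrite concrete, here is a toy, text-level sketch in Python. It is not the code in gpu.cpp, which works on llvm::Function objects rather than strings. It assumes that a kernel's implicit return is lowered to the empty literal struct `{}` before the transformation (inferred from the isEmptyStructType helper in the review, not confirmed here). The function name mirrors normalizeKernelReturnToVoid for readability only.

```python
import re

# Matches a `ret` of an empty literal struct, e.g. `ret {} zeroinitializer`.
_RET_EMPTY = re.compile(r"^(\s*)ret\s+\{\s*\}\s+\S+\s*$")
# Matches a `{}` return type in the `define` line, just before the function name.
_DEFINE_EMPTY = re.compile(r"^(define\b[^@]*?)\{\s*\}(\s+@)")


def normalize_kernel_return_to_void(ir_text: str) -> str:
    """Rewrite one textual function definition so that it returns void.

    Toy illustration only (assumption: the kernel returns `{}`).
    Definitions with any other return type are returned unchanged.
    """
    lines = ir_text.splitlines()
    if not lines or not _DEFINE_EMPTY.match(lines[0]):
        return ir_text
    # Swap the return type in the signature, then patch every matching `ret`.
    out = [_DEFINE_EMPTY.sub(r"\1void\2", lines[0], count=1)]
    for line in lines[1:]:
        out.append(_RET_EMPTY.sub(r"\1ret void", line))
    return "\n".join(out)


before = "define dso_local {} @k(ptr %0) {\nentry:\n  ret {} zeroinitializer\n}"
print(normalize_kernel_return_to_void(before))
```

The two steps shown (change the signature's return type, then replace each `ret` of the old type with `ret void`) are the same two steps described in the review suggestion. A real pass also has to update any users of the old function.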

Minimal Reproducer

import numpy as np
import gpu
a = np.arange(16)
b = np.arange(16) * 2
c = np.empty(16, dtype=int)

@gpu.kernel
def vadd(a, b, c, n):
    i = gpu.thread.x
    # i = ocl.thread.x
    if i < n:
        c[i] = a[i] + b[i]

vadd(a, b, c, 16, grid=1, block=16)
print(a)
print(b)
print(c)
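As a sanity check on what the reproducer should print for c, the launch can be emulated on the CPU in plain Python. This is only a reference sketch: `vadd_reference` is a hypothetical helper, the loop index stands in for gpu.thread.x, and plain lists replace the NumPy arrays so the snippet has no dependencies.

```python
def vadd_reference(a, b, c, n, block):
    """Emulate one block of the vadd kernel on the CPU, one thread index at a time."""
    for i in range(block):  # stands in for gpu.thread.x
        if i < n:           # same bounds guard as the kernel
            c[i] = a[i] + b[i]
    return c


a = list(range(16))             # mirrors np.arange(16)
b = [2 * x for x in range(16)]  # mirrors np.arange(16) * 2
c = [0] * 16                    # np.empty is uninitialized; zeros used here
vadd_reference(a, b, c, 16, block=16)
print(c)
```

Since b[i] is 2 * a[i], every element of c should equal 3 * i after a successful kernel run.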

Observed IR

; ModuleID = 'codon'
source_filename = "/home/swchoi/src/test_code/codon_gpu_programming/vadd_np.codon"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #0

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(readwrite, inaccessiblemem: write)
define dso_local void @"vadd.0:0[std.numpy.ndarray.ndarray.0[int,1],std.numpy.ndarray.ndarray.0[int,1],std.numpy.ndarray.ndarray.0[int,1],int]"({ { i64 }, { i64 }, ptr } %0, { { i64 }, { i64 }, ptr } %1, { { i64 }, { i64 }, ptr } %2, i64 %3) local_unnamed_addr #1 {
entry:
  %res.i.i = tail call range(i32 0, 1024) i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %4 = zext nneg i32 %res.i.i to i64
  %tmp.i32 = icmp sgt i64 %3, %4
  br i1 %tmp.i32, label %if.true, label %if.exit

if.true:                                          ; preds = %entry
  %.fca.2.extract.i.i.i = extractvalue { { i64 }, { i64 }, ptr } %0, 2
  %5 = extractvalue { { i64 }, { i64 }, ptr } %0, 0
  %6 = extractvalue { { i64 }, { i64 }, ptr } %0, 1
  %.fca.0.extract.i.i.i.i = extractvalue { i64 } %5, 0
  %tmp.i18.not.i = icmp sgt i64 %.fca.0.extract.i.i.i.i, %4
  tail call void @llvm.assume(i1 %tmp.i18.not.i)
  %.fca.0.extract.i99.i.i.i = extractvalue { i64 } %6, 0
  %tmp.i14.i.i.i.i = mul i64 %.fca.0.extract.i99.i.i.i, %4
  %7 = getelementptr i8, ptr %.fca.2.extract.i.i.i, i64 %tmp.i14.i.i.i.i
  %8 = load i64, ptr %7, align 4
  %.fca.2.extract.i.i.i33 = extractvalue { { i64 }, { i64 }, ptr } %1, 2
  %9 = extractvalue { { i64 }, { i64 }, ptr } %1, 0
  %10 = extractvalue { { i64 }, { i64 }, ptr } %1, 1
  %.fca.0.extract.i.i.i.i34 = extractvalue { i64 } %9, 0
  %tmp.i18.not.i1 = icmp sgt i64 %.fca.0.extract.i.i.i.i34, %4
  tail call void @llvm.assume(i1 %tmp.i18.not.i1)
  %.fca.0.extract.i99.i.i.i35 = extractvalue { i64 } %10, 0
  %tmp.i14.i.i.i.i36 = mul i64 %.fca.0.extract.i99.i.i.i35, %4
  %11 = getelementptr i8, ptr %.fca.2.extract.i.i.i33, i64 %tmp.i14.i.i.i.i36
  %12 = load i64, ptr %11, align 4
  %tmp.i = add i64 %12, %8
  %.fca.2.extract109.i.i.i = extractvalue { { i64 }, { i64 }, ptr } %2, 2
  %13 = extractvalue { { i64 }, { i64 }, ptr } %2, 0
  %14 = extractvalue { { i64 }, { i64 }, ptr } %2, 1
  %.fca.0.extract.i.i.i.i37 = extractvalue { i64 } %13, 0
  %tmp.i18.not.i2 = icmp sgt i64 %.fca.0.extract.i.i.i.i37, %4
  tail call void @llvm.assume(i1 %tmp.i18.not.i2)
  %.fca.0.extract.i141.i.i.i = extractvalue { i64 } %14, 0
  %tmp.i14.i.i.i.i38 = mul i64 %.fca.0.extract.i141.i.i.i, %4
  %15 = getelementptr i8, ptr %.fca.2.extract109.i.i.i, i64 %tmp.i14.i.i.i.i38
  store i64 %tmp.i, ptr %15, align 4
  br label %if.exit

if.exit:                                          ; preds = %if.true, %entry
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write)
declare void @llvm.assume(i1 noundef) #2

attributes #0 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(readwrite, inaccessiblemem: write) "frame-pointer"="none" "kernel" "target-cpu"="meteorlake" "target-features"="+prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-amx-fp8,+xsaves,-avx512fp16,-usermsr,-sm4,-egpr,+sse4.1,-avx512ifma,+xsave,+sse4.2,-tsxldtrk,-sm3,-ptwrite,-widekl,-movrs,+invpcid,+64bit,+xsavec,-avx10.1-512,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,+movbe,-avxvnniint8,-ccmp,-amx-int8,-kl,-avx10.1-256,-sha512,+avxvnni,-rtm,+adx,+avx2,-hreset,+movdiri,+serialize,+vpclmulqdq,-avx512vl,-uintr,-cf,+clflushopt,-raoint,-cmpccxadd,+bmi,-amx-tile,+sse,-avx10.2-256,+gfni,-avxvnniint16,-amx-fp16,-zu,-ndd,+xsaveopt,+rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,-push2pop2,+cx8,-avx512bw,+sse3,-pku,-nf,-amx-tf32,-amx-avx512,+fsgsbase,-clzero,-mwaitx,-lwp,+lzcnt,+sha,+movdir64b,-ppx,-wbnoinvd,-enqcmd,-amx-transpose,-avx10.2-512,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,+bmi2,+fma,+popcnt,-avxifma,+f16c,-avx512bitalg,-rdpru,+clwb,+mmx,+sse2,+rdseed,-avx512vbmi2,-prefetchi,-amx-movrs,+rdpid,-fma4,-avx512vbmi,+shstk,+vaes,+waitpkg,-sgx,+fxsr,-avx512dq,-sse4a" }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }

!llvm.module.flags = !{!0}
!nvvm.annotations = !{!1}
!llvm.ident = !{!2}
!nvvmir.version = !{!3}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{ptr @"vadd.0:0[std.numpy.ndarray.ndarray.0[int,1],std.numpy.ndarray.ndarray.0[int,1],std.numpy.ndarray.ndarray.0[int,1],int]", !"kernel", i32 1}
!2 = !{!"clang version 3.8.0 (tags/RELEASE_380/final)"}
!3 = !{i32 2, i32 0}

BI71317 · 2026-03-17T00:03:18Z

Hi! I left an update on this PR a little while ago, but it may have slipped through the cracks.

I reworked the implementation based on the earlier feedback. Would appreciate another look whenever someone has time. Thanks!

arshajii

Just left a few minor review comments. LGTM overall!

arshajii · 2026-03-23T19:38:13Z

codon/cir/llvm/gpu.cpp

  return std::vector<llvm::GlobalValue *>(keep.begin(), keep.end());
 }

+static bool isEmptyStructType(llvm::Type *ty) {


Don't need static if these are in namespace {}.

arshajii · 2026-03-23T19:38:50Z

codon/cir/llvm/gpu.cpp


+static bool isEmptyStructType(llvm::Type *ty) {
+  auto *st = llvm::dyn_cast<llvm::StructType>(ty);
+  return st && st->getNumElements() == 0;


I believe we should also check !st->hasName().

arshajii · 2026-03-23T19:42:14Z

codon/cir/llvm/gpu.cpp

+}
+
+static llvm::Function *normalizeKernelReturnToVoid(llvm::Function *F) {
+  if (!F || F->isDeclaration())


Can we merge these conditions via || into a single if statement?

arshajii · 2026-03-23T19:43:52Z

codon/cir/llvm/gpu.cpp

+  std::vector<llvm::Function *> kernelCandidates;
  std::vector<llvm::GlobalValue *> kernels;
-
+  


Please format with clang-format --style=file -i codon/cir/llvm/gpu.cpp, which should remove trailing whitespace.

arshajii · 2026-03-23T19:45:08Z

codon/cir/llvm/gpu.cpp

+  return st && st->getNumElements() == 0;
+}
+
+static llvm::Function *normalizeKernelReturnToVoid(llvm::Function *F) {


Similarly, don't need static here.

BI71317 · 2026-03-24T02:38:29Z

Thanks for the review and suggestions. I’ve addressed the requested changes and pushed a new commit.

Fix nvptx Kernel return type lowering

3cb2ac5

BI71317 requested a review from inumanag as a code owner March 9, 2026 01:01

Fix nvptx Kernel Return type Lowering in GPU Codegen

40a4f32

BI71317 requested a review from arshajii as a code owner March 10, 2026 02:46

cla-bot bot added the cla-signed label Mar 10, 2026

arshajii requested changes Mar 23, 2026


apply feedback to kernel return normalization

d96d8f0

