Summary
There is a bug in rustc_codegen_nvvm/src/abi.rs: arrays passed by value to CUDA kernels are forced into PassMode::Direct(ArgAttributes::new()), which discards their underlying alignment metadata.
This presents two distinct issues:
- Lost Alignment Metadata (IllegalAddress traps): If you pass an array of 16-byte aligned elements (like [u128; 4] or [AlignedStruct; 2]), the compiler forces it to PassMode::Direct and generates PTX expecting 8-byte boundaries (e.g., .param .align 8 .b8 param_0[32]). However, the host driver packs the array at a 16-byte boundary. If the kernel subsequently attempts an aligned/vectorized 128-bit load, it can trap with an IllegalAddress error.
- Internal Compiler Error (ICE) for >16-byte alignments: Because arrays bypass the 16-byte PassMode::Cast workaround that is correctly applied to ADTs, passing an array requiring 32-byte or greater alignment (e.g., [AlignedStruct32; 2]) triggers an ICE in the upstream NVPTX ABI handling: internal error: entered unreachable code: Align is given as power of 2 no larger than 16 bytes.
Reproduction Case
Define an aligned struct and pass an array of them to a kernel:
#[derive(Copy, Clone)]
#[repr(C, align(16))]
pub struct AlignedStruct {
    a: u64,
    b: u64,
}

#[cuda_std::kernel]
pub unsafe fn array_kernel2(arr: [AlignedStruct; 2], out: *mut u128) {
    // PTX generates: .param .align 8 .b8 array_kernel2_param_0[32]
    // The missing 16-byte alignment metadata leads to an ABI mismatch
    // between the host and device.
    unsafe {
        *out = arr[0].a as u128 + arr[1].a as u128;
    }
}
If the struct is changed to #[repr(C, align(32))], compiling the kernel causes the compiler to panic with internal error: entered unreachable code: Align is given as power of 2 no larger than 16 bytes.
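As a host-side sanity check of the layouts involved, the sketch below reports the size and alignment that the Rust compiler assigns to the two array types. It is not part of the reproduction itself: AlignedStruct32 is assumed to be the same struct with align(32), and layout_of is a hypothetical helper introduced only for illustration.

```rust
use std::mem::{align_of, size_of};

// Same layout as the struct in the reproduction case (fields made pub
// only to keep this standalone snippet free of dead-code warnings).
#[derive(Copy, Clone)]
#[repr(C, align(16))]
pub struct AlignedStruct {
    pub a: u64,
    pub b: u64,
}

// Assumed definition of the 32-byte-aligned variant that triggers the ICE.
#[derive(Copy, Clone)]
#[repr(C, align(32))]
pub struct AlignedStruct32 {
    pub a: u64,
    pub b: u64,
}

/// Hypothetical helper: returns (size, alignment) of `T` in bytes.
pub fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

Calling layout_of::<[AlignedStruct; 2]>() yields a size of 32 bytes, which matches the param_0[32] in the generated PTX, and an alignment of 16 rather than the 8 that the PTX declares. An array inherits the alignment of its element type, so [AlignedStruct32; 2] reports an alignment of 32, which is above the 16-byte limit named in the ICE message.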
Issue Details
In crates/rustc_codegen_nvvm/src/abi.rs, the function readjust_fn_abi handles arrays blindly without preserving or checking arg.layout.align.abi:
// Current array handling in abi.rs
if arg.layout.ty.is_array() && !matches!(arg.mode, PassMode::Direct { .. }) {
    arg.mode = PassMode::Direct(ArgAttributes::new());
}
Unlike the ADT fallback branch (attrs.pointee_align = Some(arg.layout.align.abi)), the array block throws away all alignment metadata. It also completely misses the PassMode::Cast workaround needed for types with align >= 16.
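The difference between the two branches can be illustrated with a toy model. Everything in this sketch (ArgAttrs, Mode, Arg, array_branch, adt_style_attrs) is a simplified stand-in invented for illustration, not the real rustc or rustc_codegen_nvvm types. It mirrors only the two snippets quoted above and is not a proposed patch.

```rust
/// Stand-in for ArgAttributes; only the field relevant here is modeled.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ArgAttrs {
    pub pointee_align: Option<u64>,
}

/// Stand-in for PassMode with just enough variants for the example.
#[derive(Debug, Clone, PartialEq)]
pub enum Mode {
    Direct(ArgAttrs),
    Indirect(ArgAttrs),
}

/// Stand-in for an argument: `align_abi` plays the role of arg.layout.align.abi.
pub struct Arg {
    pub is_array: bool,
    pub align_abi: u64,
    pub mode: Mode,
}

/// Mirrors the array branch: the mode is replaced with fresh, empty
/// attributes, so any alignment information is dropped.
pub fn array_branch(arg: &mut Arg) {
    if arg.is_array && !matches!(arg.mode, Mode::Direct(_)) {
        arg.mode = Mode::Direct(ArgAttrs::default());
    }
}

/// Mirrors the single line quoted from the ADT fallback: the ABI
/// alignment is recorded on the attributes.
pub fn adt_style_attrs(arg: &Arg) -> ArgAttrs {
    ArgAttrs { pointee_align: Some(arg.align_abi) }
}
```

In this model, an array argument with align_abi of 16 leaves array_branch as Direct with pointee_align set to None, while adt_style_attrs on the same argument keeps Some(16). That lost value is what the issue describes as discarded alignment metadata.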