Add ROCm/HIP compatibility to CuMesh, enabling all 3 extensions by ZJLi2013 · Pull Request #30 · JeffreyXiang/CuMesh

ZJLi2013 · 2026-04-08T05:37:12Z

Add ROCm/HIP compatibility to CuMesh, enabling all 3 extensions (cumesh._C, cumesh._cubvh, cumesh._cumesh_xatlas) to compile and run on AMD GPUs.

Changes (4 files)

src/atlas.cu: Replace cuda::std::plus() with portable cub::Sum() — the CCCL cuda::std header is not available on HIP
src/clean_up.cu: Use rocprim::tuple for int3_decomposer on HIP via #ifdef — cuda::std::tuple and thrust::tuple are both unavailable/broken on ROCm 6.4
src/dtypes.cuh: Add __host__ qualifier to all Vec3f and QEM methods — hipCUB's DeviceSegmentedReduce template instantiation requires host-callable constructors
setup.py: Guard cubvh-specific nvcc flags (--extended-lambda, -U__CUDA_NO_HALF_*) behind IS_HIP check — these flags are NVIDIA-specific and cause errors with hipcc
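The setup.py change can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the helper name `cubvh_nvcc_flags`, the base flags, and the two `-U__CUDA_NO_HALF_...` macros (standing in for the `-U__CUDA_NO_HALF_*` wildcard above) are assumptions. In a real build the `is_hip` value would likely come from something like `torch.utils.cpp_extension.IS_HIP_EXTENSION`.

```python
# Hypothetical helper showing how NVIDIA-only flags can be gated behind a HIP check.
# The function name and the base flags are assumptions, not taken from the PR.
def cubvh_nvcc_flags(is_hip: bool) -> list[str]:
    """Return extra device-compiler flags for the cubvh extension."""
    flags = ["-O3", "-std=c++17"]  # assumed common flags accepted by both nvcc and hipcc
    if not is_hip:
        # These are only understood by nvcc; hipcc reports errors on them.
        flags += [
            "--extended-lambda",
            "-U__CUDA_NO_HALF_OPERATORS__",    # assumed expansion of -U__CUDA_NO_HALF_*
            "-U__CUDA_NO_HALF_CONVERSIONS__",  # assumed expansion of -U__CUDA_NO_HALF_*
        ]
    return flags


if __name__ == "__main__":
    print("CUDA:", cubvh_nvcc_flags(is_hip=False))
    print("HIP: ", cubvh_nvcc_flags(is_hip=True))
```

Keeping the guard in one small function means the CUDA flag list stays unchanged, which matches the stated goal of making no functional changes to CUDA code paths.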

What works

All CUB device algorithms (RadixSort, Scan, Select, Reduce, ReduceByKey, SegmentedReduce) are automatically converted to hipCUB by PyTorch's hipify
All CUDA runtime APIs (cudaMalloc, cudaMemcpy, etc.) are automatically converted to HIP equivalents
cubvh submodule (half precision + Eigen) compiles without additional changes
xatlas (CPU-only) is unaffected

What is NOT covered

No functional changes to CUDA code paths — all changes are additive #ifdef guards or portable replacements
No new dependencies introduced

Usage on AMD GPUs

git clone --recursive https://github.com/JeffreyXiang/CuMesh.git
cd CuMesh
GPU_ARCHS=gfx942 pip install . --no-build-isolation

Test Environment

GPU: AMD Instinct MI300X (gfx942)
ROCm: 6.4.3
PyTorch: 2.6.0
Docker: rocm/pytorch:rocm6.4.3_ubuntu24.04_py3.12_pytorch_release_2.6.0

Test	Status
Compile all 3 extensions (_C, _cubvh, _cumesh_xatlas)	✅
Import all modules	✅
Mesh init + read (vertex/face roundtrip)	✅
Compute face normals	✅
Compute vertex normals	✅
Mesh simplification (4→2 faces)	✅
Remove duplicate faces	✅
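The "Import all modules" row could be reproduced with a small smoke test along these lines. This is a sketch, not the script used for the table above: the module names come from the PR description, while the helper `import_status` is hypothetical.

```python
import importlib

# Extension module names as listed in the PR description.
EXTENSIONS = ("cumesh._C", "cumesh._cubvh", "cumesh._cumesh_xatlas")


def import_status(names):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Covers a missing package as well as an extension that fails to load.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in import_status(EXTENSIONS).items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

Running this inside the Docker image listed under Test Environment, after the `pip install` step, should report each extension separately, which makes it easier to see which of the three failed to build.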

Copilot

Pull request overview

This PR adds ROCm/HIP compatibility to CuMesh so that all three compiled extensions can build and run on AMD GPUs, primarily by replacing CUDA-only types/APIs with portable alternatives and gating NVIDIA-specific build flags.

Changes:

Use cub::Sum() in atlas.cu to avoid relying on CCCL cuda::std headers on HIP.
Introduce a HIP-specific int3_decomposer implementation using rocprim::tuple for radix sort key decomposition.
Make Vec3f/QEM methods host-callable to satisfy hipCUB template instantiation requirements; gate NVIDIA-only nvcc flags for cubvh in setup.py.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`src/dtypes.cuh`	Adds `__host__` qualifiers to vector/QEM methods to support hipCUB instantiation constraints.
`src/clean_up.cu`	Adds HIP-specific tuple-based decomposer for `int3` key decomposition during radix sort.
`src/atlas.cu`	Uses `cub::Sum()` unconditionally for reduce-by-key operator portability.
`setup.py`	Skips NVIDIA-only `nvcc` flags when building under HIP/ROCm.

+#if defined(__HIP_PLATFORM_AMD__)
+#include <rocprim/types/tuple.hpp>
+struct int3_decomposer
+{
+    __host__ __device__ ::rocprim::tuple<int&, int&, int&> operator()(int3& key) const
+    {
+        return ::rocprim::tuple<int&, int&, int&>{key.x, key.y, key.z};


+__host__ __device__ __forceinline__ Vec3f Vec3f::normalized() const {
    float inv_norm = rsqrtf(x * x + y * y + z * z);
    return Vec3f(x * inv_norm, y * inv_norm, z * inv_norm);
 }


-__device__ __forceinline__ void Vec3f::normalize() {
+__host__ __device__ __forceinline__ void Vec3f::normalize() {
    float inv_norm = rsqrtf(x * x + y * y + z * z);
    x *= inv_norm;


ROCm/HIP support for AMD GPUs: fix cuda::std, dtypes __host__, nvcc f…

81b0b06

…lags Made-with: Cursor

JeffreyXiang requested a review from Copilot May 9, 2026 02:24

Copilot started reviewing on behalf of JeffreyXiang May 9, 2026 02:24

Copilot AI reviewed May 9, 2026

ZJLi2013 wants to merge 1 commit into JeffreyXiang:main from PhysicalAI-AIM:rocm
