Add chipStar (SPIR-V) support for HIP backends #1942

Draft
pvelesko wants to merge 6 commits into CEED:main from CHIP-SPV:chipStar

pvelesko commented Mar 17, 2026

Purpose:

Add support for chipStar, enabling libCEED HIP backends (/gpu/hip/ref, /gpu/hip/shared, /gpu/hip/gen) to run on any OpenCL or Level Zero GPU (Intel, etc.) via SPIR-V.

Changes

  • Makefile: Detect chipStar HIP platform (__HIP_PLATFORM_SPIRV__), handle dynamic lib name (libCHIP.so vs libamdhip64.so), filter clang-only flags from HIPCONFIG_CPPFLAGS when using gcc as CC, use SYCLCXX as linker when SYCL backend is also enabled
  • HIP kernel syncthreads fix (#ifdef __HIP_PLATFORM_SPIRV__): chipStar SPIR-V target requires all threads in a workgroup to reach __syncthreads() barriers. Kernel element loops are refactored to a single-assignment + guard pattern on chipStar, while preserving the original for-loop structure on AMD ROCm/NVIDIA
    • hip-ref-basis-nontensor.h: 5 kernel functions
    • hip-shared-basis-tensor.h: 13 kernel functions
    • ceed-hip-gen-operator-build.cpp: code-generator emits conditional kernel source
  • Test infrastructure: Handle chipStar runtime stderr warnings in JUnit test runner; use errors='replace' for binary data in subprocess output

Benchmarks: HIP (chipStar) vs SYCL (oneAPI) on Intel Arc A770

chipStar main + hiprtc output caching. oneAPI 2024.2.2 + Level Zero. 5M DOFs, polynomial degree p=4, warm JIT cache.

Testing was done with the SYCL caching PR #1943 applied; without it, chipStar outperforms SYCL by a much larger margin due to JIT costs.

ex1-volume (mass operator)

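| Backend | Dim | HIP (s) | SYCL (s) | SYCL/HIP |
|---------|-----|---------|----------|----------|
| ref     | 2D  | 0.77    | 1.17     | 1.52x    |
| shared  | 2D  | 0.76    | 0.99     | 1.30x    |
| gen     | 2D  | 0.61    | 0.80     | 1.31x    |
| ref     | 3D  | 0.97    | 1.30     | 1.34x    |
| shared  | 3D  | 0.98    | 1.24     | 1.27x    |
| gen     | 3D  | 0.81    | 1.06     | 1.31x    |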

ex2-surface (surface integral)

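| Backend | Dim | HIP (s) | SYCL (s) | SYCL/HIP |
|---------|-----|---------|----------|----------|
| ref     | 2D  | 0.88    | 1.22     | 1.39x    |
| shared  | 2D  | 0.86    | 1.12     | 1.30x    |
| gen     | 2D  | 0.70    | 0.91     | 1.30x    |
| ref     | 3D  | 1.20    | 1.38     | 1.15x    |
| shared  | 3D  | 1.19    | 1.49     | 1.25x    |
| gen     | 3D  | 1.02    | 1.29     | 1.26x    |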

In these runs, native SYCL takes 15-52% longer than chipStar HIP (SYCL/HIP ratios of 1.15x to 1.52x) across all backends.

LLM/GenAI Disclosure:

Claude Code was used to rebase on latest main (bulk of this work was done in 2024/2025), write the #ifdef __HIP_PLATFORM_SPIRV__ preprocessor guards across HIP kernel files and the code-generator, and to draft this PR description.

By submitting this PR, the author certifies to its contents as described by the Developer's Certificate of Origin.
Please follow the Contributing Guidelines for all PRs.

Detect HIP platform at build time via hipconfig output:
- __HIP_PLATFORM_SPIRV__ → HIP_LIB_NAME=CHIP (chipStar)
- __HIP_PLATFORM_HCC__/__HIP_PLATFORM_AMD__ → HIP_LIB_NAME=amdhip64

Move ROCM_DIR and HIP_ARCH defaults to the top of the file where other
tool-path defaults live. Use HIP_LIB_NAME in the library detection glob
and in PKG_LIBS. Remove the subst=,, stripping from HIPCONFIG_CPPFLAGS
so flags are passed through unmodified.
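
The detection rule in this commit can be sketched as follows. The actual change lives in the Makefile; this Python version only illustrates the mapping, and the function name `hip_lib_name` and the sample `hipconfig` output strings are hypothetical.

```python
from typing import Optional


def hip_lib_name(hipconfig_output: str) -> Optional[str]:
    """Map hipconfig's preprocessor defines to the HIP runtime library name.

    Mirrors the rule stated in the commit message; returns None when no
    known platform macro is present (that case is not described in the PR).
    """
    if "__HIP_PLATFORM_SPIRV__" in hipconfig_output:
        return "CHIP"  # chipStar: libCHIP.so
    if ("__HIP_PLATFORM_HCC__" in hipconfig_output
            or "__HIP_PLATFORM_AMD__" in hipconfig_output):
        return "amdhip64"  # ROCm: libamdhip64.so
    return None


# Hypothetical hipconfig outputs, for illustration only.
chipstar_name = hip_lib_name("-D__HIP_PLATFORM_SPIRV__= -I/opt/chipstar/include")
rocm_name = hip_lib_name("-D__HIP_PLATFORM_AMD__= -I/opt/rocm/include")
```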

When SYCL backends are built, libceed.so must be linked with icpx
(SYCLCXX) rather than g++, and -fsycl must appear in CEED_LDFLAGS
(before object files) so icpx can merge the SYCL fat binary device
sections. Without this, libceed.so lacks NEEDED: libsycl.so.7 and
SYCL kernels fail to load at runtime.

Replace the outer element for-loop with a single element assignment and
guard all memory accesses with if (elem < num_elem). Shared memory
operations (Interp, Grad, etc.) must execute unconditionally across all
threads so __syncthreads() is reached uniformly; only the
load/store steps are guarded. Also guard qfunction calls with
thread-id bounds checks and comment out the pragma unroll that
triggered miscompilation on chipStar's LLVM.

Decoding subprocess output with errors='replace' prevents
UnicodeDecodeError when GPU runtimes emit non-UTF-8 bytes
(e.g. chipStar CHIP warnings contain raw binary data in some paths).
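
A minimal sketch of the decoding behavior described above, assuming the test runner captures output through `subprocess.run`. The helper name `run_and_capture` and the child command are hypothetical stand-ins, not the runner's actual code.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (stdout, stderr) as text.

    errors='replace' substitutes U+FFFD for undecodable bytes, so a stray
    non-UTF-8 byte from a GPU runtime cannot raise UnicodeDecodeError.
    """
    proc = subprocess.run(cmd, capture_output=True,
                          encoding="utf-8", errors="replace")
    return proc.stdout, proc.stderr


# Hypothetical child process that writes an invalid UTF-8 byte (0xff) to stderr.
child = "import sys; sys.stderr.buffer.write(b'CHIP warning: \\xff\\n')"
out, err = run_and_capture([sys.executable, "-c", child])
```

With strict decoding the same bytes would raise `UnicodeDecodeError`; here the bad byte simply shows up as the replacement character in `err`.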

chipStar prints 'CHIP info/warning/debug ...' lines to stderr on every
run. These are not test failures but caused JUnit to mark all HIP tests
as failed. Filter them out before checking whether stderr is non-empty.
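
The filtering step might look like the sketch below. The regular expression assumes the log lines start exactly with `CHIP info`, `CHIP warning`, or `CHIP debug`, as quoted in the commit message; the function name `strip_chip_log_lines` is hypothetical.

```python
import re

# Assumed prefix format, taken from the commit message above.
CHIP_LOG_LINE = re.compile(r"^CHIP (info|warning|debug)\b")


def strip_chip_log_lines(stderr_text: str) -> str:
    """Drop chipStar runtime log lines, keeping any other stderr content."""
    kept = [line for line in stderr_text.splitlines()
            if not CHIP_LOG_LINE.match(line)]
    return "\n".join(kept).strip()


# Only runtime chatter: nothing is left, so the test is not marked as failed.
only_chip = strip_chip_log_lines("CHIP warning: foo\nCHIP info: bar\n")

# A real error survives the filter and is still reported.
mixed = strip_chip_log_lines("CHIP debug: init\nSegmentation fault\n")
```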

jeremylt commented Mar 17, 2026

Note: Please use the libCEED PR template, including the LLM usage declaration. Thanks (requirement in CONTRIBUTING.md)

…r gcc

chipStar's hipconfig -C outputs --offload=spirv64, -nohipwrapperinc, --hip-path=,
and --target= which are clang-only flags.  When CC=gcc is used for .c files (or
CXX != HIPCC for .cpp files), these flags cause build failures.

Add HIPCONFIG_CPPFLAGS_C that filters the clang-only flags and adds an explicit
-I$(ROCM_DIR)/include (since -nohipwrapperinc was suppressing the wrapper that
would have pulled in hip_runtime.h).
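
The filtering logic can be illustrated as follows. The real change is implemented in the Makefile; this Python sketch only shows the intended transformation. The function name `cppflags_for_gcc` and the sample flag values are hypothetical; the four filtered flag prefixes are the ones listed in the commit message.

```python
# Flags that chipStar's `hipconfig -C` emits and that only clang understands.
CLANG_ONLY_PREFIXES = ("--offload=", "-nohipwrapperinc", "--hip-path=", "--target=")


def cppflags_for_gcc(hipconfig_cppflags: str, rocm_dir: str) -> str:
    """Drop clang-only flags and make sure the HIP include directory is present."""
    kept = [flag for flag in hipconfig_cppflags.split()
            if not flag.startswith(CLANG_ONLY_PREFIXES)]
    include = f"-I{rocm_dir}/include"
    if include not in kept:
        kept.append(include)  # explicit include path for hip_runtime.h
    return " ".join(kept)


# Hypothetical hipconfig output, for illustration only.
sample = ("-D__HIP_PLATFORM_SPIRV__= --offload=spirv64 -nohipwrapperinc "
          "--hip-path=/opt/chipstar --target=x86_64-linux-gnu")
filtered = cppflags_for_gcc(sample, "/opt/chipstar")
```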