Refactor(GPU): switch to per-stream synchronization in CUDA and ROCm backends #699

Closed
victorcamaraa wants to merge 4 commits into JuliaParallel:master from victorcamaraa:fix-per-stream-sync

Conversation

@victorcamaraa

This PR is the first step toward implementing multi-stream capabilities in Dagger.jl; it implements per-stream synchronization in both the CUDA and ROCm backends.

Changes

  • ext/CUDAExt.jl: CUDA.synchronize() → CUDA.synchronize(stream())
    in _sync_with_context and gpu_synchronize(proc::CuArrayDeviceProc)
  • ext/ROCExt.jl: same change for the ROCm backend
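
To make the scope of the change concrete, here is a minimal sketch (not the PR's actual diff) contrasting a wait on one explicit stream with a device-wide wait in CUDA.jl. The helper names `sync_current_stream` and `sync_whole_device` are hypothetical and exist only for illustration; the sketch assumes a working CUDA.jl installation and skips the GPU part when no device is usable.

```julia
using CUDA

# Hypothetical helpers for illustration only. The functions touched by the PR
# are `_sync_with_context` and `gpu_synchronize(proc::CuArrayDeviceProc)`.

# Wait only for work queued on the calling task's current stream.
sync_current_stream() = CUDA.synchronize(CUDA.stream())

# Wait for all outstanding work on the current device, across every stream.
sync_whole_device() = CUDA.device_synchronize()

if CUDA.functional()
    a = CUDA.rand(Float32, 1024)
    b = a .+ 1f0            # the broadcast kernel is queued on the task-local stream
    sync_current_stream()   # returns once that stream has drained
    @assert Array(b) ≈ Array(a) .+ 1f0
end
```

Passing the stream explicitly makes the scope of the wait visible at the call site, which should matter once work is spread across several streams. The ROCm side would presumably follow the same pattern with AMDGPU.jl's stream API, but that is not shown here.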

Validated locally on a dual-GPU machine (NVIDIA GTX 1060 6GB + AMD
RX 6650 XT) running Ubuntu 24.04.

@victorcamaraa victorcamaraa reopened this Apr 16, 2026
@victorcamaraa victorcamaraa deleted the fix-per-stream-sync branch April 17, 2026 18:19