Add ROCM kernel skill #343

Merged: sayakpaul merged 14 commits into huggingface:main from 01xjw:add-rocm-kernels-skill on Apr 15, 2026
Conversation

@01xjw
Contributor

@01xjw 01xjw commented Mar 13, 2026

Add ROCm Triton kernels skill for MI355X/R9700

  • RMSNorm, RoPE 3D, GEGLU, AdaLN kernel patterns
  • Benchmark scripts (micro + e2e for LTX-Video)
  • HuggingFace Kernels integration example
  • Reference docs: optimization guides, templates, troubleshooting
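
For readers unfamiliar with the first pattern in the list, the following is a minimal pure-Python reference for what an RMSNorm kernel computes on a single row. It is only a sketch for checking numerics, not the Triton code from this PR; the function name `rmsnorm_reference` and the default `eps` value are assumptions made for illustration.

```python
import math


def rmsnorm_reference(x, weight, eps=1e-6):
    """Scale one row by the inverse of its root mean square, then by a per-channel weight.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the hidden dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # eps keeps the division stable when the row is close to zero.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rmsnorm_reference([3.0, 4.0], [1.0, 1.0]))
```

A GPU kernel would typically be compared against a reference like this within a floating-point tolerance rather than for exact equality.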

01xjw added 3 commits March 13, 2026 06:43
- RMSNorm, RoPE 3D, GEGLU, AdaLN kernel patterns
- Benchmark scripts (micro + e2e for LTX-Video)
- HuggingFace Kernels integration example
- Reference docs: optimization guides, templates, troubleshooting
@01xjw 01xjw changed the title Add ROCM kernel skill [Draft]Add ROCM kernel skill Mar 13, 2026
@01xjw 01xjw marked this pull request as draft March 13, 2026 11:55
@01xjw 01xjw marked this pull request as ready for review March 16, 2026 07:56
@01xjw 01xjw changed the title [Draft]Add ROCM kernel skill Add ROCM kernel skill Mar 16, 2026
@sayakpaul
Member

Cc: @burtenshaw @danieldk

@burtenshaw
Contributor

Really cool PR! Could you please share a trace from a coding harness such as Claude Code, Codex, or OpenCode?

sayakpaul
sayakpaul previously approved these changes Mar 24, 2026
Member

@sayakpaul sayakpaul left a comment

Thanks for this 🔥

I will let @burtenshaw do the final approval. Some comments:

@@ -0,0 +1,252 @@
# Diffusers Pipeline Integration Guide (ROCm)

Integrating custom Triton kernels into HuggingFace diffusers pipelines on AMD GPUs.
Member

Should we also list any dependencies?

Contributor Author

No problem, I will add the dependencies within 24 hours!

@Abdennacer-Badaoui
Member

This is really good 🔥 Thanks @01xjw

@01xjw
Contributor Author

01xjw commented Mar 25, 2026

> Really cool PR! Could you please share a trace from a coding harness such as Claude Code, Codex, or OpenCode?

Hi @burtenshaw, the results are shown in the blog PR. Could you help review it as well? Thanks!
huggingface/blog#3308

@01xjw
Contributor Author

01xjw commented Mar 25, 2026

> Thanks for this 🔥
>
> I will let @burtenshaw do the final approval. Some comments:

OK, I'll add the ROCm kernel skills to this repo following the CLI skills docs.
I’ve already shared the results in the blog PR. Would you like the video or those results included in this PR as well? If so, I’ll attach them here.
huggingface/blog#3308

sayakpaul and others added 2 commits March 25, 2026 10:58
- add formal baseline vs triton numbers (+ compile reference), with consolidated  and example outputs
- provide independent live harness traces (Codex/OpenCode) from real benchmark runs instead of replayed values
- unify reviewer-facing docs and paths, and clarify setup via
@01xjw
Contributor Author

01xjw commented Apr 10, 2026

Hi @sayakpaul, we added benchmark evidence under skills/rocm-kernels/examples/ltx-video-benchmark/.

Hi @burtenshaw, we included live-run harness traces with the executed command, config, and results:
trace/codex_live/codex_trace.json
trace/opencode_live/opencode_trace_result.json
Both are marked live_benchmark: true.
The dependencies are consolidated in a single requirements file, which can be installed with:
python -m pip install -r skills/rocm-kernels/scripts/requirements.txt

Could you please review it? Let me know if you have any questions. Thanks!
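
A reviewer who wants to confirm the `live_benchmark: true` marker programmatically could use something like the sketch below. It assumes the flag is a top-level boolean key in the trace JSON; the comment above does not show the trace schema, so that placement and the helper name `is_live_trace` are assumptions.

```python
import json
from pathlib import Path


def is_live_trace(path):
    """Return True only if the trace file has a top-level `live_benchmark` set to JSON true.

    Hypothetical helper: the real trace layout may nest the flag elsewhere.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Require a real boolean so that strings like "true" are not accepted.
    return isinstance(data, dict) and data.get("live_benchmark") is True


if __name__ == "__main__":
    # Trace paths as quoted in the comment above, relative to the benchmark example folder.
    for name in ("trace/codex_live/codex_trace.json",
                 "trace/opencode_live/opencode_trace_result.json"):
        if Path(name).exists():
            print(name, is_live_trace(name))
```

Checking for a strict boolean rather than any truthy value avoids accepting a replayed trace that stores the flag as a string.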

- add  to  with supported values  (default) and
- update skill installer to resolve manifest/files by selected skill id
- keep backward compatibility by preserving default behavior for existing  installs
- update  to mention ROCm support and add usage example
@sayakpaul
Member

@burtenshaw a gentle ping.

Contributor

@burtenshaw burtenshaw left a comment

Sorry for my delay here. In general it looks good. I would just drop the video files and extra agent traces.

Contributor

I would drop this and the other video artefacts. They will be inconvenient to vendor and will not aid the agent meaningfully.

Contributor Author

No problem, I'll drop these videos and the extra trace.

Contributor

We should just have one good trace. No need to differentiate between OpenCode and Codex.

Contributor Author

Thanks for the suggestion, agreed.
We simplified the reviewer package to keep a single high-quality trace only (OpenCode), and removed Codex-specific trace artifacts.

@sayakpaul
Member

Just one suggestion. Do we want to ensure the MP4s are still included through external links (perhaps you could host them on the Hugging Face Hub)?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul merged commit 766d3f4 into huggingface:main Apr 15, 2026
6 participants