Add ROCM kernel skill #343
- RMSNorm, RoPE 3D, GEGLU, AdaLN kernel patterns
- Benchmark scripts (micro + e2e for LTX-Video)
- HuggingFace Kernels integration example
- Reference docs: optimization guides, templates, troubleshooting
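For readers unfamiliar with the first kernel pattern listed above, here is a minimal reference sketch of RMSNorm in plain Python. It is not the Triton kernel from this PR; it only states the math a kernel would be checked against, and the function names and the `eps` default are assumptions made for illustration.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one row: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.

    `eps` is an assumed default, not a value taken from the PR.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def max_abs_diff(a, b):
    """Largest element-wise difference, useful for comparing kernel output to the reference."""
    return max(abs(p - q) for p, q in zip(a, b))


row = [1.0, -2.0, 3.0, -4.0]
out = rms_norm(row, [1.0] * len(row))
```

A GPU kernel's output for the same row could be compared against `out` with `max_abs_diff` and a tolerance suited to the dtype in use.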
Cc: @burtenshaw @danieldk
Really cool PR! Could you please share a trace from a coding harness such as Claude Code, Codex, or OpenCode?
sayakpaul
left a comment
Thanks for this 🔥
I will let @burtenshaw do the final approval. Some comments:
- https://huggingface.co/docs/kernels/main/en/cli-skills should likely be modified to mention that ROCm kernels are also supported.
- Could we also see some numbers with and without these kernels, and preferably some videos?
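One way to produce the "with and without" numbers requested above is a small timing harness like the sketch below. It is an illustration only, not the benchmark script from this PR: `baseline_op` and `candidate_op` are hypothetical stand-ins for the stock and custom-kernel code paths, and the warmup and iteration counts are arbitrary.

```python
import statistics
import time


def time_call(fn, *args, warmup=3, iters=20):
    """Return the median wall-clock time of fn(*args) in seconds."""
    for _ in range(warmup):  # warm caches / JIT compilation before measuring
        fn(*args)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        # For GPU work, synchronize the device here before reading the timer,
        # otherwise only the kernel launch is measured.
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s, candidate_s):
    """Values above 1.0 mean the candidate is faster than the baseline."""
    return baseline_s / candidate_s


# Hypothetical stand-ins for the two code paths being compared.
def baseline_op(n):
    return sum(i * i for i in range(n))


def candidate_op(n):
    return (n - 1) * n * (2 * n - 1) // 6


ratio = speedup(time_call(baseline_op, 10_000), time_call(candidate_op, 10_000))
print(f"speedup: {ratio:.2f}x")
```

Reporting the median rather than the mean keeps a single slow iteration from skewing the comparison.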
| @@ -0,0 +1,252 @@ | |||
| # Diffusers Pipeline Integration Guide (ROCm) | |||
| Integrating custom Triton kernels into HuggingFace diffusers pipelines on AMD GPUs. | |||
Should we also list any dependencies?
No problem, I will add the dependencies within 24 hours!
This is really good 🔥 Thanks @01xjw
Hi @burtenshaw, the results are shown in the blog PR. Could you help review it as well? Thanks ~
OK, I'll add the ROCm kernel skills to this repo following the CLI skills docs.
- add formal baseline vs Triton numbers (+ compile reference), with consolidated and example outputs
- provide independent live harness traces (Codex/OpenCode) from real benchmark runs instead of replayed values
- unify reviewer-facing docs and paths, and clarify setup via
Hi @sayakpaul, we added benchmark evidence under skills/rocm-kernels/examples/ltx-video-benchmark/.
Hi @burtenshaw, we included live-run harness traces with executed command/config/results:
Could you please review it? Let me know if you have any questions. Thanks!
- add to with supported values (default) and
- update skill installer to resolve manifest/files by selected skill id
- keep backward compatibility by preserving default behavior for existing installs
- update to mention ROCm support and add usage example
@burtenshaw a gentle ping.
burtenshaw
left a comment
Sorry for my delay here. In general it looks good. I would just drop the video files and extra agent traces.
I would drop this and the other video artifacts. They would be inconvenient to vendor and would not aid the agent meaningfully.
No problem, I'll drop these videos and the extra traces.
We should just have one good trace. No need to differentiate OpenCode and Codex.
Thanks for the suggestion — agreed.
We simplified the reviewer package to keep a single high-quality trace only (OpenCode), and removed Codex-specific trace artifacts.
Just one suggestion. Do we want to ensure the MP4s are still included through external links (perhaps you could host them on the Hugging Face Hub)?
Add ROCm Triton kernels skill for MI355X/R9700