fix: remove redundant CP_ASYNC_WAIT_GROUP #401
+780
−755
🚀 Summary
This PR removes a redundant CP_ASYNC_WAIT_GROUP(0); instruction in kernels/sgemm/sgemm_async.cu and applies the required code formatting to pass the CI checks.
📝 Details
1. Dead Code Removal
In the sgemm_t_8x8_sliced_k16_f32x4_bcf_dbuf_kernel, the memory loading is currently implemented using synchronous instructions (explicit register buffering using LDG/STS, or direct assignment) rather than Ampere's asynchronous copy (cp.async).
Since no asynchronous copy groups are ever committed via CP_ASYNC_COMMIT_GROUP, the instruction CP_ASYNC_WAIT_GROUP(0); at the end of the loop is functionally useless (dead code) and potentially misleading. I have removed it to improve code clarity.
2. Formatting Changes (Important)
You will notice a significant diff in this file. This is intentional.
I ran the project's pre-commit hooks before committing. The repository configures clang-format with --style=file via pre-commit, but there is no .clang-format file in the root directory. As a result, clang-format falls back to the default LLVM style, reformatting the entire file.
I have included these formatting changes to ensure this PR passes the project's automated CI/pre-commit checks.
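To illustrate the dead code point in item 1, here is a minimal sketch of the two loading styles. It is not the kernel from this PR: the macro bodies, kernel names, and tile size below are assumptions modeled on the identifiers mentioned above, and the repository's actual definitions may differ. The async path issues cp.async, commits a group, then waits on it; the sync path stages data through a register and never commits a group, so a trailing CP_ASYNC_WAIT_GROUP(0); there would return immediately.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical macro definitions, named after the ones discussed in this PR.
// Requires compute capability 8.0 or newer (e.g. nvcc -arch=sm_80).
#define CP_ASYNC_CG(dst, src, bytes)                                       \
  asm volatile("cp.async.cg.shared.global [%0], [%1], %2;\n" ::"r"(dst),   \
               "l"(src), "n"(bytes))
#define CP_ASYNC_COMMIT_GROUP() asm volatile("cp.async.commit_group;\n" ::)
#define CP_ASYNC_WAIT_GROUP(n)                                             \
  asm volatile("cp.async.wait_group %0;\n" ::"n"(n))

constexpr int kThreads = 128; // illustrative block size, one float4 per thread

// Async path: the wait is meaningful because a group was committed before it.
__global__ void copy_async(const float *src, float *dst) {
  __shared__ __align__(16) float tile[kThreads * 4];
  const int tid = threadIdx.x;
  uint32_t smem_addr =
      static_cast<uint32_t>(__cvta_generic_to_shared(&tile[tid * 4]));
  CP_ASYNC_CG(smem_addr, src + tid * 4, 16); // global -> shared, 16 bytes
  CP_ASYNC_COMMIT_GROUP();                   // close the pending copy group
  CP_ASYNC_WAIT_GROUP(0);                    // wait until no groups are pending
  __syncthreads();
  reinterpret_cast<float4 *>(dst)[tid] = reinterpret_cast<float4 *>(tile)[tid];
}

// Sync path: global -> register -> shared. No group is ever committed, so a
// CP_ASYNC_WAIT_GROUP(0) placed here would have nothing to wait for.
__global__ void copy_sync(const float *src, float *dst) {
  __shared__ __align__(16) float tile[kThreads * 4];
  const int tid = threadIdx.x;
  float4 v = reinterpret_cast<const float4 *>(src)[tid]; // load to register
  reinterpret_cast<float4 *>(tile)[tid] = v;             // store to shared
  __syncthreads();
  reinterpret_cast<float4 *>(dst)[tid] = reinterpret_cast<float4 *>(tile)[tid];
}

int main() {
  const int n = kThreads * 4;
  std::vector<float> in(n), out_async(n), out_sync(n);
  for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

  float *d_src = nullptr, *d_dst = nullptr;
  cudaMalloc(&d_src, n * sizeof(float));
  cudaMalloc(&d_dst, n * sizeof(float));
  cudaMemcpy(d_src, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  copy_async<<<1, kThreads>>>(d_src, d_dst);
  cudaMemcpy(out_async.data(), d_dst, n * sizeof(float), cudaMemcpyDeviceToHost);

  copy_sync<<<1, kThreads>>>(d_src, d_dst);
  cudaMemcpy(out_sync.data(), d_dst, n * sizeof(float), cudaMemcpyDeviceToHost);

  // Both paths should reproduce the input exactly.
  bool same = true;
  for (int i = 0; i < n; ++i)
    same = same && (out_async[i] == in[i]) && (out_sync[i] == in[i]);
  std::printf("%s\n", same ? "outputs match" : "outputs differ");

  cudaFree(d_src);
  cudaFree(d_dst);
  return same ? 0 : 1;
}
```

The sync kernel mirrors the situation described in item 1: with no cp.async issued and no group committed, the wait instruction only suggests an asynchronous pipeline that is not actually there.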
🤝 Note
Since this is my first contribution, I wasn't entirely sure if committing the full file reformatting (triggered by the project's pre-commit hooks) is the standard practice here. If this large diff is not desired, please let me know, and I will be happy to adjust the PR accordingly!