check hardware constraints when setting threads per sm #5814
!test |
Auto-merge Status: ✅ Internal CI is finished
Description
|
| Relevant files | |||
|---|---|---|---|
| Bug fix |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| 🔒 No security concerns identified |
| ⚡ Recommended focus areas for review |
Missing Tests
|
Greptile Overview
Greptile Summary
Added
Confidence Score: 5/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Caller as getHeuristics()
participant Lambda as getGdimy()
participant Utils as getThreadsPerSMGivenRegPerThread()
participant DevProp as CUDA Device Properties
participant BlockCalc as getBlocksPerSM()
Caller->>Lambda: Call with inner_vect, threads_per_block, inner_batch
Lambda->>Lambda: Calculate reg_per_thread from register usage
Lambda->>Utils: getThreadsPerSMGivenRegPerThread(reg_per_thread)
Utils->>DevProp: Query register allocation properties
Utils-->>Lambda: Return threads_per_sm (based on registers)
Lambda->>DevProp: Query maxThreadsPerMultiProcessor
DevProp-->>Lambda: Return hardware limit
Lambda->>Lambda: threads_per_sm = min(register_based, hardware_limit)
Note over Lambda: NEW: Prevents exceeding hardware constraints
Lambda->>BlockCalc: getBlocksPerSM(threads_per_sm, threads_per_block, warpSize)
BlockCalc-->>Lambda: Return blocks_per_sm
Lambda->>Lambda: Calculate gdimy = blocks_per_sm * multiprocessor_count
Lambda->>Lambda: Apply outer_iter_min constraint
Lambda-->>Caller: Return final gdimy value
|
No files reviewed, no comments
To avoid errors on hardware that allows only a small number of threads per SM.
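
The flow in the sequence diagram above can be sketched in plain C++ as follows. This is a minimal illustration, not the code from this PR: `DeviceLimits`, `clampThreadsPerSM`, `blocksPerSM`, and `gridDimY` are hypothetical names, the struct stands in for the fields that would be read from the CUDA device properties, and the whole-warp occupancy model inside `blocksPerSM` is an assumption about how `getBlocksPerSM()` behaves. The `outer_iter_min` step from the diagram is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the values queried from the CUDA device
// properties (maxThreadsPerMultiProcessor, warpSize, multiProcessorCount).
struct DeviceLimits {
  int64_t max_threads_per_sm;
  int64_t warp_size;
  int64_t sm_count;
};

// The step this PR adds: the register-based estimate must not exceed
// the number of threads the hardware allows per SM.
inline int64_t clampThreadsPerSM(
    int64_t register_based_threads_per_sm,
    const DeviceLimits& dev) {
  return std::min(register_based_threads_per_sm, dev.max_threads_per_sm);
}

// Assumed occupancy model: a block occupies a whole number of warps,
// so count how many such blocks fit into the warps available on one SM.
inline int64_t blocksPerSM(
    int64_t threads_per_sm,
    int64_t threads_per_block,
    const DeviceLimits& dev) {
  assert(threads_per_block > 0 && dev.warp_size > 0);
  const int64_t warps_per_block =
      (threads_per_block + dev.warp_size - 1) / dev.warp_size;
  const int64_t warps_per_sm = threads_per_sm / dev.warp_size;
  return warps_per_sm / warps_per_block;
}

// gdimy = blocks_per_sm * multiprocessor_count, computed from the
// clamped threads_per_sm rather than the raw register-based value.
inline int64_t gridDimY(
    int64_t register_based_threads_per_sm,
    int64_t threads_per_block,
    const DeviceLimits& dev) {
  const int64_t threads_per_sm =
      clampThreadsPerSM(register_based_threads_per_sm, dev);
  return blocksPerSM(threads_per_sm, threads_per_block, dev) * dev.sm_count;
}
```

As an illustration with made-up numbers: for a device limit of 1536 threads per SM, a warp size of 32, and 256 threads per block, a register-based estimate of 2048 threads per SM would imply 8 blocks per SM without the clamp, whereas the clamped value of 1536 yields 6.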