check hardware constraints when setting threads per sm #5814

liqiangxl · 2026-01-13T17:28:49Z

To avoid error on hardware allows a small number of threads per sm

liqiangxl · 2026-01-13T17:29:08Z

!test

github-actions · 2026-01-13T17:29:33Z

Auto-merge Status

✅ Internal CI is finished
❌ No failed checks (nvfuser-ci/smoke_test_wheel_nightly_20)
✅ PR is mergeable
ℹ️ PR mergeable_state: unstable

Description

Add hardware constraint validation for threads per SM calculation
Use std::min to limit threads by register usage and maxThreadsPerMultiProcessor
Prevent potential errors on hardware with limited threads per SM
Add explanatory comment for the constraint check

Changes walkthrough

Relevant files

Bug fix

normalization_inner_outer_multi_wave.cpp `Add hardware constraint validation for threads per SM` csrc/scheduler/normalization_inner_outer_multi_wave.cpp Modified threads_per_sm calculation to include hardware constraint check Added std::min comparison between register-based calculation and maxThreadsPerMultiProcessor Added explanatory comment documenting the hardware constraint limitation Prevents errors on hardware with limited threads per SM	+4/-1

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review
Missing Tests The PR adds important hardware constraint checking but lacks corresponding tests to validate the behavior. Tests should verify that the threads per SM calculation correctly respects both register-based constraints and hardware limits, especially on hardware with low thread capacity. int64_t threads_per_sm = std::min( getThreadsPerSMGivenRegPerThread(reg_per_thread), static_cast<int64_t>(dev_prop->maxThreadsPerMultiProcessor)); Performance Impact While the hardware constraint check is necessary for correctness, it may impact performance on hardware with conservative thread limits. The PR should include performance data to quantify any potential regressions and ensure the change doesn't significantly degrade performance on supported hardware. int64_t threads_per_sm = std::min( getThreadsPerSMGivenRegPerThread(reg_per_thread), static_cast<int64_t>(dev_prop->maxThreadsPerMultiProcessor));

greptile-apps · 2026-01-13T17:31:26Z

Greptile Overview

Greptile Summary

Added std::min() constraint to limit threads_per_sm by maxThreadsPerMultiProcessor hardware property when calculating grid dimensions.

Previously, getThreadsPerSMGivenRegPerThread() could return a value exceeding hardware limits on devices with constrained thread capacity
The fix prevents runtime errors by capping threads per SM to the hardware maximum
This is a defensive fix that protects against edge cases on hardware with limited threading capabilities

Confidence Score: 5/5

This PR is safe to merge - it adds a defensive constraint that prevents hardware limit violations
The change is minimal, well-targeted, and adds a safety check without altering core logic. The std::min() constraint ensures calculated thread counts never exceed hardware capabilities, preventing potential runtime errors on constrained devices
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
csrc/scheduler/normalization_inner_outer_multi_wave.cpp	5/5	Added hardware constraint check to prevent exceeding `maxThreadsPerMultiProcessor` when calculating threads per SM

Sequence Diagram

sequenceDiagram
    participant Caller as getHeuristics()
    participant Lambda as getGdimy()
    participant Utils as getThreadsPerSMGivenRegPerThread()
    participant DevProp as CUDA Device Properties
    participant BlockCalc as getBlocksPerSM()
    
    Caller->>Lambda: Call with inner_vect, threads_per_block, inner_batch
    Lambda->>Lambda: Calculate reg_per_thread from register usage
    Lambda->>Utils: getThreadsPerSMGivenRegPerThread(reg_per_thread)
    Utils->>DevProp: Query register allocation properties
    Utils-->>Lambda: Return threads_per_sm (based on registers)
    Lambda->>DevProp: Query maxThreadsPerMultiProcessor
    DevProp-->>Lambda: Return hardware limit
    Lambda->>Lambda: threads_per_sm = min(register_based, hardware_limit)
    Note over Lambda: NEW: Prevents exceeding hardware constraints
    Lambda->>BlockCalc: getBlocksPerSM(threads_per_sm, threads_per_block, warpSize)
    BlockCalc-->>Lambda: Return blocks_per_sm
    Lambda->>Lambda: Calculate gdimy = blocks_per_sm * multiprocessor_count
    Lambda->>Lambda: Apply outer_iter_min constraint
    Lambda-->>Caller: Return final gdimy value

greptile-apps

_{No files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

liqiangxl added 2 commits January 13, 2026 09:25

check hardware constraints when setting threads per sm

dc675f9

check hardware constraints when setting threads per sm

20de0e1

liqiangxl added the enable-auto-merge Auto-merge a PR when: 1) PR mergeable 2) Internal CI complete 3) No failures label Jan 13, 2026

liqiangxl requested a review from rdspring1 January 13, 2026 17:29

greptile-apps bot reviewed Jan 13, 2026

View reviewed changes

rdspring1 approved these changes Jan 13, 2026

View reviewed changes

liqiangxl merged commit 4b31086 into main Jan 14, 2026
63 checks passed

liqiangxl deleted the llu/threads_per_sm branch January 14, 2026 01:46

github-actions bot removed the enable-auto-merge Auto-merge a PR when: 1) PR mergeable 2) Internal CI complete 3) No failures label Jan 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

check hardware constraints when setting threads per sm #5814

check hardware constraints when setting threads per sm #5814

Uh oh!

liqiangxl commented Jan 13, 2026

Uh oh!

liqiangxl commented Jan 13, 2026

Uh oh!

github-actions bot commented Jan 13, 2026 •

edited

Loading

Changes walkthrough

PR Reviewer Guide

Uh oh!

greptile-apps bot commented Jan 13, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

check hardware constraints when setting threads per sm #5814

check hardware constraints when setting threads per sm #5814

Uh oh!

Conversation

liqiangxl commented Jan 13, 2026

Uh oh!

liqiangxl commented Jan 13, 2026

Uh oh!

github-actions bot commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Auto-merge Status

Description

Changes walkthrough

PR Reviewer Guide

Uh oh!

greptile-apps bot commented Jan 13, 2026

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Jan 13, 2026 •

edited

Loading