Skip to content

Validate default_ptx_arch against NVRTC supported architectures#1276

Open
cluster2600 wants to merge 2 commits intoNVIDIA:mainfrom
cluster2600:fix/validate-default-ptx-arch
Open

Validate default_ptx_arch against NVRTC supported architectures#1276
cluster2600 wants to merge 2 commits intoNVIDIA:mainfrom
cluster2600:fix/validate-default-ptx-arch

Conversation

@cluster2600
Copy link

@cluster2600 cluster2600 commented Mar 8, 2026

Description

default_ptx_arch is hardcoded to 75 in both the device-present and
no-device initialisation paths without verifying that the value is actually
supported by the available NVRTC. The same applies to
config.ptx_target_arch when the user has set it explicitly.

This PR adds a small helper (_resolve_supported_ptx_arch) that clamps the
target architecture to the closest NVRTC-supported value — preferring the
lowest supported arch >= the target so that the resulting PTX remains as
broadly forward-compatible as possible. Both initialisation paths now call
this helper so that default_ptx_arch is always validated before it gets
used for compilation.

The practical impact today is low (NVRTC has supported arch 75 since
CUDA 10 and Warp requires CUDA 12+), but this hardens the code against
future NVRTC releases that may drop older architectures.

Closes #1218

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Test plan

  1. Ran python3 -m pytest warp/tests/test_context.py -v — all 5 tests pass
    (1 pre-existing + 4 new).
  2. New unit tests excercise the following scenarios for
    _resolve_supported_ptx_arch:
    • Exact match — target is in nvrtc_supported_archs, returned as-is.
    • Clamp up — target missing, lowest supported arch above target chosen.
    • Fallback to max — all supported archs are below target, highest
      supported arch returned.
    • Single element — only one supported arch availble.
  3. The behaviour of the device-present path is unchanged when all devices
    have NVRTC-supported architectures (the common case). The no-device
    path now validates config.ptx_target_arch / the 75 fallback against
    the supported set.

New feature / enhancement

# _resolve_supported_ptx_arch(target_arch, supported_archs)
# returns target_arch if supported, otherwise the closest alternative
from warp._src.context import _resolve_supported_ptx_arch

_resolve_supported_ptx_arch(75, {70, 80, 86})  # => 80
_resolve_supported_ptx_arch(75, {70, 75, 86})  # => 75
_resolve_supported_ptx_arch(90, {70, 75, 80})  # => 80

Summary by CodeRabbit

  • Bug Fixes

    • Improved PTX architecture selection so the app consistently chooses an NVRTC-supported PTX arch across environments (including offline or no-device scenarios). When a requested arch isn't supported the system now selects the nearest supported architecture and logs a fallback.
  • Tests

    • Added unit tests covering exact-match, upward-clamping, fallback-to-max, single-entry behavior, and empty-set error handling for the PTX arch selection logic.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link

coderabbitai bot commented Mar 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a module-level helper to select an NVRTC-supported PTX architecture and applies it to validate/compute PTX arch in both CUDA-present and no-device NVRTC code paths; adds unit tests covering match, clamp-up, fallback, single-element, and empty-set error cases.

Changes

Cohort / File(s) Summary
PTX Architecture Resolution
warp/_src/context.py
Adds _resolve_supported_ptx_arch(target_arch: int, supported_archs: set[int]) -> int and integrates it to validate/override selected PTX architectures in CUDA and no-device NVRTC code paths; logs a warning when falling back.
Unit Tests
warp/tests/test_context.py
Adds tests for _resolve_supported_ptx_arch: exact match, clamp-up (smallest >= target), fallback to max (target above all), single-element behavior, and empty-set raising ValueError.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding validation of default_ptx_arch against NVRTC supported architectures.
Linked Issues check ✅ Passed The PR implements the core requirement from #1218: adding _resolve_supported_ptx_arch helper that validates/clamps PTX target arch against NVRTC-supported values, with validation in both device-present and no-device initialization paths.
Out of Scope Changes check ✅ Passed All changes are directly related to validating PTX arch against NVRTC support: new helper function, validation in compilation path, tests covering the new functionality, and no extraneous modifications.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can approve the review once all CodeRabbit's comments are resolved.

Enable the reviews.request_changes_workflow setting to automatically approve the review once all CodeRabbit's comments are resolved.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@warp/_src/context.py`:
- Around line 5333-5341: The current clamp only updates runtime.default_ptx_arch
and can let an unsupported per-device PTX target slip through; update the final
PTX-target selection in Device.get_cuda_compile_arch() (which computes
min(self.arch, warp.config.ptx_target_arch or runtime.default_ptx_arch)) to
revalidate the chosen target against self.nvrtc_supported_archs using
_resolve_supported_ptx_arch (or equivalent) and ensure the returned target is
NVRTC-supported for that device; if no supported PTX exists for the device,
raise a clear, descriptive error indicating the device arch and the available
nvrtc_supported_archs so compilation will fail fast.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 09d275e4-b3d2-4717-a9d8-03d4da24e2db

📥 Commits

Reviewing files that changed from the base of the PR and between 8bfcaf8 and 6883172.

📒 Files selected for processing (2)
  • warp/_src/context.py
  • warp/tests/test_context.py

@greptile-apps
Copy link

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR adds a _resolve_supported_ptx_arch helper that validates a desired PTX architecture against the set of architectures actually supported by the available NVRTC, falling back to the nearest supported value (preferring the lowest arch ≥ target for forward compatibility). Both the device-present and no-device Runtime initialisation paths now call this helper, and a small set of unit tests covers all key edge cases including the empty-set error path.

Key changes:

  • New _resolve_supported_ptx_arch(target_arch, supported_archs) function with ValueError guard for empty input
  • Device-present path (Runtime.__init__): validates default_ptx_arch after the min() selection in case no NVRTC-eligible device exists
  • No-device path: resolves config.ptx_target_arch / the 75 fallback against nvrtc_supported_archs before assigning default_ptx_arch
  • Device.get_cuda_output_arch(): validates the per-device output_arch against NVRTC when PTX output is selected
  • 5 new unit tests covering exact-match, clamp-up, fallback-to-max, single-element, and empty-set scenarios

Issue found:

  • The fallback warning in _resolve_supported_ptx_arch uses a raw print() call instead of warp.utils.warn() — this bypasses warp.config.quiet, and because the function is called from the per-kernel Device.get_cuda_output_arch() hot path, it will emit the same message repeatedly on every kernel compilation rather than just once.

Confidence Score: 4/5

  • PR is safe to merge with a minor warning-output fix recommended before landing.
  • The core logic is correct and well-tested. The only issue is a print() warning that bypasses warp.config.quiet and will repeat on every kernel compilation if PTX arch fallback is active — a one-line fix to use warp.utils.warn(..., once=True) would bring it in line with the rest of the codebase.
  • warp/_src/context.py line 3963 — warning emission should use warp.utils.warn with once=True.

Important Files Changed

Filename Overview
warp/_src/context.py Adds _resolve_supported_ptx_arch helper and integrates it into both device-present and no-device init paths; logic is correct but the warning uses print() rather than warp.utils.warn(), which bypasses warp.config.quiet and will repeat on every kernel compilation.
warp/tests/test_context.py Adds 5 focused unit tests for _resolve_supported_ptx_arch covering exact-match, clamp-up, fallback-to-max, single-element, and empty-set cases; all scenarios are well-covered.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Runtime.__init__()"] --> B{cuda_device_count > 0?}
    B -- Yes --> C["default_ptx_arch = 75"]
    C --> D["min(d.arch for d in cuda_devices\nif d.arch >= default_ptx_arch\nand d.arch in nvrtc_supported_archs)"]
    D -- "ValueError (no eligible device)" --> E["retain default_ptx_arch = 75"]
    D -- "success" --> F["default_ptx_arch = min arch"]
    E --> G{nvrtc_supported_archs\nnon-empty AND\ndefault_ptx_arch not in it?}
    F --> G
    G -- Yes --> H["_resolve_supported_ptx_arch(default_ptx_arch, nvrtc_supported_archs)"]
    G -- No --> I["default_ptx_arch ready"]
    H --> I

    B -- No --> J{nvrtc_supported_archs\nnon-empty?}
    J -- Yes --> K["target = config.ptx_target_arch\nor 75"]
    K --> L["_resolve_supported_ptx_arch(target, nvrtc_supported_archs)"]
    L --> M["default_ptx_arch set for offline compilation"]
    J -- No --> N["default_ptx_arch = None"]

    subgraph "Per-kernel: Device.get_cuda_output_arch()"
        O["output_arch = min(self.arch, default_ptx_arch)"] --> P{output_arch not in\nnvrtc_supported_archs?}
        P -- Yes --> Q["_resolve_supported_ptx_arch(output_arch, nvrtc_supported_archs)"]
        P -- No --> R["use output_arch as-is"]
        Q --> R
    end

    subgraph "_resolve_supported_ptx_arch(target, supported)"
        S{supported empty?} -- Yes --> T["raise ValueError"]
        S -- No --> U{target in supported?}
        U -- Yes --> V["return target"]
        U -- No --> W["above = [a for a in supported if a >= target]"]
        W --> X{above non-empty?}
        X -- Yes --> Y["return above[0] (lowest >= target)"]
        X -- No --> Z["return max(supported) (fallback to highest)"]
    end
Loading

Last reviewed commit: 8bc0bed

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
warp/_src/context.py (1)

3865-3867: ⚠️ Potential issue | 🟠 Major

Reject PTX targets that exceed the device architecture.

This still allows output_arch to clamp upward past self.arch. For example, an sm_75 device with nvrtc_supported_archs={80, 86} now resolves to 80, which is not runnable on that device. This should fail fast instead of returning an invalid PTX target.

Suggested fix
-            if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
-                output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)
+            if runtime.nvrtc_supported_archs:
+                runnable_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch}
+                if not runnable_archs:
+                    raise RuntimeError(
+                        f"No NVRTC-supported PTX target can run on device {self.alias} "
+                        f"(sm_{self.arch}); supported PTX targets are {sorted(runtime.nvrtc_supported_archs)}."
+                    )
+                if output_arch not in runnable_archs:
+                    output_arch = _resolve_supported_ptx_arch(output_arch, runnable_archs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3865 - 3867, The current logic may clamp
output_arch upward to a PTX target not supported by the physical device (e.g.,
device with self.arch == 75 gets resolved to 80 via
_resolve_supported_ptx_arch); instead, after resolving a supported NVRTC arch
via _resolve_supported_ptx_arch (when runtime.nvrtc_supported_archs is set),
verify that the resolved arch does not exceed the device architecture (compare
numeric SM values from output_arch/self.arch) and if it does, raise an explicit
error (e.g., ValueError or RuntimeError) rather than returning an invalid PTX
target; update the branch that references runtime.nvrtc_supported_archs,
output_arch, and _resolve_supported_ptx_arch to perform this check and raise on
mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@warp/_src/context.py`:
- Around line 3946-3953: Update the docstring to use the repo's RST-style
double-backtick formatting for code and parameter references: replace italicized
occurrences of target_arch and supported_archs with ``target_arch`` and
``supported_archs`` (also change any other italicized terms like *NVRTC* or
inline code to ``NVRTC`` or appropriate ``...``), keeping the existing text and
semantics (the description about returning the closest supported arch and
raising ValueError remains unchanged).

---

Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3865-3867: The current logic may clamp output_arch upward to a PTX
target not supported by the physical device (e.g., device with self.arch == 75
gets resolved to 80 via _resolve_supported_ptx_arch); instead, after resolving a
supported NVRTC arch via _resolve_supported_ptx_arch (when
runtime.nvrtc_supported_archs is set), verify that the resolved arch does not
exceed the device architecture (compare numeric SM values from
output_arch/self.arch) and if it does, raise an explicit error (e.g., ValueError
or RuntimeError) rather than returning an invalid PTX target; update the branch
that references runtime.nvrtc_supported_archs, output_arch, and
_resolve_supported_ptx_arch to perform this check and raise on mismatch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: ac0ce7b2-a16c-41bd-9052-5ef39e32319c

📥 Commits

Reviewing files that changed from the base of the PR and between 6883172 and 4e7972c.

📒 Files selected for processing (2)
  • warp/_src/context.py
  • warp/tests/test_context.py

@cluster2600 cluster2600 force-pushed the fix/validate-default-ptx-arch branch from 3a19703 to fafa510 Compare March 9, 2026 15:11
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
warp/_src/context.py (2)

3865-3867: ⚠️ Potential issue | 🟠 Major

Don't clamp PTX targets past the device architecture.

With an sm_75 device and runtime.nvrtc_supported_archs={80, 86}, Line 3867 resolves output_arch to 80. That PTX target is newer than the GPU, so the later JIT/load can still fail. Revalidate against the subset of NVRTC-supported arches that are <= self.arch, and raise a clear error if that subset is empty instead of clamping upward.

Proposed fix
-            # Ensure the chosen PTX arch is actually supported by NVRTC
-            if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
-                output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)
+            # Ensure the chosen PTX arch is actually supported by NVRTC and runnable on this device.
+            if runtime.nvrtc_supported_archs:
+                device_supported_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch}
+                if not device_supported_archs:
+                    raise RuntimeError(
+                        f"No NVRTC-supported PTX target can run on device {self.alias} (sm_{self.arch}); "
+                        f"available NVRTC targets: {sorted(runtime.nvrtc_supported_archs)}"
+                    )
+                if output_arch not in device_supported_archs:
+                    output_arch = _resolve_supported_ptx_arch(output_arch, device_supported_archs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3865 - 3867, The current clamping logic in
the block referencing runtime.nvrtc_supported_archs and output_arch calls
_resolve_supported_ptx_arch which can select a newer PTX (e.g., choosing 80 for
an sm_75 device); instead restrict the candidate NVRTC arches to those <=
self.arch (i.e., filter runtime.nvrtc_supported_archs by numeric arch <=
self.arch) and if that filtered set is empty raise a clear ValueError (or
RuntimeError) describing no compatible NVRTC-supported PTX for this device,
otherwise call _resolve_supported_ptx_arch with the filtered set to pick a
supported target instead of clamping upward.

3946-3953: ⚠️ Potential issue | 🟡 Minor

Use repo docstring formatting for parameter references.

This helper docstring still uses italics for target_arch and supported_archs; the repo convention here is double-backtick literals.

Suggested fix
-    """Return *target_arch* if NVRTC supports it, otherwise the closest supported arch.
+    """Return ``target_arch`` if NVRTC supports it, otherwise the closest supported arch.
 
     Preference is given to the lowest supported architecture that is >=
-    *target_arch* so that the resulting PTX is as broadly forward-compatible
+    ``target_arch`` so that the resulting PTX is as broadly forward-compatible
     as possible.  If no such architecture exists the highest supported value
     is returned instead.
 
-    *supported_archs* must be non-empty; a ``ValueError`` is raised otherwise.
+    ``supported_archs`` must be non-empty; a ``ValueError`` is raised otherwise.
     """

As per coding guidelines, "Use double backticks for code elements in docstrings (RST syntax)" and "Use double backticks for parameter cross-references in docstrings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3946 - 3953, Update the helper function
docstring that describes returning a supported NVRTC architecture so it uses RST
double-backtick literals instead of italics: replace instances of target_arch
and supported_archs formatted with single asterisks/italics with ``target_arch``
and ``supported_archs`` (and any other code-like names such as NVRTC) to follow
the repo docstring convention and parameter cross-reference style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3865-3867: The current clamping logic in the block referencing
runtime.nvrtc_supported_archs and output_arch calls _resolve_supported_ptx_arch
which can select a newer PTX (e.g., choosing 80 for an sm_75 device); instead
restrict the candidate NVRTC arches to those <= self.arch (i.e., filter
runtime.nvrtc_supported_archs by numeric arch <= self.arch) and if that filtered
set is empty raise a clear ValueError (or RuntimeError) describing no compatible
NVRTC-supported PTX for this device, otherwise call _resolve_supported_ptx_arch
with the filtered set to pick a supported target instead of clamping upward.
- Around line 3946-3953: Update the helper function docstring that describes
returning a supported NVRTC architecture so it uses RST double-backtick literals
instead of italics: replace instances of target_arch and supported_archs
formatted with single asterisks/italics with ``target_arch`` and
``supported_archs`` (and any other code-like names such as NVRTC) to follow the
repo docstring convention and parameter cross-reference style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0f5abec7-7765-44c8-9e5a-59af9233085a

📥 Commits

Reviewing files that changed from the base of the PR and between 3a19703 and fafa510.

📒 Files selected for processing (2)
  • warp/_src/context.py
  • warp/tests/test_context.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • warp/tests/test_context.py

Copy link
Contributor

@shi-eric shi-eric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, @cluster2600!

Suggestion: Consider adding a warning inside _resolve_supported_ptx_arch when the arch gets adjusted, since the alternative (NVRTC compilation failure) is always worse:

resolved = above[0] if above else max(supported_archs)
print(
    f"Warning: PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead"
)
return resolved

Without this, users who hit the fallback path have no indication that their target arch was silently changed, which could be confusing if they're debugging arch-specific behavior.

Edit: Maybe the Warp warn() function is not appropriate to call here, edited with print

@cluster2600
Copy link
Author

Thanks for the review @shi-eric! Great suggestion — added the warning in 52d189b. Users will now see a clear message when the PTX target arch is adjusted, e.g.:

Warning: PTX target arch sm_75 is not supported by NVRTC; using sm_80 instead

Much better than a silent fallback.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
warp/_src/context.py (1)

3858-3867: ⚠️ Potential issue | 🟠 Major

Keep the resolved PTX target at or below the device SM.

This still has the same remaining edge case as the earlier review thread: _resolve_supported_ptx_arch() can raise output_arch above self.arch. For an sm_75 device with runtime.nvrtc_supported_archs == {80, 86}, this resolves to sm_80, which is not a valid PTX target for that device. Filter the candidate set to architectures <= self.arch here, and fail fast if that filtered set is empty.

🔧 Proposed fix
         if self.get_cuda_output_format() == "ptx":
             # use the default PTX arch if the device supports it
             if warp.config.ptx_target_arch is not None:
                 output_arch = min(self.arch, warp.config.ptx_target_arch)
             else:
                 output_arch = min(self.arch, runtime.default_ptx_arch)
 
-            # Ensure the chosen PTX arch is actually supported by NVRTC
-            if runtime.nvrtc_supported_archs and output_arch not in runtime.nvrtc_supported_archs:
-                output_arch = _resolve_supported_ptx_arch(output_arch, runtime.nvrtc_supported_archs)
+            # Ensure the chosen PTX arch is supported by NVRTC and can run on this device.
+            if runtime.nvrtc_supported_archs:
+                compatible_archs = {arch for arch in runtime.nvrtc_supported_archs if arch <= self.arch}
+                if not compatible_archs:
+                    raise RuntimeError(
+                        f"Device {self.alias} (sm_{self.arch}) has no compatible PTX target in "
+                        f"{sorted(runtime.nvrtc_supported_archs)}."
+                    )
+                if output_arch not in compatible_archs:
+                    output_arch = _resolve_supported_ptx_arch(output_arch, compatible_archs)
According to NVIDIA CUDA/PTX compatibility documentation, can PTX generated for `compute_80` run on an `sm_75` GPU, or must the PTX target be less than or equal to the device's compute capability?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3858 - 3867, The current PTX selection can
return an arch higher than the device SM because _resolve_supported_ptx_arch()
is called on the full runtime.nvrtc_supported_archs; update the branch inside
get_cuda_output_format() == "ptx" (the block that sets output_arch using
warp.config.ptx_target_arch/runtime.default_ptx_arch) to first filter
runtime.nvrtc_supported_archs (and any candidate set) to only architectures <=
self.arch, then call _resolve_supported_ptx_arch() on that filtered set; if the
filtered candidate set is empty, raise/emit a clear error (fail fast) indicating
no supported PTX arch <= self.arch so we don't pick an incompatible higher PTX
target.
🧹 Nitpick comments (1)
warp/_src/context.py (1)

3961-3963: Use Warp's warning helper instead of print().

print() bypasses the repo's warning path and will fire on every repeated resolution. warp._src.utils.warn(..., once=True) keeps this consistent with the rest of warp/_src/context.py.

♻️ Proposed change
-    print(
-        f"Warning: PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead"
-    )
+    warp._src.utils.warn(
+        f"PTX target arch sm_{target_arch} is not supported by NVRTC; using sm_{resolved} instead",
+        once=True,
+    )

As per coding guidelines, "Use warp._src.utils.warn() instead of warnings.warn()—it routes warnings to stdout as some applications don't want Warp writing to stderr".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@warp/_src/context.py` around lines 3961 - 3963, Replace the direct print(...)
call in the PTX target resolution path with Warp's warning helper to avoid
bypassing the repo's warning system: locate the print that emits "Warning: PTX
target arch sm_{target_arch} is not supported..." in the resolution logic (use
symbols target_arch and resolved to find the code) and call warp._src.utils.warn
with the same message and once=True (so the message is routed via Warp's warning
path and shown only once).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@warp/_src/context.py`:
- Around line 3858-3867: The current PTX selection can return an arch higher
than the device SM because _resolve_supported_ptx_arch() is called on the full
runtime.nvrtc_supported_archs; update the branch inside get_cuda_output_format()
== "ptx" (the block that sets output_arch using
warp.config.ptx_target_arch/runtime.default_ptx_arch) to first filter
runtime.nvrtc_supported_archs (and any candidate set) to only architectures <=
self.arch, then call _resolve_supported_ptx_arch() on that filtered set; if the
filtered candidate set is empty, raise/emit a clear error (fail fast) indicating
no supported PTX arch <= self.arch so we don't pick an incompatible higher PTX
target.

---

Nitpick comments:
In `@warp/_src/context.py`:
- Around line 3961-3963: Replace the direct print(...) call in the PTX target
resolution path with Warp's warning helper to avoid bypassing the repo's warning
system: locate the print that emits "Warning: PTX target arch sm_{target_arch}
is not supported..." in the resolution logic (use symbols target_arch and
resolved to find the code) and call warp._src.utils.warn with the same message
and once=True (so the message is routed via Warp's warning path and shown only
once).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c64e6c11-52ac-4d2b-8718-804abb3c654f

📥 Commits

Reviewing files that changed from the base of the PR and between fafa510 and 52d189b.

📒 Files selected for processing (1)
  • warp/_src/context.py

@shi-eric
Copy link
Contributor

@cluster2600 Okay, this is almost ready to merge. Please fix the pre-commit issue, squash your changes, and rebase with main. Once those are done I can merge. Thanks!

@cluster2600 cluster2600 force-pushed the fix/validate-default-ptx-arch branch from 52d189b to e55f12e Compare March 11, 2026 09:38
@cluster2600
Copy link
Author

@shi-eric Done — I've squashed everything into a single commit, rebased onto main, and fixed the ruff formatting issue (the print() call in _resolve_supported_ptx_arch was being split across multiple lines unnecessarily). Pre-commit should be happy now.

Cheers for the thorough reveiw, happy to address anything else if needed.

default_ptx_arch is hardcoded to 75 in both the device-present and
no-device code paths without checking whether the value is actually
supported by the available NVRTC.  If a future NVRTC release were to
drop support for arch 75 this would cause a cryptic compilation
failure at runtime.

Add a small helper _resolve_supported_ptx_arch() that clamps the
target architecture to the closest NVRTC-supported value and use it
in both initialisation paths so that default_ptx_arch (and
config.ptx_target_arch when set) are always validated.  A warning is
printed whenever the target is adjusted so users are aware of the
change.

Also validate per-device PTX targets in get_cuda_compile_arch() so a
device arch below the NVRTC-supported range is resolved correctly.

Adds unit tests exercising the exact-match, clamp-up, fallback-to-max,
single-element, and empty-set edge cases.

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the fix/validate-default-ptx-arch branch from e55f12e to 30c7c05 Compare March 11, 2026 09:50
@cluster2600 cluster2600 requested a review from shi-eric March 15, 2026 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validate default_ptx_arch against NVRTC supported architectures

2 participants