Skip to content

Conversation

@naoyam
Copy link
Collaborator

@naoyam naoyam commented Jan 13, 2026

I used to use selective options to enable part of the new indexer such as NVFUSER_ENABLE=id_model(producer_index), but I rarely use the selective options anymore.

Before globally enabling the new indexer, this PR just simplifies the option configuration. Previously, it has to be NVFUSER_ENABLE=id_model(all). With this PR, it's just NVFUSER_ENABLE=id_model.

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 13, 2026

!test --diff

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 13, 2026

Greptile Overview

Greptile Summary

This PR significantly simplifies the IdModel configuration system by consolidating multiple granular options into a single unified approach. The primary goal is to make the IdModel easier to enable globally before it becomes the default indexing path.

Key Simplifications

Option Configuration

  • Environment variable syntax simplified from NVFUSER_ENABLE=id_model(all) to just NVFUSER_ENABLE=id_model
  • Removed 7 boolean flags (build_id_model_, build_tensor_indexer_, consumer_index_, producer_index_, inline_predicate_, unswitch_predicate_, loop_)
  • Replaced with single tensor_indexer_enabled_ flag in IdModelOptions
  • Removed complex ensureConsistency() logic that managed flag dependencies
  • Removed DisableOption::IdModel - IdModel can no longer be disabled

Code Architecture

  • Eliminated IdModelEnableOption enum and all related parsing functions from id_model/utils.h
  • All usage sites now check single isTensorIndexerEnabled() method instead of individual flags
  • Updated 60+ test files to use simplified option syntax

Behavioral Changes

Always-Built Components (important for understanding resource usage):

  • IdModel is now always built during lowering (previously conditional)
  • TensorIndexer is now always built (previously conditional)
  • NonDivisiblePredicateInfo is now always built (previously conditional)

These are constructed regardless of whether TensorIndexer will be used, which increases baseline memory/compute overhead but simplifies the code paths.

Automatic TensorIndexer Enabling (IMPORTANT - functional change):
The logic for when TensorIndexer is automatically enabled was modified:

  • Removed: Automatic enabling for MmaOp operations
  • Removed: Automatic enabling for CpAsyncBulkTensorTile (only CpAsyncBulk remains)
  • Removed: Automatic enabling for tensors with non-linear transformations (the hasRootToLoopLinearTransformations check)

This means MMA operations and certain complex tensor transformations will use the legacy indexer by default unless IdModel is explicitly enabled or the fusion contains other triggering operations (ArgsortOp, PadOp, ScanOp, ScatterOp, SliceOp, TopKOp, CpAsyncBulk, or reshape with expanded broadcast).

Test Updates

All tests updated to remove argument passing to EnableOption::IdModel (changed from .set(EnableOption::IdModel, {"all"}) to .set(EnableOption::IdModel)). Some tests have updated variable name expectations due to IdModel changes (e.g., i98→i114 in test_indexing.cpp).

Confidence Score: 4/5

  • This PR is generally safe but contains functional changes beyond the stated scope that warrant discussion
  • The code simplification itself is well-executed and consistent across all 63 files. However, the score is 4 rather than 5 because: (1) The PR removes automatic TensorIndexer enabling for MmaOp, CpAsyncBulkTensorTile, and non-linear transformations without documenting this in the PR description, which could be a significant behavioral change. (2) IdModel and TensorIndexer are now always built, increasing baseline overhead. (3) These changes go beyond "simplifying the option configuration" as stated in the PR description. The test suite passes with these changes, suggesting intentional behavior, but explicit confirmation would be valuable.
  • csrc/device_lower/lower2device.cpp - Contains the removed logic for automatic TensorIndexer enabling (MmaOp, CpAsyncBulkTensorTile, non-linear transformations). Verify this behavioral change is intentional and doesn't break MMA or complex tensor operations.

Important Files Changed

File Analysis

Filename Score Overview
csrc/device_lower/id_model_options.h 5/5 Simplified from 7 boolean flags to single tensor_indexer_enabled_ flag, removed complex ensureConsistency logic
csrc/options.h 5/5 Removed DisableOption::IdModel enum value as IdModel can no longer be disabled
csrc/options.cpp 5/5 Removed "id_model" from disable options map, no longer supporting NVFUSER_DISABLE=id_model
csrc/id_model/utils.h 5/5 Removed IdModelEnableOption enum and all related parsing functions (getIdModelEnabledOptions, isIdModelOptionEnabled)
csrc/device_lower/lower2device.cpp 3/5 Major changes: IdModel/TensorIndexer always built now; removed automatic TensorIndexer enabling for MmaOp, CpAsyncBulkTensorTile, and non-linear transformations
csrc/index_compute.cpp 5/5 Replaced specific flag checks (consumerIndex, producerIndex) with unified isTensorIndexerEnabled() check
csrc/predicate_compute.cpp 5/5 Replaced specific flag checks (inlinePredicate, unswitchPredicate) with unified isTensorIndexerEnabled() check

Sequence Diagram

sequenceDiagram
    participant User
    participant ENV as Environment Variable
    participant Options as EnableOptions
    participant IdModelOpts as IdModelOptions
    participant Lower as GpuLower::analysis
    participant IdModel
    participant TensorIndexer
    participant Indexer as Index/Predicate Compute

    User->>ENV: Set NVFUSER_ENABLE=id_model
    Note over ENV: OLD: id_model(all)<br/>NEW: id_model
    
    ENV->>Options: Parse environment variable
    Options->>IdModelOpts: Create with isOptionEnabled(IdModel)
    Note over IdModelOpts: OLD: 7 boolean flags<br/>NEW: 1 tensor_indexer_enabled_
    
    IdModelOpts->>Lower: getIdModelOptions(fusion)
    Note over Lower: Check for specific ops:<br/>- CpAsyncBulk<br/>- ArgsortOp, PadOp, etc.<br/>- ReshapeOp with expanded broadcast
    
    alt TensorIndexer needed for ops
        Lower->>IdModelOpts: setTensorIndexer(true)
    end
    
    alt TensorIndexer not supported
        Lower->>IdModelOpts: setTensorIndexer(false)
    end
    
    Lower->>IdModel: Always build IdModel
    Note over IdModel: OLD: Conditional on buildIdModel()<br/>NEW: Always built
    
    Lower->>TensorIndexer: Always build TensorIndexer
    Note over TensorIndexer: OLD: Conditional on buildTensorIndexer()<br/>NEW: Always built
    
    alt isTensorIndexerEnabled()
        Lower->>IdModel: allocateLoopIndexVariables()
    end
    
    Indexer->>IdModelOpts: isTensorIndexerEnabled()?
    alt Enabled
        Indexer->>TensorIndexer: Use TensorIndexer for indexing/predicates
    else Disabled
        Indexer->>Indexer: Use legacy indexer
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 13, 2026

!test

@github-actions
Copy link

github-actions bot commented Jan 13, 2026

Review updated until commit be3515d

Description

  • Simplify IdModel options from selective flags to single enable flag

  • Replace complex option parsing with straightforward tensor indexer toggle

  • Update all test files to use simplified option format

  • Remove unused option parsing logic and disable option

Changes walkthrough

Relevant files
Enhancement
5 files
id_model_options.h
Simplify IdModelOptions class structure                                   
+11/-120
lower2device.cpp
Streamline IdModel option logic                                                   
+17/-46 
utils.h
Remove complex option parsing functions                                   
+0/-68   
options.cpp
Remove id_model disable option                                                     
+0/-1     
options.h
Remove IdModel disable option enum                                             
+0/-1     
Tests
1 files
utils.cpp
Update test setup to use simplified option                             
+1/-1     
Additional files
57 files
utils.cpp +1/-1     
id_model.cpp +1/-1     
index_compute.cpp +5/-5     
predicate_compute.cpp +2/-2     
test_abstract_tensor.cpp +1/-1     
test_allocation_domain.cpp +1/-1     
test_allocation_order_inference.cpp +1/-1     
test_argsort.cpp +1/-1     
test_bfs.cpp +1/-1     
test_circular_buffering.cpp +8/-8     
test_circular_buffering_ping_pong.cpp +1/-1     
test_cluster.cpp +1/-1     
test_combined_inner_outer_reduction.cpp +1/-1     
test_compute_with.cpp +1/-1     
test_contiguity_id_model.cpp +1/-1     
test_gather.cpp +1/-1     
test_gpu1.cpp +4/-4     
test_gpu2.cpp +1/-1     
test_gpu3.cpp +2/-2     
test_greedy.cpp +1/-1     
test_index_put.cpp +1/-1     
test_index_select.cpp +1/-1     
test_indexing.cpp +34/-33 
test_indexing_advanced.cpp +2/-2     
test_inlining.cpp +1/-1     
test_layout_op.cpp +1/-1     
test_loop_domain_scheduling.cpp +1/-1     
test_matmul.cpp +2/-2     
test_matmul_scheduler.cpp +4/-4     
test_memory.cpp +3/-3     
test_mma.cpp +4/-4     
test_moe.cpp +1/-1     
test_move_pad.cpp +1/-1     
test_move_repeat_forward.cpp +1/-1     
test_move_split_cat.cpp +1/-1     
test_outer_reduction.cpp +1/-1     
test_persistent_buffer.cpp +2/-2     
test_pointwise.cpp +2/-2     
test_predicate_elimination.cpp +1/-1     
test_reduction.cpp +1/-1     
test_reduction_pointwise.cpp +1/-1     
test_remove_bcast_squeeze.cpp +1/-1     
test_replay.cpp +1/-1     
test_reshape.cpp +2/-2     
test_resize.cpp +40/-40 
test_rng.cpp +1/-1     
test_rope.cpp +1/-1     
test_scatter.cpp +2/-2     
test_select.cpp +1/-1     
test_serial_gridreduce.cpp +1/-1     
test_stream.cpp +1/-1     
test_topk.cpp +1/-1     
test_transpose.cpp +2/-2     
test_unary.cpp +1/-1     
test_vectorization.cpp +3/-3     
test_welford.cpp +1/-1     
utils.h +2/-2     

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@naoyam naoyam requested a review from jjsjann123 January 13, 2026 05:18
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +558 to +561
tensor_indexer_ = std::make_unique<TensorIndexer>(info().idModel());

non_divisible_predicate_info_ =
std::make_unique<NonDivisiblePredicateInfo>(fusion_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavioral Change: TensorIndexer and NonDivisiblePredicateInfo are now always constructed, regardless of whether TensorIndexer is enabled for use.

Previously, these were only constructed when buildTensorIndexer() was true. This means:

  • Additional memory allocation occurs in all cases now
  • Construction overhead happens even when not used
  • Any code checking GpuLower::current()->isTensorIndexerEnabled() (which checks if the pointer is non-null) will now always return true

The decision to use TensorIndexer is now solely controlled by idModelOptions().isTensorIndexerEnabled() (the boolean flag), but the object itself always exists. This separation is important to understand for maintenance.

Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks straightforward to me.

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 14, 2026

!build

@naoyam naoyam merged commit 9881db4 into main Jan 14, 2026
15 of 16 checks passed
@naoyam naoyam deleted the id_model_option_cleanup branch January 14, 2026 00:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants