@ShangkunLi
Collaborator

This PR enables an affine loop perfection pass. The logic is:

  1. Detect all the loop bands in the func::FuncOp (a loop band is a sequence of nested loops w/o sibling loops);
  2. For each loop band, extract the prologue and epilogue code from the loop;
  3. Separate the prologue/epilogue code into pure computation code (arithmetic operations, affine.load, memref.load) and side-effecting code (affine.store, memref.store);
  4. Move pure computation code directly into the innermost loop;
  5. Wrap side-effecting code in a conditional region and move it into the innermost loop:
  • For prologue side-effecting code, the condition is loop index == lower bound
  • For epilogue side-effecting code, the condition is loop index == upper bound - step
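
The steps above can be sketched on a toy model. This is a minimal Python sketch, not the pass itself: the `Op` record, the `SIDE_EFFECTING` set, and the `perfect` function are hypothetical names introduced for illustration, and placing the sunk epilogue code after the loop body is an assumption the description does not spell out.

```python
from dataclasses import dataclass

# Hypothetical classification mirroring step 3: stores are side-effecting,
# everything else in this toy model is treated as pure computation.
SIDE_EFFECTING = {"affine.store", "memref.store"}


@dataclass(frozen=True)
class Op:
    name: str         # e.g. "arith.muli", "affine.store"
    guard: str = ""   # condition under which the op executes ("" = always)


def perfect(prologue, body, epilogue, lb, ub, step):
    """Return the innermost-loop body after sinking prologue/epilogue ops.

    Assumes (ub - lb) is a positive multiple of step, so that the last
    iteration index really is ub - step.
    """
    assert ub > lb and (ub - lb) % step == 0
    first, last = f"iv == {lb}", f"iv == {ub - step}"

    def sink(ops, cond):
        # Pure ops move as-is (step 4); side-effecting ops get a guard (step 5).
        return [Op(o.name, cond) if o.name in SIDE_EFFECTING else o for o in ops]

    # Assumption: sunk prologue ops go before the body, epilogue ops after it.
    return sink(prologue, first) + list(body) + sink(epilogue, last)


new_body = perfect(
    prologue=[Op("arith.muli"), Op("affine.store")],
    body=[Op("affine.load"), Op("affine.store")],
    epilogue=[Op("memref.store")],
    lb=0, ub=8, step=1,
)
for op in new_body:
    print(op.name, f"if {op.guard}" if op.guard else "")
```

Only the ops that came from the prologue or epilogue receive a guard; the original body ops are left untouched.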

For example:
Pure computation code:
Before transformation

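func.func @example(%mem: memref<8x8xi32>) {
  %c8 = arith.constant 8 : index
  affine.for %i = 0 to 8 {
    %temp = arith.muli %i, %c8 : index  // ← Prologue, pure computation
    %temp_i32 = arith.index_cast %temp : index to i32
    affine.for %j = 0 to 8 {
      %val = affine.load %mem[%i, %j] : memref<8x8xi32>
      %sum = arith.addi %val, %temp_i32 : i32
      affine.store %sum, %mem[%i, %j] : memref<8x8xi32>
    }
    // Epilogue (empty in this case)
  }
  return
}

After transformation

func.func @example(%mem: memref<8x8xi32>) {
  %c8 = arith.constant 8 : index
  affine.for %i = 0 to 8 {
    affine.for %j = 0 to 8 {
      // Pure prologue computation, moved into the innermost loop
      %temp = arith.muli %i, %c8 : index
      %temp_i32 = arith.index_cast %temp : index to i32

      %val = affine.load %mem[%i, %j] : memref<8x8xi32>
      %sum = arith.addi %val, %temp_i32 : i32
      affine.store %sum, %mem[%i, %j] : memref<8x8xi32>
    }
  }
  return
}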

Side-Effecting Code:
Before transformation

affine.for %i = 0 to 8 {
  affine.store %c0, %flag[%i]  // ← Side-effecting prologue
  affine.for %j = 0 to 8 {
    %val = affine.load %mem[%i, %j]
    affine.store %val, %mem[%i, %j]
  }
}

After transformation

affine.for %i = 0 to 8 {
  affine.for %j = 0 to 8 {
    // ✅ Side-effecting operation: guarded so it executes only on the first iteration of %j
    %c0_idx = arith.constant 0 : index
    %is_first = arith.cmpi eq, %j, %c0_idx : index
    scf.if %is_first {
      affine.store %c0, %flag[%i]
    }
    
    %val = affine.load %mem[%i, %j]
    affine.store %val, %mem[%i, %j]
  }
}
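
The guarded form can be checked against the original with a small simulation. This is a Python sketch under stated assumptions, not output from the pass: `imperfect` and `perfected` are hypothetical stand-ins for the two loop nests above, and the `trace` list is added only to compare the order of memory operations.

```python
def imperfect(outer=8, inner=8):
    """Original nest: the flag store sits in the outer loop's prologue."""
    flag, trace = [None] * outer, []
    for i in range(outer):
        flag[i] = 0                      # prologue store
        trace.append(("flag", i))
        for j in range(inner):
            trace.append(("mem", i, j))  # stands in for the load/store pair
    return flag, trace


def perfected(outer=8, inner=8):
    """Perfected nest: the store is sunk and guarded by j == lower bound."""
    flag, trace = [None] * outer, []
    for i in range(outer):
        for j in range(inner):
            if j == 0:                   # guard: first inner iteration only
                flag[i] = 0
                trace.append(("flag", i))
            trace.append(("mem", i, j))
    return flag, trace


# Same final state and same order of memory operations for the 8x8 case.
assert imperfect() == perfected()
```

One caveat this model exposes: if the inner loop has zero iterations, `imperfect(2, 0)` still writes the flags while `perfected(2, 0)` does not, so the rewrite preserves behavior only when the inner trip count is at least one. Whether the pass checks for this is not stated here.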

@tancheng
Contributor

tancheng commented Feb 6, 2026

What is the benefit of performing this loop perfection? To enable the counter?

Can all loops be perfected in theory?

Why does a store have a side effect? How do we define a side effect?

@ShangkunLi
Collaborator Author

ShangkunLi commented Feb 6, 2026

What is the benefit of performing this loop perfection? To enable the counter?

The benefit is that it lets us create a counter chain. Without this loop perfection optimization, all the imperfectly nested parts are wrapped in a hyperblock and transformed by the neura logic, which may create long recurrence cycles and severely hurt performance.

Can all loops be perfected in theory?

Why does a store have a side effect? How do we define a side effect?

Not all loops can be perfected. For now, I reject any loop containing an operation that produces a memref-typed value, or a func::CallOp.

A side-effecting operation is one that may change the program state. Any store operation is side-effecting because it changes the data stored in memory.
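
These two rules can be written as predicates. The following is a Python sketch for illustration only, not the pass's C++ code: the `(name, result_types)` op record and the helper names `is_side_effecting` and `can_perfect` are hypothetical.

```python
# Hypothetical op record: (op name, list of result type strings).
STORE_OPS = {"affine.store", "memref.store"}
REJECTED_OPS = {"func.call"}


def is_side_effecting(name):
    """Stores change memory contents, so they count as side-effecting."""
    return name in STORE_OPS


def can_perfect(ops):
    """Reject a loop that contains a call or a memref-producing operation."""
    for name, result_types in ops:
        if name in REJECTED_OPS:
            return False
        if any(t.startswith("memref") for t in result_types):
            return False
    return True


print(can_perfect([("arith.muli", ["index"]), ("affine.store", [])]))  # True
print(can_perfect([("memref.alloc", ["memref<8xi32>"])]))              # False
print(can_perfect([("func.call", ["i32"])]))                           # False
```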

@ShangkunLi ShangkunLi requested a review from guosran February 6, 2026 07:32
@tancheng
Contributor

tancheng commented Feb 6, 2026

Can we add all the taskflow-related passes into the compiler ASAP, instead of only enabling them in opt?

@ShangkunLi
Collaborator Author

Can we add all the taskflow-related passes into the compiler ASAP, instead of only enabling them in opt?

Sure, filed an issue #265 about that.

@ShangkunLi ShangkunLi merged commit a388c70 into coredac:main Feb 7, 2026
1 check passed