Add an Affine Loop Perfection Optimization Pass #264

ShangkunLi · 2026-02-06T06:23:40Z

This PR adds an affine loop perfection pass. The logic is:

Detect all loop bands in the func::FuncOp (a loop band is a sequence of nested loops without sibling loops);
For each loop band, extract the prologue and epilogue code from the loop;
Separate the prologue/epilogue code into pure computation code (arithmetic operations, affine.load, memref.load) and side-effecting code (affine.store, memref.store);
Move pure computation code directly into the innermost loop;
Wrap side-effecting code in a conditional region and move it into the innermost loop.

For prologue side-effecting code, the condition is loop index == lower bound.
For epilogue side-effecting code, the condition is loop index == upper bound - step.
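
The band-detection step can be illustrated with a small Python model. This is only a sketch of the definition given above (nested loops without sibling loops), not the pass's actual C++ code: the `Loop` class, the `find_loop_bands` helper, and the decision that each sibling loop starts a band of its own are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Loop:
    """Toy stand-in for an affine.for op: a name plus an ordered body."""
    name: str
    body: List[Union["Loop", str]] = field(default_factory=list)


def find_loop_bands(ops):
    """Return loop bands as lists of loop names, outermost loop first.

    A band starts at a loop and extends inward for as long as the current
    loop contains exactly one nested loop (that is, no sibling loops).
    """
    bands = []
    for op in ops:
        if not isinstance(op, Loop):
            continue
        band = [op]
        children = [c for c in op.body if isinstance(c, Loop)]
        while len(children) == 1:
            band.append(children[0])
            children = [c for c in band[-1].body if isinstance(c, Loop)]
        bands.append([loop.name for loop in band])
        # Assumption: sibling loops end the band and each starts a new one.
        if len(children) > 1:
            bands.extend(find_loop_bands(children))
    return bands


# Hypothetical function body: one imperfect 2-deep nest, one loop with siblings.
func_body = [
    Loop("i", ["prologue_op", Loop("j", ["load", "store"]), "epilogue_op"]),
    Loop("k", [Loop("m", ["store"]), Loop("n", ["store"])]),
]
bands = find_loop_bands(func_body)
```

Here the `i`/`j` nest forms one band with a prologue and an epilogue to relocate, while `k`, `m`, and `n` each form a single-loop band because `m` and `n` are siblings.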

For example:
Pure computation code:
Before transformation

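func.func @example(%mem: memref<8x8xi32>) {
  %c8 = arith.constant 8 : index
  affine.for %i = 0 to 8 {
    %temp = arith.muli %i, %c8 : index  // ← Prologue, pure computation
    %temp_i32 = arith.index_cast %temp : index to i32  // ← Prologue, pure computation
    affine.for %j = 0 to 8 {
      %val = affine.load %mem[%i, %j] : memref<8x8xi32>
      %sum = arith.addi %val, %temp_i32 : i32
      affine.store %sum, %mem[%i, %j] : memref<8x8xi32>
    }
    // Epilogue (empty in this case)
  }
  return
}

After transformation

func.func @example(%mem: memref<8x8xi32>) {
  %c8 = arith.constant 8 : index
  affine.for %i = 0 to 8 {
    affine.for %j = 0 to 8 {
      %temp = arith.muli %i, %c8 : index
      %temp_i32 = arith.index_cast %temp : index to i32

      %val = affine.load %mem[%i, %j] : memref<8x8xi32>
      %sum = arith.addi %val, %temp_i32 : i32
      affine.store %sum, %mem[%i, %j] : memref<8x8xi32>
    }
  }
  return
}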

Side-Effecting Code:
Before transformation

affine.for %i = 0 to 8 {
  affine.store %c0, %flag[%i]  // ← Side-effecting prologue
  affine.for %j = 0 to 8 {
    %val = affine.load %mem[%i, %j]
    affine.store %val, %mem[%i, %j]
  }
}

After transformation

affine.for %i = 0 to 8 {
  affine.for %j = 0 to 8 {
    // ✅ Side-effecting operation: executed conditionally, on the first iteration only
    %c0_idx = arith.constant 0 : index
    %is_first = arith.cmpi eq, %j, %c0_idx : index
    scf.if %is_first {
      affine.store %c0, %flag[%i]
    }
    
    %val = affine.load %mem[%i, %j]
    affine.store %val, %mem[%i, %j]
  }
}
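
The guard conditions can be sanity-checked with a small Python model that records the order in which operations execute. This is a sketch under assumptions, not the pass itself: `run_imperfect` and `run_perfected` are hypothetical helpers, and the trace tuples stand in for the stores and the loop body. It also shows that the epilogue condition as stated (loop index == upper bound - step) only matches the last iteration when the step divides the iteration range evenly; how the pass treats other cases is not stated in this PR.

```python
def run_imperfect(lb, ub, step, n_outer):
    """Reference semantics: prologue and epilogue run once per outer iteration."""
    trace = []
    for i in range(n_outer):
        trace.append(("prologue_store", i))
        for j in range(lb, ub, step):
            trace.append(("body", i, j))
        trace.append(("epilogue_store", i))
    return trace


def run_perfected(lb, ub, step, n_outer):
    """Perfect nest: side-effecting ops are guarded inside the innermost loop."""
    trace = []
    for i in range(n_outer):
        for j in range(lb, ub, step):
            if j == lb:  # prologue guard: loop index == lower bound
                trace.append(("prologue_store", i))
            trace.append(("body", i, j))
            if j == ub - step:  # epilogue guard: loop index == upper bound - step
                trace.append(("epilogue_store", i))
    return trace


# Step divides the range evenly: both versions execute the same ops in the same order.
same_unit_step = run_imperfect(0, 8, 1, 2) == run_perfected(0, 8, 1, 2)
same_even_stride = run_imperfect(0, 8, 2, 2) == run_perfected(0, 8, 2, 2)

# Range 0..7 with step 2 visits j = 0, 2, 4, 6, so j never equals 7 - 2 = 5
# and the guarded epilogue never fires in this model.
same_uneven_stride = run_imperfect(0, 7, 2, 2) == run_perfected(0, 7, 2, 2)
```

In this model the first two comparisons hold and the third does not, which is why the sketch treats an evenly dividing step as a precondition.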

tancheng · 2026-02-06T06:35:59Z

What is the benefit of performing this loop perfection? To enable the counter?

Can all loops be perfected in theory?

Why does a store have a side effect? How is a side effect defined?

ShangkunLi · 2026-02-06T06:59:32Z

What is the benefit of performing this loop perfection? To enable the counter?

The benefit is that it allows a counter chain to be created. Without this loop perfection optimization, all the imperfectly nested parts are wrapped in a hyperblock and transformed by the neura logic. This may create long recurrence cycles and severely hurt performance.

Can all loops be perfected in theory?

Why does a store have a side effect? How is a side effect defined?

Not all loops can be perfected. For now, I reject any loop containing an operation that produces a memref type, or a func::CallOp.

A side-effecting operation is one that may change the program state. Any store operation is side-effecting because it changes the data stored in memory.
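
The classification described in this thread can be sketched as a small Python lookup. The op names come from the PR description; the `classify` helper, its return labels, and the conservative rejection of unknown ops are assumptions made for illustration and are not taken from the pass's source.

```python
# Categories named in the PR description.
PURE_LOADS = {"affine.load", "memref.load"}
SIDE_EFFECTING_OPS = {"affine.store", "memref.store"}
REJECTED_OPS = {"func.call"}  # func::CallOp


def classify(op_name, result_types=()):
    """Decide how a prologue/epilogue op is handled in this toy model.

    Returns "reject" (the band is not perfected), "guard" (wrap in a
    condition and move inward), or "sink" (move inward directly).
    """
    produces_memref = any(t.startswith("memref") for t in result_types)
    if op_name in REJECTED_OPS or produces_memref:
        return "reject"
    if op_name in SIDE_EFFECTING_OPS:
        return "guard"
    if op_name in PURE_LOADS or op_name.startswith("arith."):
        return "sink"
    # Assumption: anything unrecognized is rejected to stay conservative.
    return "reject"


decisions = {
    "arith.muli": classify("arith.muli", ("index",)),
    "affine.load": classify("affine.load", ("i32",)),
    "affine.store": classify("affine.store"),
    "func.call": classify("func.call"),
    "memref.alloc": classify("memref.alloc", ("memref<8xi32>",)),
}
```

A single "reject" in a band's prologue or epilogue would leave that band untouched in this model, while "sink" and "guard" correspond to the two relocation strategies from the PR description.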

tancheng · 2026-02-06T17:47:53Z

Can we add all the taskflow-related passes into the compiler ASAP, instead of only enabling them in opt?

ShangkunLi · 2026-02-07T02:11:25Z

Can we add all the taskflow-related passes into the compiler ASAP, instead of only enabling them in opt?

Sure, filed an issue #265 about that.

ShangkunLi added 2 commits February 6, 2026 14:07

prototype affine loop perfection pass

35524b9

enable affine loop perfection optimization

c45ce49

unstage submodule changes

6192215

ShangkunLi force-pushed the loop-perfection branch from 00cd54c to 6192215 Compare February 6, 2026 06:38

ShangkunLi added 2 commits February 6, 2026 15:17

sync cgrabench

f9ff3d1

update cgrabench

fac94ae

ShangkunLi requested a review from guosran February 6, 2026 07:32

tancheng approved these changes Feb 6, 2026

ShangkunLi mentioned this pull request Feb 7, 2026

[P1] Integrating Taskflow-related Passes into the Compiler #265

Open

ShangkunLi merged commit a388c70 into coredac:main Feb 7, 2026
1 check passed
