Introduce Affine Controller Design #266
How do the inner loops' start/end values get updated?
Distributed counter units are updated through the cmds (including |
|
Shadow registers are used in the loop delivery mode in DCU to store and deliver the outer loop indexes. VectorCGRA/fu/single/LoopCounterRTL.py Line 134 in 61d7e56 They are updated by the affine controller through the VectorCGRA/fu/single/LoopCounterRTL.py Line 150 in 61d7e56
Only DCUs in loop count mode will send a VectorCGRA/fu/single/LoopCounterRTL.py Line 114 in 61d7e56 After receiving this complete signal, the affine controller will trigger outer loop counter increments and send |
lib/cmd_type.py
Outdated
| CMD_GLOBAL_REDUCE_COUNT: "(GLOBAL_REDUCE_COUNT)", | ||
| CMD_GLOBAL_REDUCE_ADD: "(GLOBAL_REDUCE_ADD)", | ||
| CMD_GLOBAL_REDUCE_MUL: "(GLOBAL_REDUCE_MUL)", | ||
| CMD_GLOBAL_REDUCE_MUL: "(GLOBAL_REDUCE_MUL)", |
| DataType(mk_parent_payload(0, True), 0), 0, CtrlType(0), 0), | ||
| ] | ||
|
|
||
| # ===== Configure CCU[1]: j = 0..2, parent = CCU[0] ===== |
How is CCU[1] triggered by CCU[0]?
Here we are testing a three-layer nested loop.
for i ... // CCU0
  for j ... // CCU1
    for k ... // DCU
As you can see in the CCU config part:
For CCU0, it is used for loop i. It is configured as:
CgraPayloadType(CMD_AC_CONFIG_CHILD_COUNT, DataType(1, 0), 0, CtrlType(0), 0),
# CCU[0] target: i-delivery DCU at ctrl_addr=1 (shadow_only!)
CgraPayloadType(CMD_AC_CONFIG_TARGET,
*mk_target_config(1, 0, shadow_only=True), CtrlType(0), 0),
CgraPayloadType(CMD_AC_CONFIG_PARENT,
DataType(mk_parent_payload(0, True), 0), 0, CtrlType(0), 0),
This means CCU0 has one child counter and is configured as a parent counter.
For CCU1, it is used for loop j. It is configured as:
CgraPayloadType(CMD_AC_CONFIG_CHILD_COUNT, DataType(1, 0), 0, CtrlType(0), 1),
# CCU[1] target 0: k-DCU at ctrl_addr=0 (leaf, needs reset + shadow)
CgraPayloadType(CMD_AC_CONFIG_TARGET,
*mk_target_config(0, 0, shadow_only=False), CtrlType(0), 1),
# CCU[1] target 1: j-delivery DCU at ctrl_addr=2 (shadow_only!)
CgraPayloadType(CMD_AC_CONFIG_TARGET,
*mk_target_config(2, 0, shadow_only=True), CtrlType(0), 1),
CgraPayloadType(CMD_AC_CONFIG_PARENT,
DataType(mk_parent_payload(0, False), 0), 0, CtrlType(0), 1),
This means this counter has a parent counter, and its id is 0 (i.e., CCU0).
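The parent/child relationship described above can be sketched as a small software model. This is only a behavioural illustration, not the RTL: the `Ccu` class, the `simulate` helper, the exclusive upper bound, and the bound chosen for `i` are all assumptions made for the example.

```python
class Ccu:
    """Hypothetical behavioural model of one CCU; not the PyMTL component."""

    def __init__(self, lower, upper, step=1, parent=None):
        self.lower, self.upper, self.step = lower, upper, step
        self.parent = parent      # None models a root counter
        self.value = lower
        self.done = False

    def child_complete(self):
        """The single child (a CCU or the leaf DCU) finished one full run."""
        self.value += self.step
        if self.value < self.upper:
            return                # still iterating at this loop level
        if self.parent is None:
            self.done = True      # root finished: the whole nest is complete
        else:
            self.value = self.lower       # rewind for the parent's next iteration
            self.parent.child_complete()  # report completion one level up


def simulate(i_upper, j_upper):
    """Return the (i, j) pair in effect for each run of the innermost k loop."""
    ccu0 = Ccu(0, i_upper)               # loop i, root
    ccu1 = Ccu(0, j_upper, parent=ccu0)  # loop j, child of CCU0
    runs = []
    while not ccu0.done:
        runs.append((ccu0.value, ccu1.value))  # values the delivery DCUs would hold
        ccu1.child_complete()                  # the k-DCU reports one complete
    return runs


runs = simulate(i_upper=2, j_upper=3)
```

In this model CCU[1] never polls CCU[0]; CCU[0] only advances when CCU[1] wraps around and reports completion upward.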
| # CCU[0] child_count=2, targets at ctrl_addr=0 and ctrl_addr=1. | ||
| #------------------------------------------------------------------------- | ||
|
|
||
| def test_sibling_barrier(): |
- What does barrier mean?
- Does child_count matter? What is it used for?
- Is ctrl_addr the control signal/instruction index in ctrl memory? How is it related to child_count?
- This describes the case of 1 root counter + 2 sibling child counters. The outer loop only updates its index when both inner counters complete.
- child_count is the required number of complete signals from child counters. The outer loop increments by 1 once all the required complete signals have been received.
- Since each CCU has a DCU on the tile array, and each leaf counter also has a DCU on the tile array, ctrl_addr is used as follows:
1. For a CCU's DCU, the DCU is configured in loop delivery mode, and ctrl_addr helps us distinguish whether the DCU needs to be updated when its corresponding CCU updates.
2. For a leaf counter's DCU, the DCU is configured in loop count mode, and ctrl_addr helps us distinguish which outer loop the received complete signal should trigger.
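The barrier behaviour governed by child_count can be illustrated with a few lines of plain Python. The class and method names below are hypothetical and only model the counting rule described in this reply, not the actual hardware signals.

```python
class OuterCounter:
    """Hypothetical software model of one outer-loop counter with a child barrier."""

    def __init__(self, lower, upper, step, child_count):
        self.lower, self.upper, self.step = lower, upper, step
        self.child_count = child_count  # complete signals required per iteration
        self.value = lower
        self.received = 0               # complete signals seen so far

    def on_child_complete(self):
        """Record one complete signal; return True if the loop index advanced."""
        self.received += 1
        if self.received < self.child_count:
            return False                # barrier not yet satisfied
        self.received = 0               # start counting for the next iteration
        self.value += self.step
        return True


root = OuterCounter(lower=0, upper=3, step=1, child_count=2)
first = root.on_child_complete()    # only one sibling has finished
second = root.on_child_complete()   # both siblings have finished
```

With child_count=2 the first complete signal leaves the index unchanged and the second one advances it, which is the sibling barrier the test exercises.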
controller/AffineControllerRTL.py
Outdated
| ========================================================================== | ||
| Affine Controller (AC) for managing outer loop counters in CGRA. | ||
|
|
||
| Each AC contains configurable number of Configurable Counter Units (CCUs). |
configurable number -> parameterizable number?
| AffineControllerRTL.py | ||
| ========================================================================== | ||
| Affine Controller (AC) for managing outer loop counters in CGRA. | ||
|
|
Can you also explain where this FU is located in terms of the arch design? Is it an FU similar to Adder? It is inside the controller folder, so it seems to sit near the controller instead?
It can also consume commands (e.g., CMD_AC_CONFIG_LOWER and COMPLETE). Who sends CMD_AC_CONFIG_LOWER?
The AC is under the control of the controller.
The CMD_AC_CONFIG_LOWER is from the controller.
The CMD_AC_CHILD_COMPLETE is from the affine controller that belongs to another CGRA. This command is introduced so that we can chain two affine controllers from two/multiple CGRAs into a bigger affine controller.
Plz create a folder inside https://github.com/tancheng/VectorCGRA/tree/master/doc/figures to include your design diagrams, and put link into comment of this .py file. Also attach figures into this PR's description.
Moreover, what do you think of renaming AC to LC? i.e., Loop controller? IIRC, we already have a LoopCounter in our FUs. So LC would control LoopCounter, am I right?
| cmp_fn = lambda a, b: (a.cmd == b.cmd) and \ | ||
| (a.data.payload == b.data.payload) and \ | ||
| (a.ctrl_addr == b.ctrl_addr) |
| CMD_AC_CONFIG_LOWER = 32 # Configures CCU lower_bound. | ||
| CMD_AC_CONFIG_UPPER = 33 # Configures CCU upper_bound. | ||
| CMD_AC_CONFIG_STEP = 34 # Configures CCU step. | ||
| CMD_AC_CONFIG_CHILD_COUNT = 35 # Configures child_complete_count. | ||
| CMD_AC_CONFIG_TARGET = 36 # Configures target (tile_id, ctrl_addr, is_remote, cgra_id). | ||
| CMD_AC_CONFIG_PARENT = 37 # Configures parent_ccu_id, is_root, is_relay. | ||
| CMD_AC_LAUNCH = 38 # Launches AC (all CCUs enter RUNNING). | ||
|
|
||
| # Affine Controller Inter-CGRA Sync Commands. | ||
| CMD_AC_SYNC_VALUE = 39 # Parent AC → Child AC: sync current value. | ||
| CMD_AC_CHILD_COMPLETE = 40 # Child AC → Parent AC: child complete. | ||
| CMD_AC_CHILD_RESET = 41 # Parent AC → Child AC: reset child. | ||
|
|
||
| # Affine Controller Status. | ||
| CMD_AC_ALL_COMPLETE = 42 # AC → Controller: all loops complete. |
Briefly explain who the parent/sender is and who the receiver/child is (a tile? another AC?), and where each cmd comes from or is produced.

Add Affine Controller (AC) for Outer Loop Management
Summary
This PR introduces the Affine Controller (AC), a programmable hardware module that manages outer loop counters in the CGRA. While the existing LoopCounterRTL (DCU) handles innermost loop counting at the tile level, the AC coordinates multi-level loop nesting and cross-CGRA loop synchronization.
Architecture
The AC contains an array of Configurable Counter Units (CCUs), each representing one level of a loop nest. CCUs form a DAG topology where:
Each CCU tracks a loop variable (lower_bound, upper_bound, step, current_value) and manages a set of targets: the tile-array DCUs it must notify when advancing iterations.
State Machine:
IDLE → RUNNING → DISPATCHING → RUNNING / COMPLETE
RUNNING: Waiting for child completion events
DISPATCHING: Sending commands to targets (1 cycle per target)
COMPLETE: Loop finished, parent notified internally
Key Design Decisions
… CMD_RESET_LEAF_COUNTER (leaf-mode DCU) or CMD_UPDATE_COUNTER_SHADOW_VALUE (delivery-mode DCU). This avoids redundant messages and minimizes dispatch latency.
… received_complete_count in the same cycle; no external signaling needed.
… RUNNING, all child CCUs are automatically reset to lower_bound.
… current_value >= upper_bound, the CCU transitions directly to COMPLETE without dispatching, preventing stale completion events from the previous iteration.
… RUNNING state.
Cross-CGRA Support
CCU targets can be marked as remote (is_remote=1). Dispatch commands for remote targets are sent via send_to_remote (routed through the Controller's inter-CGRA NoC). Remote completion events arrive as CMD_AC_CHILD_COMPLETE on recv_from_remote.
Files Changed
… NUM_CMDS 28→40
Test Cases
… child_count=2). Verifies barrier synchronization.
… send_to_remote/recv_from_remote paths.
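For readers who want a concrete picture of the CCU state machine listed under Architecture, the following is a cycle-by-cycle sketch in plain Python. It is an illustration under stated assumptions rather than the RTL: the `CcuFsm` class, its `tick` interface, and the choice to compare current_value against upper_bound right after the increment are all hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    RUNNING = 1
    DISPATCHING = 2
    COMPLETE = 3


class CcuFsm:
    """Hypothetical cycle-level model of one CCU's state machine."""

    def __init__(self, lower, upper, step, child_count, num_targets):
        self.lower, self.upper, self.step = lower, upper, step
        self.child_count = child_count
        self.num_targets = num_targets
        self.state = State.IDLE
        self.value = lower
        self.received = 0   # child completion events seen this iteration
        self.pending = 0    # target commands still to send

    def launch(self):
        self.state = State.RUNNING

    def tick(self, child_completes=0):
        """Advance one cycle; child_completes is the number of events arriving now."""
        if self.state == State.RUNNING:
            self.received += child_completes
            if self.received >= self.child_count:
                self.received = 0
                self.value += self.step
                if self.value >= self.upper:
                    self.state = State.COMPLETE   # final iteration: no dispatch
                elif self.num_targets > 0:
                    self.pending = self.num_targets
                    self.state = State.DISPATCHING
        elif self.state == State.DISPATCHING:
            self.pending -= 1                     # one target command per cycle
            if self.pending == 0:
                self.state = State.RUNNING
        return self.state


fsm = CcuFsm(lower=0, upper=2, step=1, child_count=1, num_targets=2)
fsm.launch()
states = [fsm.tick(1), fsm.tick(), fsm.tick(), fsm.tick(1)]
```

With two targets, the model spends two cycles in DISPATCHING before returning to RUNNING, and the second child completion takes it straight to COMPLETE.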