wait_for_kv() waits for K & V but not Q — what guarantees Q is ready?

## Issue Description

In **`attention_partial::launcher::wait_for_kv`** the code checks the *Bar* counters for the previous **`RMS_QKV_MatVecRopeAppend`** **K** and **V** blocks:

```cpp
// K
while (Bar[{layer_idx,
            OPCODE_RMS_QKV_MatVecRopeAppend - 1,
            num_attention_heads + kv_head_idx}] < 4) { … }

// V
while (Bar[{layer_idx,
            OPCODE_RMS_QKV_MatVecRopeAppend - 1,
            num_attention_heads + num_kv_heads + kv_head_idx}] < 4) { … }
```

There is no analogous polling for the corresponding Q block(s).

Because Q, K and V are all produced by the same upstream opcode, I’d expect PartialAttention to need all three. What ensures that Q is already available (or is otherwise not required) when PartialAttention begins?
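To make the question concrete, here is a small host-side sketch of the counter layout as I read it. The K and V offsets are taken from the snippet above. The Q range is only my assumption: that Q heads occupy slots `[0, num_attention_heads)` and are grouped contiguously per KV head (the usual GQA grouping). `HeadLayout`, `SlotRange`, `k_slot`, `v_slot` and `q_slots` are hypothetical names for illustration, not identifiers from the repo.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the third Bar index used in wait_for_kv.
struct HeadLayout {
    std::size_t num_attention_heads;  // number of Q heads
    std::size_t num_kv_heads;         // number of K (and V) heads
};

// Half-open range of slots: [first, last_exclusive).
struct SlotRange {
    std::size_t first;
    std::size_t last_exclusive;
};

// From the snippet: the K counter for a KV head sits right after the Q heads.
inline std::size_t k_slot(const HeadLayout& l, std::size_t kv_head_idx) {
    return l.num_attention_heads + kv_head_idx;
}

// From the snippet: the V counter sits after all of the K counters.
inline std::size_t v_slot(const HeadLayout& l, std::size_t kv_head_idx) {
    return l.num_attention_heads + l.num_kv_heads + kv_head_idx;
}

// ASSUMPTION (not confirmed by the source): Q heads use slots
// [0, num_attention_heads), with each KV head serving one contiguous group.
inline SlotRange q_slots(const HeadLayout& l, std::size_t kv_head_idx) {
    assert(l.num_kv_heads > 0);
    assert(l.num_attention_heads % l.num_kv_heads == 0);
    assert(kv_head_idx < l.num_kv_heads);
    const std::size_t group = l.num_attention_heads / l.num_kv_heads;
    return SlotRange{kv_head_idx * group, (kv_head_idx + 1) * group};
}
```

If that layout is right, I would have expected a poll over the slots returned by `q_slots(...)` somewhere before Q is consumed, unless Q readiness is enforced by a different mechanism.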
