
wait_for_kv() waits for K & V but not Q — what guarantees Q is ready? #4

@Beibei-Zhou


Issue Description

In attention_partial::launcher::wait_for_kv the code checks the Bar counters for the previous RMS_QKV_MatVecRopeAppend K and V blocks:

// K
while (Bar[{layer_idx,
            OPCODE_RMS_QKV_MatVecRopeAppend - 1,
            num_attention_heads + kv_head_idx}] < 4) { … }

// V
while (Bar[{layer_idx,
            OPCODE_RMS_QKV_MatVecRopeAppend - 1,
            num_attention_heads + num_kv_heads + kv_head_idx}] < 4) { … }

There is no analogous polling for the corresponding Q block(s).
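To make the question concrete, here is a minimal host-side sketch of the slot layout that the two loops above seem to imply. Only the K and V offsets come from wait_for_kv. The `HeadLayout` struct, the helper names, and the idea that Q heads occupy slots [0, num_attention_heads) and are grouped evenly per KV head are my assumptions, not something I have confirmed in the code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical mirror of the barrier slot layout. Only the K and V offsets
// are taken from wait_for_kv; everything about Q is an assumption.
struct HeadLayout {
    int num_attention_heads;
    int num_kv_heads;
};

// Slot polled by the K loop: num_attention_heads + kv_head_idx.
inline int k_slot(const HeadLayout& l, int kv_head_idx) {
    assert(kv_head_idx >= 0 && kv_head_idx < l.num_kv_heads);
    return l.num_attention_heads + kv_head_idx;
}

// Slot polled by the V loop: num_attention_heads + num_kv_heads + kv_head_idx.
inline int v_slot(const HeadLayout& l, int kv_head_idx) {
    assert(kv_head_idx >= 0 && kv_head_idx < l.num_kv_heads);
    return l.num_attention_heads + l.num_kv_heads + kv_head_idx;
}

// ASSUMPTION: Q heads fill slots [0, num_attention_heads) and each KV head
// serves an equal-sized contiguous group of Q heads.
// Returns the half-open range [first, last) of Q slots for one KV head.
inline std::pair<int, int> q_slot_range(const HeadLayout& l, int kv_head_idx) {
    assert(kv_head_idx >= 0 && kv_head_idx < l.num_kv_heads);
    assert(l.num_attention_heads % l.num_kv_heads == 0);
    const int group = l.num_attention_heads / l.num_kv_heads;
    return {kv_head_idx * group, (kv_head_idx + 1) * group};
}
```

With 32 attention heads and 8 KV heads (values picked only for illustration), kv_head_idx = 3 maps to K slot 35 and V slot 43. If the assumed layout holds, the matching Q slots would be 12 through 15, and none of them are polled.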

Because Q, K and V are generated by the same upstream opcode, I’d expect all three to be needed. What ensures that Q is already available (or otherwise not required) when PartialAttention begins?
