[FEA] remove unnecessary keys selection and fuse selection to erase #404
ShaobinChen-AH wants to merge 2 commits into
Greptile Summary
This PR refactors the admission-gating logic in
Confidence Score: 5/5
Safe to merge: the three substitutions are all semantically equivalent to the code they replace, and no new logic is introduced. All changed expressions produce identical results. Boolean indexing tensor[bool_mask] is equivalent to tensor[torch.where(bool_mask)[0]] in PyTorch for a 1-D mask. keys_to_insert_mask[new_in_miss] = admit_mask produces the same scatter as the old integer-index True assignment, because keys_to_insert_mask[new_in_miss] is always all False at that point. new_in_miss & ~keys_to_insert_mask selects the same non-admitted positions as the old new_miss_indices[non_admit] chain. The erase call is unchanged and still pre-filters with new_keys_sub[admit_mask]. No files require special attention; the single changed file is a self-contained simplification.
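A small PyTorch check of the three equivalences claimed above, using made-up toy tensors (the values of miss_keys, new_in_miss, admit_mask and non_admit here are illustrative, not taken from the PR). The last part shows why the all-False precondition matters: if a new_in_miss position were already True, the boolean scatter would overwrite it with False, while the old integer assignment would leave it True.

```python
import torch

# Toy data: 6 missed keys; positions 1, 2, 4, 5 are "new" (not in storage).
miss_keys = torch.tensor([10, 11, 12, 13, 14, 15])
new_in_miss = torch.tensor([False, True, True, False, True, True])

# 1) Boolean indexing equals torch.where + integer indexing for a 1-D mask.
new_miss_indices = torch.where(new_in_miss)[0]
assert torch.equal(miss_keys[new_miss_indices], miss_keys[new_in_miss])

# Admission decision for the 4 new keys (one entry per True in new_in_miss).
admit_mask = torch.tensor([True, False, True, False])
non_admit = ~admit_mask

# 2) Old: set admitted positions to True via integer indices.
old_insert = torch.zeros_like(new_in_miss)
old_insert[new_miss_indices[admit_mask]] = True
# New: boolean scatter of admit_mask into the new_in_miss positions.
new_insert = torch.zeros_like(new_in_miss)
new_insert[new_in_miss] = admit_mask
assert torch.equal(old_insert, new_insert)

# 3) Old integer chain vs. new single mask expression for non-admitted keys.
old_non_admit = new_miss_indices[non_admit]
new_non_admit = new_in_miss & ~new_insert
assert torch.equal(old_non_admit, torch.where(new_non_admit)[0])

# Why the all-False precondition matters: if a new_in_miss slot were already
# True, the boolean scatter would overwrite it with False; the old code would not.
pre = torch.zeros_like(new_in_miss)
pre[2] = True
a = pre.clone()
a[new_miss_indices[admit_mask]] = True   # position 2 stays True
b = pre.clone()
b[new_in_miss] = admit_mask              # position 2 becomes False
assert not torch.equal(a, b)
```

In the PR's code path the precondition holds, so the single boolean scatter can safely replace the torch.where + integer-index chain.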
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["miss_keys / miss_tids (size h_num_miss)"] --> B{"new_in_miss.any() AND admit_strategy?"}
B -- "No admit_strategy" --> C["keys_to_insert_mask = all True"]
B -- "Yes" --> D["new_keys_sub = miss_keys[new_in_miss] — boolean index"]
D --> E["admission_counter.add → get freq"]
E --> F["admit_mask = admit_strategy.admit(new_keys_sub, freq)"]
F --> G{"admit_mask.any()?"}
G -- "Yes" --> H["erase(new_keys_sub[admit_mask], new_tids_sub[admit_mask])"]
H --> I["keys_to_insert_mask[new_in_miss] = admit_mask (boolean scatter)"]
G -- "No" --> J["keys_to_insert_mask unchanged for new_in_miss positions"]
I --> K["non_admit_miss = new_in_miss AND NOT keys_to_insert_mask"]
J --> K
K --> L{"non_admit_miss.any()?"}
L -- "Yes" --> M["non_admitted_positions = miss_compact_idx[non_admit_miss]"]
L -- "No" --> N["non_admitted_positions = None"]
I --> O["Insert admitted + storage-found keys into cache"]
M --> P["Return slot_indices, update_slot_indices, non_admitted_positions"]
N --> P
O --> P
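The flowchart can be read as roughly the following function. This is a sketch under assumptions: admission_gate and its count/admit/erase callbacks are hypothetical stand-ins for the real admission_counter, admit_strategy and erase calls, and initializing keys_to_insert_mask to ~new_in_miss (storage-found misses inserted, new keys pending admission) is inferred from the summary rather than copied from the PR.

```python
import torch
from typing import Callable, Optional, Tuple

Tensor = torch.Tensor


def admission_gate(
    miss_keys: Tensor,         # int64 [h_num_miss]
    miss_tids: Tensor,         # int64 [h_num_miss]
    new_in_miss: Tensor,       # bool  [h_num_miss], True = key not found in storage
    miss_compact_idx: Tensor,  # int64 [h_num_miss]
    count: Optional[Callable[[Tensor, Tensor], Tensor]],
    admit: Optional[Callable[[Tensor, Tensor], Tensor]],
    erase: Callable[[Tensor, Tensor], None],
) -> Tuple[Tensor, Optional[Tensor]]:
    """Hypothetical sketch; returns (keys_to_insert_mask, non_admitted_positions)."""
    if admit is None or count is None:
        # No admission strategy: every missed key is inserted.
        return torch.ones_like(new_in_miss), None

    # Assumed initial state: storage-found misses are inserted, new keys are
    # not yet admitted (so keys_to_insert_mask[new_in_miss] is all False).
    keys_to_insert_mask = ~new_in_miss

    if bool(new_in_miss.any()):
        new_keys_sub = miss_keys[new_in_miss]   # boolean index, no torch.where
        new_tids_sub = miss_tids[new_in_miss]
        freq = count(new_keys_sub, new_tids_sub)
        admit_mask = admit(new_keys_sub, freq)  # bool, one entry per new key
        if bool(admit_mask.any()):
            erase(new_keys_sub[admit_mask], new_tids_sub[admit_mask])
            keys_to_insert_mask[new_in_miss] = admit_mask  # boolean scatter

    non_admit_miss = new_in_miss & ~keys_to_insert_mask
    non_admitted_positions = (
        miss_compact_idx[non_admit_miss] if bool(non_admit_miss.any()) else None
    )
    return keys_to_insert_mask, non_admitted_positions


# Toy usage: positions 0, 2, 3 are new keys; admit those seen at least twice.
miss_keys = torch.tensor([10, 11, 12, 13])
miss_tids = torch.tensor([0, 0, 1, 1])
new_in_miss = torch.tensor([True, False, True, True])
miss_compact_idx = torch.tensor([7, 8, 9, 10])
freq_table = {10: 3, 12: 1, 13: 5}  # made-up frequencies
erased = []

mask, rejected = admission_gate(
    miss_keys, miss_tids, new_in_miss, miss_compact_idx,
    count=lambda k, t: torch.tensor([freq_table[int(x)] for x in k]),
    admit=lambda k, f: f >= 2,
    erase=lambda k, t: erased.append(k.tolist()),
)
print(mask.tolist())      # [True, True, False, True]
print(rejected.tolist())  # [9]
```

Key 12 falls below the toy threshold, so its compact index (9) is reported as non-admitted, while keys 10 and 13 are erased from the counter and marked for insertion.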
Reviews (2): Last reviewed commit: "fix"
jiashuy left a comment:
LGTM, and create a PR to solve the same issue.
Close this one since it's already addressed in #409.
Description
Closes #357
Replace the torch.where + integer-index chaining in the _prefetch_cache_path admission path with direct boolean indexing and a boolean scatter. Add an optional mask parameter to the Counter.erase() chain to fuse key selection into the erase kernel, eliminating the separate pre-selection tensor allocation.
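To illustrate the proposed API shape only, here is a hypothetical dict-backed ToyCounter whose erase accepts an optional mask, compared against the current pre-filtered call. The ToyCounter class and the mask keyword are assumptions, not the real Counter.erase() signature, and a Python loop does not reproduce the allocation savings a fused CUDA kernel would give.

```python
import torch
from typing import Dict, Optional, Tuple


class ToyCounter:
    """Hypothetical dict-backed stand-in for an admission frequency counter."""

    def __init__(self) -> None:
        self.freq: Dict[Tuple[int, int], int] = {}

    def add(self, keys: torch.Tensor, tids: torch.Tensor) -> torch.Tensor:
        # Increment and return the frequency of each (key, tid) pair.
        out = []
        for k, t in zip(keys.tolist(), tids.tolist()):
            self.freq[(k, t)] = self.freq.get((k, t), 0) + 1
            out.append(self.freq[(k, t)])
        return torch.tensor(out, dtype=torch.int64)

    def erase(
        self,
        keys: torch.Tensor,
        tids: torch.Tensor,
        mask: Optional[torch.Tensor] = None,
    ) -> None:
        # With a mask, the selection is applied while iterating instead of
        # first materializing keys[mask] and tids[mask] (the "fused" variant).
        keep = [True] * keys.numel() if mask is None else mask.tolist()
        for k, t, sel in zip(keys.tolist(), tids.tolist(), keep):
            if sel:
                self.freq.pop((k, t), None)


keys = torch.tensor([1, 2, 3, 4])
tids = torch.tensor([0, 0, 0, 0])
admit_mask = torch.tensor([True, False, True, False])

pre_filtered = ToyCounter()
pre_filtered.add(keys, tids)
pre_filtered.erase(keys[admit_mask], tids[admit_mask])  # current style

fused = ToyCounter()
fused.add(keys, tids)
fused.erase(keys, tids, mask=admit_mask)  # proposed optional-mask style

print(fused.freq)  # {(2, 0): 1, (4, 0): 1}
```

Both calls leave the counter in the same state; the masked form simply moves the selection into the erase routine, which is where a kernel-level implementation could avoid the extra tensor.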
The kernel-level fused optimization (fusing selection into the erase and score-update paths) will be implemented in a follow-up PR.