Skip to content

[FEA] remove unnecessary keys selection and fuse selection to erase#404

Closed
ShaobinChen-AH wants to merge 2 commits into
NVIDIA:mainfrom
ShaobinChen-AH:fix-issue-357
Closed

[FEA] remove unnecessary keys selection and fuse selection to erase#404
ShaobinChen-AH wants to merge 2 commits into
NVIDIA:mainfrom
ShaobinChen-AH:fix-issue-357

Conversation

@ShaobinChen-AH

@ShaobinChen-AH ShaobinChen-AH commented May 22, 2026

Copy link
Copy Markdown
Contributor

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Closes #357

Replace torch.where + integer-index chaining in _prefetch_cache_path admission path with direct boolean indexing and boolean scatter. Add optional mask parameter to Counter.erase() chain to fuse key selection into the erase kernel, eliminating separate
pre-selection tensor allocation."

The kernel-level fuse optimization (fusing selection into erase and scores update paths) will be implemented in a follow-up PR.

@greptile-apps

greptile-apps Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR refactors the admission-gating logic in _prefetch_cache_path by replacing torch.where(bool_mask)[0]-based integer indexing with direct boolean indexing, consolidating two separate operations into simpler, more idiomatic PyTorch expressions.

  • new_keys_sub = miss_keys[new_in_miss] replaces the torch.where + explicit integer-index chain, and keys_to_insert_mask[new_in_miss] = admit_mask replaces the two-step gather-then-scatter pattern.
  • Non-admitted position computation is simplified from miss_compact_idx[new_miss_indices[non_admit]] to miss_compact_idx[new_in_miss & ~keys_to_insert_mask], removing the intermediate integer-index tensor entirely.
  • The admission_counter.erase call is unchanged and still pre-filters with new_keys_sub[admit_mask], so the diff does not introduce the "fuse selection into erase kernel" change described in the PR body.

Confidence Score: 5/5

Safe to merge — the three substitutions are all semantically equivalent to what they replace and no new logic is introduced.

All changed expressions produce identical results: boolean indexing tensor[bool_mask] is equivalent to tensor[torch.where(bool_mask)[0]] in PyTorch, keys_to_insert_mask[new_in_miss] = admit_mask produces the same scatter as the old integer-index True assignment (the pre-condition keys_to_insert_mask[new_in_miss] is always False at that point), and new_in_miss & ~keys_to_insert_mask selects the same non-admitted positions as the old new_miss_indices[non_admit] chain. The erase call is unchanged and still pre-filters with new_keys_sub[admit_mask].

No files require special attention; the single changed file is a self-contained simplification.

Important Files Changed

Filename Overview
corelib/dynamicemb/dynamicemb/batched_dynamicemb_function.py Refactors _prefetch_cache_path admission logic to use direct boolean indexing; all three substitutions are semantically equivalent to the removed integer-index chain.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["miss_keys / miss_tids (size h_num_miss)"] --> B{"new_in_miss.any() AND admit_strategy?"}
    B -- "No admit_strategy" --> C["keys_to_insert_mask = all True"]
    B -- "Yes" --> D["new_keys_sub = miss_keys[new_in_miss] — boolean index"]
    D --> E["admission_counter.add → get freq"]
    E --> F["admit_mask = admit_strategy.admit(new_keys_sub, freq)"]
    F --> G{"admit_mask.any()?"}
    G -- "Yes" --> H["erase(new_keys_sub[admit_mask], new_tids_sub[admit_mask])"]
    H --> I["keys_to_insert_mask[new_in_miss] = admit_mask (boolean scatter)"]
    G -- "No" --> J["keys_to_insert_mask unchanged for new_in_miss positions"]
    I --> K["non_admit_miss = new_in_miss AND NOT keys_to_insert_mask"]
    J --> K
    K --> L{"non_admit_miss.any()?"}
    L -- "Yes" --> M["non_admitted_positions = miss_compact_idx[non_admit_miss]"]
    L -- "No" --> N["non_admitted_positions = None"]
    I --> O["Insert admitted + storage-found keys into cache"]
    M --> P["Return slot_indices, update_slot_indices, non_admitted_positions"]
    N --> P
    O --> P
Loading

Reviews (2): Last reviewed commit: "fix" | Re-trigger Greptile

Comment thread corelib/dynamicemb/dynamicemb/scored_hashtable.py Outdated
Comment thread corelib/dynamicemb/dynamicemb/batched_dynamicemb_function.py
@shijieliu shijieliu requested a review from jiashuy May 26, 2026 01:24

@jiashuy jiashuy left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and create a PR to sovle the same issue

@shijieliu

Copy link
Copy Markdown
Collaborator

close this one since it's already addressed in #409

@shijieliu shijieliu closed this Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] remove unnecessary keys selection and fuse selection to keys and scores

3 participants