[FEA] remove unnecessary keys selection and fuse selection to erase by ShaobinChen-AH · Pull Request #404 · NVIDIA/recsys-examples

ShaobinChen-AH · 2026-05-22T08:14:11Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Closes #357

Replace torch.where + integer-index chaining in _prefetch_cache_path admission path with direct boolean indexing and boolean scatter. Add optional mask parameter to Counter.erase() chain to fuse key selection into the erase kernel, eliminating separate
pre-selection tensor allocation."

The kernel-level fuse optimization (fusing selection into erase and scores update paths) will be implemented in a follow-up PR.

greptile-apps · 2026-05-22T08:18:48Z

Greptile Summary

This PR refactors the admission-gating logic in _prefetch_cache_path by replacing torch.where(bool_mask)[0]-based integer indexing with direct boolean indexing, consolidating two separate operations into simpler, more idiomatic PyTorch expressions.

new_keys_sub = miss_keys[new_in_miss] replaces the torch.where + explicit integer-index chain, and keys_to_insert_mask[new_in_miss] = admit_mask replaces the two-step gather-then-scatter pattern.
Non-admitted position computation is simplified from miss_compact_idx[new_miss_indices[non_admit]] to miss_compact_idx[new_in_miss & ~keys_to_insert_mask], removing the intermediate integer-index tensor entirely.
The admission_counter.erase call is unchanged and still pre-filters with new_keys_sub[admit_mask], so the diff does not introduce the "fuse selection into erase kernel" change described in the PR body.

Confidence Score: 5/5

Safe to merge — the three substitutions are all semantically equivalent to what they replace and no new logic is introduced.

All changed expressions produce identical results: boolean indexing tensor[bool_mask] is equivalent to tensor[torch.where(bool_mask)[0]] in PyTorch, keys_to_insert_mask[new_in_miss] = admit_mask produces the same scatter as the old integer-index True assignment (the pre-condition keys_to_insert_mask[new_in_miss] is always False at that point), and new_in_miss & ~keys_to_insert_mask selects the same non-admitted positions as the old new_miss_indices[non_admit] chain. The erase call is unchanged and still pre-filters with new_keys_sub[admit_mask].

No files require special attention; the single changed file is a self-contained simplification.

Important Files Changed

Filename	Overview
corelib/dynamicemb/dynamicemb/batched_dynamicemb_function.py	Refactors _prefetch_cache_path admission logic to use direct boolean indexing; all three substitutions are semantically equivalent to the removed integer-index chain.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["miss_keys / miss_tids (size h_num_miss)"] --> B{"new_in_miss.any() AND admit_strategy?"}
    B -- "No admit_strategy" --> C["keys_to_insert_mask = all True"]
    B -- "Yes" --> D["new_keys_sub = miss_keys[new_in_miss] — boolean index"]
    D --> E["admission_counter.add → get freq"]
    E --> F["admit_mask = admit_strategy.admit(new_keys_sub, freq)"]
    F --> G{"admit_mask.any()?"}
    G -- "Yes" --> H["erase(new_keys_sub[admit_mask], new_tids_sub[admit_mask])"]
    H --> I["keys_to_insert_mask[new_in_miss] = admit_mask (boolean scatter)"]
    G -- "No" --> J["keys_to_insert_mask unchanged for new_in_miss positions"]
    I --> K["non_admit_miss = new_in_miss AND NOT keys_to_insert_mask"]
    J --> K
    K --> L{"non_admit_miss.any()?"}
    L -- "Yes" --> M["non_admitted_positions = miss_compact_idx[non_admit_miss]"]
    L -- "No" --> N["non_admitted_positions = None"]
    I --> O["Insert admitted + storage-found keys into cache"]
    M --> P["Return slot_indices, update_slot_indices, non_admitted_positions"]
    N --> P
    O --> P

_{Reviews (2): Last reviewed commit: "fix" | Re-trigger Greptile}

jiashuy

LGTM, and create a PR to sovle the same issue

shijieliu · 2026-06-01T06:11:57Z

close this one since it's already addressed in #409

[FEA] remove unnecessary keys selection and fuse selection to erase

1c4fee5

greptile-apps Bot reviewed May 22, 2026

View reviewed changes

Comment thread corelib/dynamicemb/dynamicemb/scored_hashtable.py Outdated

Comment thread corelib/dynamicemb/dynamicemb/batched_dynamicemb_function.py

fix

1012a86

shijieliu requested a review from jiashuy May 26, 2026 01:24

jiashuy requested changes May 29, 2026

View reviewed changes

shijieliu closed this Jun 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] remove unnecessary keys selection and fuse selection to erase#404

[FEA] remove unnecessary keys selection and fuse selection to erase#404
ShaobinChen-AH wants to merge 2 commits into
NVIDIA:mainfrom
ShaobinChen-AH:fix-issue-357

ShaobinChen-AH commented May 22, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

jiashuy left a comment

Uh oh!

shijieliu commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ShaobinChen-AH commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

greptile-apps Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

jiashuy left a comment

Choose a reason for hiding this comment

Uh oh!

shijieliu commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ShaobinChen-AH commented May 22, 2026 •

edited

Loading

greptile-apps Bot commented May 22, 2026 •

edited

Loading