enh(hotspot_analyzer): add --kernel filter for CSV metadata matching #657

Draft
Arist12 wants to merge 1 commit into ROCm:main from Arist12:enh/hotspot-kernel-filter

Arist12 commented Jun 4, 2026

Problem

hotspot_analyzer.py reads authoritative VGPR/SGPR/LDS/occupancy data from
the *_kernel_trace.csv file written by rocprofv3 --kernel-trace. To
select the correct row it tries to match Kernel_Name against the dispatch
directory basename.

This heuristic works for timestamped output directories
(20240101_120000_pa_decode_kernel) but fails completely for the
ui_output_agent_<N>_dispatch_<id> layout produced by rocprofv3's ATT
decode step. In that layout the directory basename carries only an agent
number and a dispatch counter — no kernel name — so every kernel name
comparison returns false and the metadata lookup silently returns {}.

As a result, the "Register Pressure & Occupancy" section uses ISA estimates
instead of the real CSV values for all ui_output_agent_* traces, and the
warning message gives no hint about how to fix it.

Solution

Add --kernel SUBSTR (optional, default ""):

  • When provided, uses a direct substring match on Kernel_Name instead of
    the dir-name heuristic.
  • If the *_kernel_trace.csv has a Dispatch_Id column and the
    directory name encodes dispatch_<id>, the row must also match on dispatch
    id. This prevents false matches when a PyTorch reference kernel shares the
    same name prefix as the target kernel and runs in the same profiling session.
  • Falls back to kernel-name-only substring matching when the CSV has no
    Dispatch_Id column.
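
The row-selection rules above can be sketched roughly as follows. This is an
illustration, not the code in this PR: the helper name select_kernel_row, its
signature, and the VGPR_Count column used in the usage note are assumptions.
Only Kernel_Name, Dispatch_Id, and the dispatch_<id> directory convention come
from the description.

```python
import csv
import io
import os
import re


def select_kernel_row(csv_text, dispatch_dir, kernel_filter):
    """Hypothetical helper: return the first matching CSV row, or {}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Extract the id from a basename such as ui_output_agent_15249_dispatch_223.
    base = os.path.basename(os.path.normpath(dispatch_dir))
    found = re.search(r"dispatch_(\d+)", base)
    dispatch_id = found.group(1) if found else None

    # Only enforce the dispatch id when the CSV actually has that column.
    has_id_column = bool(rows) and "Dispatch_Id" in rows[0]

    for row in rows:
        # Direct substring match on Kernel_Name (missing values count as "").
        if kernel_filter not in (row.get("Kernel_Name") or ""):
            continue
        if has_id_column and dispatch_id is not None:
            if (row.get("Dispatch_Id") or "").strip() != dispatch_id:
                continue
        return dict(row)
    return {}
```

With a CSV that contains both a reference kernel and the target kernel under
similar names, the dispatch id check is what picks the intended row; without
a Dispatch_Id column the first name match is returned.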

The legacy heuristic (dir basename vs Kernel_Name bidirectional substring) is
unchanged and still used when --kernel is not given, so existing
timestamped-dir workflows are unaffected.
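
For comparison, a bidirectional substring check of the kind the legacy
heuristic is described as using might look like the sketch below. The function
name legacy_name_match is hypothetical and the real implementation may differ
in detail. It shows why a timestamped directory matches while a
ui_output_agent_* directory cannot.

```python
import os


def legacy_name_match(dispatch_dir, kernel_name):
    """Hypothetical sketch: True if either string contains the other."""
    base = os.path.basename(os.path.normpath(dispatch_dir))
    if not base or not kernel_name:
        # An empty string is a substring of everything, so reject it early.
        return False
    return kernel_name in base or base in kernel_name
```

Here legacy_name_match("20240101_120000_pa_decode_kernel", "pa_decode_kernel")
is true, whereas a basename that holds only agent and dispatch numbers never
contains, or is contained in, a kernel name.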

The "not matched" warning now mentions --kernel so users can discover the
fix without reading source.

Before / after

# Before — metadata not loaded for ui_output_agent_* dirs
python hotspot_analyzer.py ui_output_agent_15249_dispatch_223 --topk 4 --mode src
# (kernel_trace CSV not matched — accum/LDS/SGPR estimated from ISA only)

# After — CSV metadata loaded correctly
python hotspot_analyzer.py ui_output_agent_15249_dispatch_223 \
    --topk 4 --mode src \
    --kernel pa_mqa_logits_fp4_kernel_0
# Prints real VGPR/SGPR/LDS/occupancy from out_kernel_trace.csv

Testing

Five unit tests covering:

  1. Legacy timestamp heuristic still works (no regression).
  2. ui_output_agent_* dir without --kernel returns {} (expected).
  3. --kernel + Dispatch_Id column selects the correct CSV row.
  4. --kernel without Dispatch_Id column falls back to name-only match.
  5. argparse wires --kernel through to read_kernel_metadata.

All five pass.
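
As an illustration of what test 5 checks, the sketch below wires a --kernel
option through to a metadata reader and verifies the value with a stand-in
callable. The parser layout, the kernel_filter keyword, and the main()
signature are assumptions made for this example; only the option name, its
empty-string default, and the read_kernel_metadata name come from this PR.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the options relevant here."""
    parser = argparse.ArgumentParser(prog="hotspot_analyzer.py")
    parser.add_argument("dispatch_dir")
    parser.add_argument(
        "--kernel",
        metavar="SUBSTR",
        default="",
        help="substring to match against Kernel_Name in the kernel trace CSV",
    )
    return parser


def main(argv, read_kernel_metadata):
    """Parse argv and forward the filter; the reader is injected for testing."""
    args = build_parser().parse_args(argv)
    return read_kernel_metadata(args.dispatch_dir, kernel_filter=args.kernel)


def check_kernel_is_forwarded():
    """Record what the reader receives and compare it with the CLI input."""
    seen = {}

    def fake_reader(dispatch_dir, kernel_filter=""):
        seen["dir"] = dispatch_dir
        seen["kernel"] = kernel_filter
        return {}

    main(
        ["ui_output_agent_15249_dispatch_223", "--kernel", "pa_mqa_logits_fp4_kernel_0"],
        fake_reader,
    )
    return seen
```

Injecting the reader keeps the check independent of any CSV on disk, so it
exercises only the argument plumbing.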

The existing CSV row-selection heuristic matches by comparing the dispatch
directory basename against Kernel_Name in the kernel trace CSV.  This works
for rocprofv3's timestamped output (e.g. 20240101_120000_pa_decode_kernel),
but fails completely for the ui_output_agent_<N>_dispatch_<id> layout
produced by rocprofv3's ATT decode step — the basename carries no kernel
name, only agent and dispatch numbers.

When metadata lookup fails, the analyzer falls back to ISA-estimated register
counts and prints a warning, under-reporting VGPR, SGPR, LDS, and occupancy
for every ui_output_agent_* trace.

Fix by adding a --kernel SUBSTR option that enables an explicit row-selection
path:
  1. Substring-matches Kernel_Name against the supplied filter.
  2. If the CSV has a Dispatch_Id column and the directory name encodes
     dispatch_<id>, also requires the row's Dispatch_Id to match — avoiding
     false matches when a PyTorch reference kernel shares the same name prefix.
  3. Falls back gracefully to kernel-name-only matching when Dispatch_Id is
     absent from the CSV.

The legacy heuristic is unchanged and still used when --kernel is not given,
so existing timestamped-dir workflows are unaffected.

Update the "not matched" warning to tell users about --kernel so the fix is
discoverable without reading source.

Example:
    python hotspot_analyzer.py ui_output_agent_15249_dispatch_223 \
        --topk 8 --mode src --detail \
        --kernel pa_mqa_logits_fp4_kernel_0