
Conversation


@moksiuc moksiuc commented Nov 13, 2025

Summary:

Now that XPU is a PyTorch built-in device, profiler support is an indispensable part of functional completeness. This PR introduces the XPU scope profiler by extending the existing XPU profiler plugin. The XPU scope profiler is built on the Intel PTI toolkit (https://github.com/intel/pti-gpu) and the underlying SYCL runtime, and it allows gathering XPU hardware metrics. The LIBKINETO_NOXPUPTI option enables or disables the whole XPU profiler plugin at kineto build time.

Changes:

  • Added a new ActivityType, XPU_SCOPE_PROFILER, which enables the new scope profiler
  • Added a new class, XpuptiScopeProfilerConfig, derived from AbstractConfig, for configuring the new scope profiler
  • Enhanced the ChromeTraceLogger::handleActivity method so that it outputs XPU hardware metrics from the new scope profiler as Perfetto counter events (display mode "C"); see the sketch after this list
  • Added gtest coverage
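
For illustration, a Chrome trace "counter" event uses phase "C", and viewers such as Perfetto render each key under args as a separate counter track. A minimal sketch of the event shape, with made-up names and values (none of them are taken from this PR):

```python
import json

# A Chrome Trace Event Format "counter" entry: phase ("ph") "C". Perfetto
# renders each key under "args" as a separate counter track.
# All names and values below are illustrative, not taken from this PR.
counter_event = {
    "ph": "C",                       # counter phase
    "name": "XPU metrics",           # hypothetical counter name
    "pid": 0,                        # process the counter belongs to
    "ts": 1700000001234,             # timestamp in microseconds
    "args": {"gpu_core_frequency_mhz": 1600.0},  # hypothetical metric/value
}
print(json.dumps(counter_event, indent=2))
```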

@meta-cla meta-cla bot added the cla signed label Nov 13, 2025
@moksiuc changed the title from "scope profiler squashed" to "Introduce XPU scope profiler extending existing XPU profiler plugin" Nov 13, 2025

moksiuc commented Nov 13, 2025

@EikanWang, @gujinghui

@gujinghui

@moksiuc It's great that we are going to update our PTI integration code and introduce a new profiler path.
Could you help address the questions below?

  1. Is the ScopeProfiler the counterpart of the CUDA RangeProfiler? It looks like the RangeProfiler is not enabled in PyTorch by default so far. Do you know why?
  2. This PR is too large to review. Can we split it into several PRs? For example, one PR for code refactoring or cleanup of the kineto or PTI changes, one or two PRs for the ScopeProfiler, and one PR for the ChromeTraceLogger enhancement, with test cases added for each PR.
  3. BTW, CUDA provides the CUDA_DRIVER activity to trace driver actions. We should provide Level Zero (L0) actions as the counterpart, right? As I remember, PTI should be able to do that. Do we have a plan to cover it?
    {"cuda_driver", ActivityType::CUDA_DRIVER},


moksiuc commented Nov 14, 2025

  1. Is the ScopeProfiler the counterpart of the CUDA RangeProfiler? It looks like the RangeProfiler is not enabled in PyTorch by default so far. Do you know why?
    It is enabled by providing experimental_config=_ExperimentalConfig(...). I don't know why it is this way, but we are enabling our profiler the same way. One reason may be that the Range/Scope profiler requires parameters, such as HW metric names, that are passed through _ExperimentalConfig; a sketch of this enabling pattern is shown below.
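
For reference, the CUDA RangeProfiler is opted into roughly like this today; per the comment above, the XPU scope profiler is enabled through the same mechanism, with its own metric names. A minimal sketch, assuming a CUDA build of PyTorch — the metric name shown is a CUPTI example, and the XPU-specific parameters are not shown here:

```python
import torch
from torch.profiler import profile, ProfilerActivity
from torch._C._profiler import _ExperimentalConfig

# Range/scope profiling is opt-in: hardware metric names are passed through
# _ExperimentalConfig rather than being collected by default.
cfg = _ExperimentalConfig(
    profiler_metrics=["kineto__tensor_core_insts"],  # backend-specific metric names
    profiler_measure_per_kernel=True,                # collect metrics per kernel
)

with profile(activities=[ProfilerActivity.CUDA], experimental_config=cfg) as prof:
    a = torch.randn(64, 64, device="cuda")
    torch.mm(a, a)
```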


moksiuc commented Nov 14, 2025

  1. BTW, CUDA provides the CUDA_DRIVER activity to trace driver actions. We should provide Level Zero (L0) actions as the counterpart, right? As I remember, PTI should be able to do that. Do we have a plan to cover it?
    {"cuda_driver", ActivityType::CUDA_DRIVER},

Definitely not in this PR. I'll add this to our list of tasks.


moksiuc commented Nov 17, 2025

  1. This PR is too large to review. Can we split it into several PRs? For example, one PR for code refactoring or cleanup of the kineto or PTI changes, one or two PRs for the ScopeProfiler, and one PR for the ChromeTraceLogger enhancement, with test cases added for each PR.

I extracted the cleanup and the scope profiler configuration into separate PRs, so this one should be much smaller afterwards.
Currently I don't see further candidates for extraction into separate PRs: what would remain is the full scope profiler implementation with tests, and we'd rather not introduce half an implementation that does not work functionally.

- removed rangeEnabled
- fixed the test to align with this removal
- erased the used kernelActivity from the map
- changed the place of config initialization
- removed passing of an unused C compiler flag into the test CMake file

@gujinghui left a comment


This PR is split into #1177, #1180, and more.

@moksiuc moksiuc marked this pull request as ready for review November 24, 2025 10:01
@gujinghui

@moksiuc Let's close this PR.


moksiuc commented Dec 3, 2025

@gujinghui this is the core of the scope profiler. Once the two smaller parts are merged, this one will contain only the core profiler.


moksiuc commented Dec 19, 2025

@EikanWang, @gujinghui
Please review.
The two extracted parts are already merged, so this change now contains only code directly related to the scope profiler.


moksiuc commented Dec 19, 2025

Tests are run on the PyTorch side:
pytorch/pytorch#165766

@moksiuc moksiuc requested a review from gujinghui December 19, 2025 11:28
  scopeProfilerEnabled = true;
#else
  throw std::runtime_error(
      "Scope profiler requires Intel® oneAPI version 2025.3.1 or newer");


Can we give the PTI version here, instead of oneAPI package version?

moksiuc (Contributor Author):


I previously had the PTI version here, but isn't the oneAPI version clearer for the user?
Does the user have the possibility to install any PTI version they want?

@@ -0,0 +1,55 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.


@chuanqi129 If we want to add these test cases to the XPU CI scope, how should we do it?


Hi @gujinghui, we don't have XPU CI tests for the kineto repo. Do you mean putting those test cases into stock PyTorch? If so, we can evaluate whether they can go into https://github.com/pytorch/pytorch/tree/main/test/cpp/profiler.


Yes. @moksiuc, please sync with @chuanqi129 to make it happen. Thanks.


@gujinghui left a comment


LGTM. I assume you have already verified it on a real local machine, right? @moksiuc


moksiuc commented Dec 22, 2025

LGTM. I assume you have already verified it on a real local machine, right? @moksiuc

Yes, I did.

@gujinghui

@sraikund16 could you help review this PR? Thanks.

