
Conversation

@mwiktor-intel

Fixes #1900. The implementation uses two mkl::lapack routines, geqrf and orgqr, to recover the explicit Q matrix. Since torch and LAPACK use different storage formats (row-major vs. column-major), a hard transposition (changing the memory layout, not only the strides) was necessary. The iteration over the batch uses the internal memory layout of the processed data.
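The row-major vs. column-major point can be illustrated outside the PR's C++ code. A minimal NumPy sketch (NumPy is used here purely for illustration; the PR itself operates on torch tensors and oneMKL) showing why a stride swap alone is not enough and an actual copy into the other layout is needed:

```python
import numpy as np

# Row-major (C-order) matrix, as torch stores tensors by default.
a = np.arange(6, dtype=np.float64).reshape(2, 3)

# A plain transpose only swaps strides; the underlying buffer is unchanged,
# so it is not usable as column-major input to a LAPACK routine expecting
# a contiguous Fortran-order buffer.
at = a.T
assert np.shares_memory(at, a)          # still a view on the same memory
assert at.strides == (8, 24)            # strides swapped, data untouched

# A "hard" transposition -- an actual copy into column-major layout --
# produces the buffer LAPACK expects.
af = np.asfortranarray(a)
assert af.flags['F_CONTIGUOUS']
assert not np.shares_memory(af, a)      # new memory, new layout
```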


CuiYifeng commented Nov 26, 2025

  • QR is an MKL op rather than a SYCL op. Please move the kernel code to xpu/mkl/.
  • The existing xpu/mkl/BatchLinearAlgebra.cpp is a good place for the kernel code (refer to stock PyTorch). Correspondingly, op-level code can be added in xpu/BatchLinearAlgebra.cpp.
  • Please check the lint error.
  • Adding a new test case is good. Please also check whether there are related cases in test/xpu/skip_list_common.py; if so, please reactivate them.


Silv3S commented Nov 26, 2025

There were no tests on the skip lists because QR was silently falling back to CPU. Now that the fallback is removed, some tests may start to fail.


@CuiYifeng CuiYifeng left a comment


Please check the comments and fix the related failing cases.

Comment on lines +69 to +83
TORCH_IMPL_FUNC(linalg_qr_xpu_out)
(const Tensor& A,
 std::string_view mode,
 const Tensor& Q,
 const Tensor& R) {
#if defined(USE_ONEMKL_XPU)
  xpu::linalg_qr_kernel(A, mode, Q, R);
#else
  // Without oneMKL, fall back to the CPU implementation and copy back.
  auto A_cpu = A.to(at::kCPU);
  auto Q_cpu = at::empty_like(Q, at::kCPU);
  auto R_cpu = at::empty_like(R, at::kCPU);
  at::cpu::linalg_qr_out(Q_cpu, R_cpu, A_cpu, mode);
  Q.copy_(Q_cpu);
  R.copy_(R_cpu);
#endif // USE_ONEMKL_XPU
}

My suggestion is to register geqrf_kernel_xpu/orgqr_kernel_xpu to geqrf_stub/orgqr_stub, which would allow us to reuse the op-level code in stock PyTorch and to reuse these two kernels in the future.
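The geqrf → orgqr pipeline the comment refers to can be sketched at the Python level with SciPy's low-level LAPACK wrappers, as a stand-in for the oneMKL calls (dgeqrf/dorgqr here are SciPy's names, not the PR's kernels):

```python
import numpy as np
from scipy.linalg import lapack

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))

# Step 1: geqrf factors A into Householder reflectors (packed below the
# diagonal of `qr`) plus the scalar factors `tau`.
qr, tau, work, info = lapack.dgeqrf(a)
assert info == 0

# R is the upper triangle of the packed result.
r = np.triu(qr)

# Step 2: orgqr expands the reflectors into an explicit ("pure") Q.
q, work, info = lapack.dorgqr(qr, tau)
assert info == 0

# Q is orthogonal and Q @ R reconstructs A.
assert np.allclose(q.T @ q, np.eye(4))
assert np.allclose(q @ r, a)
```

Registering the two kernels to the stubs would let stock PyTorch's structured op code drive exactly this two-step sequence.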

Comment on lines +9446 to +9456
- func: linalg_qr(Tensor A, str mode='reduced') -> (Tensor Q, Tensor R)
  python_module: linalg
  variants: function
  structured_delegate: linalg_qr.out

- func: linalg_qr.out(Tensor A, str mode='reduced', *, Tensor(a!) Q, Tensor(b!) R) -> (Tensor(a!) Q, Tensor(b!) R)
  python_module: linalg
  structured: True
  dispatch:
    XPU: linalg_qr_xpu_out



Development

Successfully merging this pull request may close these issues.

implement torch.linalg.qr xpu backend

3 participants