-
Notifications
You must be signed in to change notification settings - Fork 65
Enable -fno-reciprocal-math to fix div accuracy. #2413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses a division accuracy issue in SYCL kernels where dividing a number by itself could return non-1 results. The fix adds the -fno-reciprocal-math compiler flag to disable reciprocal math optimizations that can introduce numerical inaccuracies.
Key Changes:
- Added
-fno-reciprocal-mathtoSYCL_KERNEL_OPTIONSto prevent compiler optimizations that replace division operations with multiplication by reciprocals
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Currently evaluating the use of a more localized approach using |
| # gcc -shared host.o kernel.o device-code.o -o libxxx.so | ||
| set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-sycl-unnamed-lambda) | ||
| set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -sycl-std=2020) | ||
| set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-reciprocal-math) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's study the NVCC behaviors first. I think reciprocal-math is a general optimization, and most the compilers enable it by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EikanWang I believe --prec-div https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#prec-div-true-false-prec-div is the corresponding NVCC flag by default set to True unless --use_fast_math (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#use-fast-math-use-fast-math) is set to True. I do not believe pytorch is built with --use_fast_math correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information. Some HW platforms like CPU turn on this option by default. For CUDA, it keeps its default, saying use_fast_math is False. So, it makes sense to add no-reciprocal-math. I'm curious why SYCL turns on this option by default.
@fengyuan14 do you know the background?
torch-xpu-ops/cmake/BuildFlags.cmake
Line 77 in 20baf08
| # The fast-math will be enabled by default in SYCL compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my understanding, we should disable fast-math to align with CUDA
|
There are a few options regarding |
@SanityRemnants , Although the performance is the king, I'd prefer to keep a high bar of maintenance when we have no concrete performance impact.
|
There is no such issue on CUDA. This parameter is neither in IPEX or in PyTorch. We are waiting for this PR to fix related issue. Thank you! |
|
Hi Eikan, we collect the op benchmark performance before and after this PR . For inference with fp16 dtype, in 22873 op cases, we found about 154 op cases with a ratio <0.9, 455 cases with a ratio <0.95, and the rest achieved a ratio of 95%-100%. Some of the relevant cases can be found in element-wise operations and softmax cases. These are the lowest cases: <style> </style>
|
|
@weishi-deng , I'm curious why the option impacts the other operations besides div. Let understand why. |
@NeoZhangJianyu , Any E2E accuracy issues due to this option? |
dbfbfcb to
0101740
Compare
@NeoZhangJianyu pls. keep my comment as it is and address your questions by quoting my comment. |
|
@weishi-deng , |
|
@SanityRemnants , the PR looks good to me. However, the performance of |
Added -fno-reciprocal-math to SYCL_KERNEL_OPTIONS to eliminate issues with division of the number by itself returning non 1 output which caused: #1895.