
TestRotatedBoxIou::test_iou[xyxyxyxy-float32] fails on arm64 Linux due to float32 precision in format round-trip #9499

@happyaron

🐛 Describe the bug

Description

test_iou[xyxyxyxy-dtype0-cpu] fails on arm64 (aarch64) Linux with float32 dtype. The test passes on x86_64 and ppc64el.

The test's existing tolerance handling covers macOS (atol=0.5) and CUDA (xfail), but the else fallback uses atol=1e-4, which is too tight for arm64 Linux, where the observed absolute difference is 0.5058.
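
For readers unfamiliar with the test, the branching described here can be sketched roughly as follows. This is a minimal illustration, not the actual test code: `select_iou_tolerance` is a hypothetical helper, and only the three outcomes (xfail on CUDA, atol=0.5 on macOS, atol=1e-4 otherwise) are taken from this report.

```python
import platform


def select_iou_tolerance(system: str, device: str):
    """Return (atol, xfail) following the branching described in this report.

    Hypothetical helper for illustration only.
    """
    if device == "cuda":
        return None, True   # CUDA: marked xfail, so no tolerance applies
    if system == "Darwin":
        return 0.5, False   # macOS: loosened absolute tolerance
    return 1e-4, False      # everything else, including arm64 Linux


# On the reporter's machine platform.system() is "Linux", so the CPU run
# lands in the tight fallback branch.
print(platform.system(), platform.machine(),
      select_iou_tolerance(platform.system(), "cpu"))
```

Note that the branching keys on the operating system rather than the CPU architecture, which is why aarch64 Linux is treated the same as x86_64 Linux.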

log.gz

Error

FAILED test_ops.py::TestRotatedBoxIou::test_iou[xyxyxyxy-dtype0-cpu]

Mismatched elements: 2 / 64 (3.1%)
Greatest absolute difference: 0.5057777166366577 at index (3, 0) (up to 0.0001 allowed)
Greatest relative difference: 0.5057777166366577 at index (3, 0) (up to 0.0001 allowed)
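
As a quick arithmetic check on these numbers, the sketch below applies the closeness rule documented for torch.testing.assert_close, |actual - expected| <= atol + rtol * |expected|. It assumes the reported relative difference is the absolute difference divided by |expected|, which would put |expected| at 1.0 for index (3, 0). `within_tolerance` is a hypothetical helper, and the rtol used alongside the macOS atol is an assumption (1e-4, as in the log above).

```python
ABS_DIFF = 0.5057777166366577  # greatest absolute difference from the log
REL_DIFF = 0.5057777166366577  # greatest relative difference from the log


def within_tolerance(abs_diff, expected_mag, atol, rtol):
    """Closeness rule: abs_diff <= atol + rtol * |expected|."""
    return abs_diff <= atol + rtol * expected_mag


# Assumption: rel_diff = abs_diff / |expected|, so |expected| = abs_diff / rel_diff.
expected_mag = ABS_DIFF / REL_DIFF

# Current fallback branch: atol=1e-4, rtol=1e-4 (both shown in the log).
fallback_ok = within_tolerance(ABS_DIFF, expected_mag, atol=1e-4, rtol=1e-4)

# Assumption: rtol stays at 1e-4 if atol is raised to the macOS value.
macos_value_ok = within_tolerance(ABS_DIFF, expected_mag, atol=0.5, rtol=1e-4)

print(f"|expected| = {expected_mag}")
print(f"atol=1e-4 passes: {fallback_ok}")
print(f"atol=0.5 passes: {macos_value_ok}")
```

Under these assumptions neither value admits the observed difference, so reusing the macOS tolerance unchanged may not be enough on its own.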

The cause

The suspected cause is that torch.cos and torch.sin on tensors use architecture-specific SIMD implementations. The Sleef NEON backend (arm64) produces slightly different float32 rounding than the AVX backend (x86_64), similar to the already-handled macOS arm64 case, which uses Apple's vDSP implementation.
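
To give a sense of scale for such a rounding difference, here is a rough sketch in plain Python (no torch) that nudges a float32 cosine by one unit in the last place and measures how far one box corner moves. The corner formula, the box dimensions, and the helper names `f32`, `next_f32`, and `corner_x` are illustrative assumptions, not the conversion code under test.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_f32(x: float) -> float:
    """Return the next float32 above x (x must be a positive finite float32)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


def corner_x(cx, w, h, cos_a, sin_a):
    """x-coordinate of one corner of a rotated box (illustrative convention)."""
    return cx + 0.5 * w * cos_a - 0.5 * h * sin_a


# Hypothetical box: centre x = 50, width 100, height 40, rotated by 30 degrees.
angle = math.radians(30.0)
cos_a = f32(math.cos(angle))
sin_a = f32(math.sin(angle))

baseline = corner_x(50.0, 100.0, 40.0, cos_a, sin_a)
# Same corner with the cosine one float32 ULP higher, standing in for a
# different SIMD backend's rounding.
nudged = corner_x(50.0, 100.0, 40.0, next_f32(cos_a), sin_a)

shift = abs(nudged - baseline)
print(f"corner shift from a 1-ULP cosine change: {shift:.3e}")
```

For a box of this size the shift is on the order of 1e-6, far below the 0.5 gap reported above, so if this guess is right, some later step presumably magnifies the difference.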

Versions

PyTorch version: 2.12.0+debian
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux forky/sid (aarch64)
GCC version: (Debian 15.2.0-17) 15.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.42

Python version: 3.13.12 (main, Feb 4 2026, 15:06:39) [GCC 15.2.0] (64-bit runtime)
Python platform: Linux-6.12.88+deb13-arm64-aarch64-with-glibc2.42
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
...
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 4 MiB (64 instances)
L1i cache: 4 MiB (64 instances)
L2 cache: 32 MiB (64 instances)
L3 cache: 64 MiB (2 instances)

Versions of relevant libraries:
[pip3] numpy==2.4.4
[pip3] torch==2.12.0+debian
[pip3] torchvision==0.27.0
[conda] Could not collect
