@Polaris-911
Contributor

Zicclsm: Main memory supports misaligned loads/stores
The RVA20U64 profile specification makes the Zicclsm extension mandatory, and GCC supports it in version 14.1 and later.
References
GCC Zicclsm
RVA20U64 specification
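For illustration only, here is a minimal sketch of how a build could detect Zicclsm at compile time. It assumes the compiler follows the RISC-V C API convention of defining `__riscv_<ext>` for each enabled extension, so that `__riscv_zicclsm` is defined when Zicclsm is part of `-march`. The macro `EXAMPLE_UNALIGNED_OK` and the function `example_unaligned_ok` are hypothetical names and are not taken from this PR or from zstd.

```c
/* Hypothetical selector (not zstd code): 1 when the target advertises
 * Zicclsm, 0 otherwise. Assumes the compiler defines __riscv_zicclsm
 * following the RISC-V C API "__riscv_<ext>" naming convention. */
#if defined(__riscv) && defined(__riscv_zicclsm)
#  define EXAMPLE_UNALIGNED_OK 1
#else
#  define EXAMPLE_UNALIGNED_OK 0
#endif

/* Expose the compile-time decision so callers (or tests) can inspect it. */
int example_unaligned_ok(void)
{
    return EXAMPLE_UNALIGNED_OK;
}
```

On any target that is not RISC-V, or on RISC-V without Zicclsm in `-march`, the function returns 0.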

Performance Test Results: zstd with Different MEM_FORCE_MEMORY_ACCESS Settings

Test Environment

[root@r2044-r1-s1 ~]# dmidecode -t processor | grep "Version"
        Version: SG2044
[root@r2044-r1-s1 ~]# uname -a
Linux r2044-r1-s1 6.12.47-25.09.16.17.riscv64 #1 SMP Tue Sep 16 17:47:24 CST 2025 riscv64 riscv64 riscv64 GNU/Linux
| Compressor | Metric | MEM_FORCE_MEMORY_ACCESS=2 | MEM_FORCE_MEMORY_ACCESS=1 | Improvement Ratio* |
|---|---|---|---|---|
| zstd 1.5.7 -1 | Compression Speed | 72.0 MB/s | 57.5 MB/s | ~25.2% |
| zstd 1.5.7 -1 | Decompression Speed | 93.7 MB/s | 93.4 MB/s | ~0.3% |
| zstd 1.5.7 -22 | Compression Speed | 0.24 MB/s | 0.22 MB/s | ~9.1% |
| zstd 1.5.7 -22 | Decompression Speed | 71.0 MB/s | 66.8 MB/s | ~6.3% |
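The table does not define "Improvement Ratio". The figures are consistent with the relative speedup of the `=2` column over the `=1` column, that is (speed2 / speed1 - 1) x 100. A minimal sketch under that assumption follows; `improvement_pct` is a hypothetical helper name.

```c
/* Relative speedup of `candidate` over `baseline`, in percent.
 * Assumed definition: the PR does not state how the ratio was computed. */
double improvement_pct(double candidate, double baseline)
{
    return (candidate / baseline - 1.0) * 100.0;
}
```

For example, `improvement_pct(72.0, 57.5)` evaluates to about 25.2, matching the level -1 compression row.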
  1. MEM_FORCE_MEMORY_ACCESS=1
[root@r2044-r1-s1 lzbench-master]# ./lzbench -t0,0 -i5,5 -ezstd,1,22 ../silesia.tar
lzbench 2.1 | GCC 12.3.1 | 64-bit Linux |

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
zstd 1.5.7 -1            57.5 MB/s  93.4 MB/s    73216302  34.54 ../silesia.tar
zstd 1.5.7 -22           0.22 MB/s  66.8 MB/s    52222248  24.64 ../silesia.tar
[Params] cIters=5 dIters=5 cTime=0.0 dTime=0.0 chunkSize=0KB cSpeed=0MB
  2. MEM_FORCE_MEMORY_ACCESS=2
[root@r2044-r1-s1 lzbench-master]# ./lzbench -t0,0 -i5,5 -ezstd,1,22 ../silesia.tar
lzbench 2.1 | GCC 12.3.1 | 64-bit Linux |

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
zstd 1.5.7 -1            72.0 MB/s  93.7 MB/s    73216302  34.54 ../silesia.tar
zstd 1.5.7 -22           0.24 MB/s  71.0 MB/s    52222248  24.64 ../silesia.tar
[Params] cIters=5 dIters=5 cTime=0.0 dTime=0.0 chunkSize=0KB cSpeed=0MB

@meta-cla meta-cla bot added the CLA Signed label Nov 3, 2025
@Polaris-911
Contributor Author

Hi @Cyan4973, I know you're busy; I just wanted to check whether you could spare a moment to review this PR. Thanks in advance!

@Cyan4973 Cyan4973 self-assigned this Dec 2, 2025
Contributor

@Cyan4973 Cyan4973 left a comment

We recommend retaining Method 2 solely as a "last resort" to force enable unaligned memory access on a local system.

However, we do not endorse its use "in general".

Method 2 essentially misleads the C abstract machine by asserting that memory addresses are aligned when, in reality, they are not. This constitutes undefined behavior (UB), and as such, we cannot guarantee reliable or predictable results.

Consequently, we are unable to approve this pull request in its current form.

The preferred and correct approach is Method 0, which is fully portable.

If the compiler recognizes that the target CPU supports unaligned memory access, it should optimize memcpy(d, s, 8) into a single read or write instruction. If this optimization does not occur, the issue lies with the compiler's optimization capabilities.

If that's the current situation regarding RISC-V, I would recommend pursuing improvements in compiler optimization to achieve a more robust and future-proof solution.
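The `memcpy`-based approach described above can be sketched as follows. This is an illustrative sketch in the spirit of Method 0, not the project's actual code; `example_read64` and `example_write64` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned 64-bit load: copying through memcpy is well defined
 * for any address, unlike dereferencing a cast pointer such as
 * *(const uint64_t *)p, which is undefined behavior when p is misaligned. */
uint64_t example_read64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Portable unaligned 64-bit store, same idea in the other direction. */
void example_write64(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Whether the compiler lowers these calls to a single load or store depends on the target flags and the optimization level. Inspecting the generated assembly (for example with `-O2 -S`) is one way to check what a given toolchain does.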
