-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Improve compression performance by conditionally skipping lazy evaluation #4533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…tion Optional ZSTD_COMPRESS_FAST flag provided to speed up compression by restricting lazy evaluation to short matches. Applied to levels 6 and 7. This helps improve lazy compression strategy. Skip determinism test when enabled. Optimization disabled by default.
|
Hi @BiplabRaut! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at [email protected]. Thanks! |
Cyan4973
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core idea seems sound: do not search for a better match when you already have a "good enough" one.
This will save some work, and consequently reduce the amount of cpu spent in search.
It will also reduce opportunities to find better matches, and therefore negatively impact compression ratio.
So it's not a clear win situation, and more a trade off.
Therefore, what matters is the relative sizes of these trade off: how much is won, how much is lost.
And when looking at the numbers provided here, it does not look great.
The announced compression speed gains are significant, in the double digit range.
But we also take reduction in compression ratio pretty seriously. Something < 0.1% could be considered roughly negligible. But anything closer to ~1% is significant. And there are a few cases where compression ratio is negatively impacted by a significant amount.
I guess we could have a discussion here, about the pros and cons of the trade off, and maybe find another compromise point.
The second part though is about code complexity.
The core of the concept is not difficult, it adds a branch within the search loop, in ZSTD_compressBlock_lazy_generic().
But unfortunately, the current implementation also adds a ton of other changes in other parts of the source code.
This is more difficult to maintain.
And considering the impact of the change, which merely offers a different position in the trade-off curve, I feel this is not a reasonable proposition. We need to keep things simpler if we want the source code to remain maintainable in the future.
As it stands, this PR is only suitable for a dedicated fork, where an independant maintenance policy can be set.
| { 21, 16, 17, 1, 5, 0, ZSTD_dfast }, /* level 3 */ | ||
| { 21, 18, 18, 1, 5, 0, ZSTD_dfast }, /* level 4 */ | ||
| { 21, 18, 19, 3, 5, 2, ZSTD_greedy }, /* level 5 */ | ||
| #ifdef ZSTD_ENABLE_COMPRESS_FAST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too disruptive, and will make maintenance of these tables more complex for future patches.
| For this scenario, it can be set as `ZDICT_QSORT=ZDICT_QSORT_C90`. | ||
| Other selectable suffixes are `_GNU`, `_APPLE`, `_MSVC` and `_C11`. | ||
|
|
||
| - The build macro `ZSTD_COMPRESS_FAST` can be defined for fast lazy evaluation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro name is not specific enough. The name implies that this is about the "fast mode", while the changes are actually impacting the ZSTD_lazy strategy. Consider a more accurate name, or better, if we can find just a better trade off, maybe there would be no need for another build macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also see that there is a confusion between a C macro and a Makefile macro.
They are not the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we will address this
| /* params are supposed to be fully validated at this point */ | ||
| assert(!ZSTD_isError(ZSTD_checkCParams(params->cParams))); | ||
| #ifdef ZSTD_ENABLE_COMPRESS_FAST | ||
| compressionLevel = (params->compressionLevel == 0) ? cctx->requestedParams.compressionLevel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a block which both starts and end compressionLevel lifetime.
| #ifdef ZSTD_ENABLE_COMPRESS_FAST | ||
| /* Lazy evaluation is performed only if the first match length | ||
| * is <= ZSTD_compressFastLazyLimit */ | ||
| #define ZSTD_COMPRESS_FAST_BASE_LAZY_LIMIT 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems way too low.
In fact, in many cases, the search is targeting 5-byte matches, so all matches would trigger this condition, effectively changing _lazy into _greedy.
| size_t maxNbSeq; | ||
| size_t maxNbLit; | ||
| #ifdef ZSTD_ENABLE_COMPRESS_FAST | ||
| size_t lazyLimit; /* Match length limit to allow lazy evaluation */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unwelcome, I would prefer we avoid adding a conditional variable here.
| prefixLowestIndex - (U32)(dictEnd - dictBase) : | ||
| 0; | ||
| const U32 dictAndPrefixLength = (U32)((ip - prefixLowest) + (dictEnd - dictLowest)); | ||
| #ifdef ZSTD_ENABLE_COMPRESS_FAST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too many such #ifdef blocks across the code base
| fi | ||
|
|
||
| if zstd -vv --version | grep -q 'non-deterministic'; then | ||
| if zstd -vv --version | grep -q 'non-deterministic' || [ "$ZSTD_COMPRESS_FAST" = "1" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing determinism is pretty wild, though I guess it might be acceptable if this setting must be explicitly opt-in.
Thank you Yann for your detailed review comments. Broadly, there are two main points:-
Thanks again. |
Optional ZSTD_COMPRESS_FAST flag provided to speed up compression by restricting lazy evaluation to short matches.
Applied to levels 6 and 7. This helps improve lazy compression strategy. Optimization disabled by default.
This code change improves compression performance with little to no tradeoff for compression ratio and decompression performance. It is useful for applications that have high compression speed requirements.
Skip determinism test when this optimization option is enabled. Since the compression ratio varies, so it can't satisfy the determinism tests.
Performance Highlights :-
Benchmark result (single-threaded execution) graphs are shared below for the datasets: Silesia, Calgary, Canterbury and FreeBSD.
Other datasets like geekBench and Enwik showed similar trends.
Test Setup: AMD Genoa (Zen4) CPU, SMT OFF, GCC 14.2.1