fix: Optimize GetRandomChain() in DenseSet #6033
Conversation
dranikpg left a comment:
Clever. Please change the bit gen in GetRandomIterator as well.
src/core/dense_set.cc (outdated):

    // Use thread-local generator to avoid repeated construction overhead
    thread_local absl::BitGen gen;
Maybe just using absl::InsecureBitGen is as cheap?
Fixed
> "For sparse tables (6% util): ~87% success rate with 32 probes -> avoids expensive linear scan"
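The quoted figure is consistent with treating each probe as an independent Bernoulli trial: with 6% of buckets occupied, the chance that at least one of 32 random probes lands on a non-empty bucket is

$$1 - (1 - 0.06)^{32} = 1 - 0.94^{32} \approx 1 - 0.138 \approx 0.86,$$

i.e. roughly the ~87% quoted above.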
Please open a separate issue implementing
Clever indeed.
If we have a uniform distribution and we start from a random point, we should get the same complexity for a linear scan as for probing, but the linear scan should be faster because of spatial memory locality.
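For concreteness, a minimal sketch of that idea, assuming a hypothetical Bucket type with an IsEmpty() accessor rather than the real DenseSet internals:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

#include "absl/random/random.h"

// Hypothetical stand-in for a DenseSet bucket; only the emptiness check
// matters for this sketch.
struct Bucket {
  bool IsEmpty() const { return empty; }
  bool empty = true;
};

// Linear scan with wrap-around, starting from a uniformly random bucket.
// Per the argument above, the expected probe count matches random probing
// under a uniform distribution, while the access pattern stays sequential.
std::optional<size_t> FindRandomNonEmpty(const std::vector<Bucket>& buckets,
                                         absl::InsecureBitGen& gen) {
  const size_t n = buckets.size();
  if (n == 0) return std::nullopt;
  const size_t start = absl::Uniform<size_t>(gen, 0, n);
  for (size_t i = 0; i < n; ++i) {
    const size_t idx = start + i < n ? start + i : start + i - n;  // wrap
    if (!buckets[idx].IsEmpty()) return idx;
  }
  return std::nullopt;  // every bucket is empty
}
```

Note the wrap-around uses a comparison rather than modulo, matching the PR's second optimization.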
BorysTheDev left a comment:
Random probing can't solve this issue.
With string hashes the distribution should be close to uniform. Thinking about it purely theoretically, for a single operation, probing one more cell with a linear scan shouldn't differ from doing random probes. Except that if we always probe linearly, we create "islands" of values: there is more blank space to land in, and we only ever bite off the leftmost value of the next "island". So maybe with linear probing a large hole is created that only grows on average... At least this is my theory. Any full solution is just shrinking the backing array, as there is simply not enough information for efficiently searching a few leftover entries in a large backing.
If we start the linear scan from a random point every time, there shouldn't be any islands, and the result should be the same as random probing. I think the problem is somewhere else; for example, our random generator could produce a non-uniform distribution, or somehow we don't remove items at the beginning, or something else. Anyway, without understanding the real problem, this solution is mathematically incorrect.
I think having a 5000x improvement justifies the fix. Having said that, I agree it would be nice to understand WHY the main branch is "5000x" slower.
I'm not saying there is no improvement. I'm saying the current fix isn't optimal and relies on incorrect assumptions, and the results we see have another explanation. I think the problem was fixed by some incidental change, not by the probing itself.
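The two theories above are testable with a small stand-alone Monte-Carlo harness; a sketch (hypothetical test code, not part of this PR):

```cpp
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

// Drain a ~6%-utilized table of 2^16 buckets by scanning linearly from a
// fresh random start on every pop, counting how many empty buckets are
// inspected. Replacing the inner while-loop with repeated random picks
// gives the independent random-probing variant for comparison.
int main() {
  constexpr size_t kBuckets = 1 << 16;
  constexpr size_t kItems = kBuckets * 6 / 100;  // ~6% utilization
  std::mt19937_64 rng(42);
  std::uniform_int_distribution<size_t> pick(0, kBuckets - 1);

  std::vector<char> occupied(kBuckets, 0);
  for (size_t placed = 0; placed < kItems;) {
    size_t i = pick(rng);
    if (!occupied[i]) {
      occupied[i] = 1;
      ++placed;
    }
  }

  size_t probes = 0;  // empty buckets inspected across all pops
  for (size_t left = kItems; left > 0; --left) {
    size_t i = pick(rng);
    while (!occupied[i]) {
      i = (i + 1 == kBuckets) ? 0 : i + 1;
      ++probes;
    }
    occupied[i] = 0;
  }
  std::printf("average empty buckets inspected per pop: %.1f\n",
              static_cast<double>(probes) / kItems);
  return 0;
}
```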
Outside of the discussion about randomness: @vyavdoshenko, did you consider also rewriting the function to use bit operations instead of modulo, i.e.
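The elided suggestion presumably relies on the bucket count being a power of two (consistent with the `kMinSize = 1 << kMinSizeShift` constant visible in the diff below), in which case wrap-around becomes a mask rather than a division; a sketch of the idiom:

```cpp
#include <cstddef>

// Advance an index with wrap-around using a mask instead of modulo.
// Valid only when `size` is a power of two; the name is illustrative.
inline size_t NextIndexPow2(size_t idx, size_t size) {
  return (idx + 1) & (size - 1);  // same result as (idx + 1) % size, no division
}
```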
The simplest solution is the best.
    state.ResumeTiming();
    for (int i = 0; i < 1000; ++i) {
      src.Pop();
      tmp.Pop();
Please check that we always pop 1000 elements, or that every Pop() actually pops an element, whatever is easier. Otherwise LGTM.
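A possible shape for that check (a sketch; `Size()` and `CHECK_EQ` are assumptions about the available API, not verified against the benchmark code):

```cpp
// Hypothetical guard: verify that all 1000 Pop() calls removed an element.
const size_t before = src.Size();  // Size() is an assumed accessor
for (int i = 0; i < 1000; ++i) {
  src.Pop();
  tmp.Pop();
}
CHECK_EQ(src.Size(), before - 1000);
```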
Force-pushed from bd49627 to ed064fc.
src/core/dense_set.cc (outdated):

    }

    // Use thread-local generator to avoid construction overhead
    thread_local absl::InsecureBitGen gen;
Let's use the same thread_local object here and at line 710.
Fixed
I remember seeing different numbers for the benchmark. While this improvement is good, it does not solve the problem for oversized sets.
    constexpr size_t kMinSize = 1 << kMinSizeShift;
    constexpr bool kAllowDisplacements = true;

    thread_local absl::InsecureBitGen tl_bit_gen;
Maybe creating the bitgen is not expensive
Relevant to: #6016
Improves SPOP performance on sparse hash tables (created by Reserve()) through three key optimizations (a combined sketch follows the list):

1. thread_local RNG: use a thread_local absl::InsecureBitGen instead of creating a new BitGen on each call, eliminating construction overhead.
2. Iterator increment: replace the modulo operation with an iterator increment and a wrap-around check, avoiding expensive division in the hot loop.
3. IsEmpty() early check: test IsEmpty() before calling ExpireIfNeeded() to skip expensive expiration checks on empty buckets.
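Combined, the three changes could look roughly like this (a sketch only; Bucket, ExpireIfNeeded(), and the surrounding types are stand-ins for the real DenseSet internals, not the actual patch):

```cpp
#include <cstddef>
#include <vector>

#include "absl/random/random.h"

// Minimal self-contained model of the patched probing loop.
struct Bucket {
  bool IsEmpty() const { return empty; }
  bool empty = true;
};

void ExpireIfNeeded(Bucket&) { /* expiration logic elided in this sketch */ }

using ChainVector = std::vector<Bucket>;

ChainVector::iterator GetRandomChain(ChainVector& entries) {
  if (entries.empty()) return entries.end();

  // (1) thread_local RNG: constructed once per thread, not on every call.
  thread_local absl::InsecureBitGen tl_bit_gen;

  auto it = entries.begin() +
            absl::Uniform<size_t>(tl_bit_gen, 0, entries.size());
  for (size_t probes = 0; probes < entries.size(); ++probes) {
    // (3) IsEmpty() early check: skip the expiration path on empty buckets.
    if (!it->IsEmpty()) {
      ExpireIfNeeded(*it);
      if (!it->IsEmpty()) return it;
    }
    // (2) Iterator increment plus wrap-around check instead of modulo.
    if (++it == entries.end()) it = entries.begin();
  }
  return entries.end();  // no occupied bucket found
}
```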
Comparison:
Main branch:
Fixed version (current):
Previous approach (wrong direction, rejection sampling):
The comparison is based on the fixed benchmark: #6038