Optimize lookup for any descendant leaf #74
Merged
Instead of using `minimum_unchecked` to find an arbitrary descendant, use a custom function that tries to quickly find any leaf of a given subtree.

The default implementation for inner nodes is equivalent to the `minimum_unchecked` function, but it is overridden for `InnerNodeSorted` and `InnerNodeIndirect`. Both of these inner node types maintain a compact array of child pointers, which makes it easy to select a child node at random.

Rather than picking randomly, we're trying to find a leaf node as quickly as possible. I'd previously read <https://brooker.co.za/blog/2012/01/17/two-random.html>, which gave me the idea to try a best-of-two strategy when looking for child pointers that lead to a leaf node. I couldn't say that this is the very best option, only that it is easily proven correct (an inner node always has at least two child nodes) and that later testing showed it was better than a single choice.

I wanted to do this optimization when I started reworking the range iterator a little while ago. Specifically, the range operation needs to do a "full prefix" search (as opposed to a pessimistic/optimistic prefix-based search), which requires searching for a leaf node if a given inner node has implicit bytes in the stored prefix. "Full prefix" searches also happen in the insert code path.

Looking at the "full prefix" search code, it occurred to me that using the minimum as a way to find an arbitrary leaf node was a pretty good option, but not the best. So I tried out the best-of-two approach and realized that the overall improvement for most cases was marginal at best. However, for one specific dataset and operation this was a huge improvement:

```text
iai_callgrind::bench_insert_group::bench_from_iter skewed:...
  Baselines:                    |9b221e2
  Instructions:        211822957|512411353  (-58.6615%) [-2.41905x]
  L1 Hits:             283167619|616345853  (-54.0570%) [-2.17661x]
  LL Hits:                  5886|34218438   (-99.9828%) [-5813.53x]
  RAM Hits:               551645|551715     (-0.01269%) [-1.00013x]
  Total read+write:    283725150|651116006  (-56.4248%) [-2.29488x]
  Estimated Cycles:    302504624|806748068  (-62.5032%) [-2.66689x]
```

Roughly `-60%` in instructions and estimated cycles!!! It was consistent over multiple runs too. The reason for this massive improvement is the structure of the `skewed` dataset. `skewed` keys are an artificial dataset I use for testing. Here is some example data:

```text
[255]
[0, 255]
[0, 0, 255]
[0, 0, 0, 255]
[0, 0, 0, 0, 255]
[0, 0, 0, 0, 0, 255]
[0, 0, 0, 0, 0, 0, 255]
[0, 0, 0, 0, 0, 0, 0, 255]
[0, 0, 0, 0, 0, 0, 0, 0, 255]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 255]
```

When inserted into the tree structure, this sequence of keys is a worst case for lookup/insertion/deletion/etc. because the number of inner nodes is maximized. The insert operation is especially bad because on each step of the lookup portion of the operation (finding where to insert) we need to check whether there is a "full prefix" mismatch. The "full prefix" mismatch check requires finding a descendant leaf node, which means recursing down the whole length of the tree. This optimization specifically avoids that, because it checks two children at each inner node and prefers the one that points to a leaf node. That effectively makes the "full prefix" lookup constant time (for this specific key dataset).

Overall, I don't think this optimization is hugely important, though it was fun to investigate. I'll probably keep it because it helps the skewed edge case performance a ton and doesn't hurt other datasets much at all.

Passed all existing tests, no new tests.
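To make the difference between following the minimum and the best-of-two choice concrete, here is a minimal sketch on a toy tree. Everything in it is an assumption made for illustration: the `Node` enum, `minimum_leaf`, `any_leaf_best_of_two`, and `skewed_tree` are hypothetical names and not the crate's real types or functions, and the choice of the first and last child as the two candidates is a guess, since the description above does not say which two children are inspected.

```rust
/// Toy stand-in for a radix tree node (hypothetical, not the real node types).
enum Node {
    Leaf(Vec<u8>),
    /// Children sorted by key byte; assumed to always hold at least two entries.
    Inner(Vec<(u8, Node)>),
}

/// Follow the smallest child at every level.
/// Returns the leaf key and the number of inner nodes visited.
fn minimum_leaf(mut node: &Node) -> (&[u8], usize) {
    let mut visited = 0;
    loop {
        match node {
            Node::Leaf(key) => return (key.as_slice(), visited),
            Node::Inner(children) => {
                visited += 1;
                node = &children[0].1;
            }
        }
    }
}

/// Best-of-two: look at two children (here, assumed to be the first and
/// the last) and prefer one that is already a leaf.
fn any_leaf_best_of_two(mut node: &Node) -> (&[u8], usize) {
    let mut visited = 0;
    loop {
        match node {
            Node::Leaf(key) => return (key.as_slice(), visited),
            Node::Inner(children) => {
                visited += 1;
                let first = &children[0].1;
                let last = &children[children.len() - 1].1;
                // If neither candidate is a leaf, either one is a valid way down.
                node = if matches!(first, Node::Leaf(_)) { first } else { last };
            }
        }
    }
}

/// Build the toy tree for `n` skewed keys: [255], [0, 255], [0, 0, 255], ...
fn skewed_tree(n: usize) -> Node {
    assert!(n >= 2, "need at least two keys to form an inner node");
    let key = |zeros: usize| {
        let mut k = vec![0u8; zeros];
        k.push(255);
        k
    };
    // Start from the longest key and wrap it in one inner node per shorter key.
    let mut node = Node::Leaf(key(n - 1));
    for zeros in (0..n - 1).rev() {
        node = Node::Inner(vec![(0, node), (255, Node::Leaf(key(zeros)))]);
    }
    node
}
```

For `skewed_tree(10)`, `minimum_leaf` visits all nine inner nodes before reaching the longest key, while `any_leaf_best_of_two` stops after the root because its `255` child is already a leaf. That matches the constant-time behaviour described above for this dataset. Under these assumptions, on trees where neither candidate is a leaf the sketch simply falls back to descending through the last child.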
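For anyone who wants to reproduce the `skewed` key pattern, a generator could look roughly like the following. This is a sketch based only on the example data in the description; `skewed_keys` is a hypothetical helper name, and the actual test dataset may be produced differently.

```rust
/// Hypothetical helper: yields `count` keys, where key `i` is
/// `i` zero bytes followed by a single 255 byte.
fn skewed_keys(count: usize) -> impl Iterator<Item = Vec<u8>> {
    (0..count).map(|zeros| {
        let mut key = vec![0u8; zeros];
        key.push(255);
        key
    })
}
```

Each key shares all of its zero bytes with every longer key and diverges only at its final byte, which is why every additional key adds another level of inner nodes.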