ARROW-11511: [Rust] Replace Arc<ArrayData> by ArrayData in all arrays
#9329
Conversation
cc @Dandandan ^_^
Nice @jorgecarleitao !! Looking forward to seeing the impact in DataFusion on the slicing of smaller arrays, and maybe other parts as well. I wouldn't have expected >20% changes for bigger arrays (512/1024 elements) in the kernels based only on removing / not creating or cloning the Arc. Any idea why that could be the case? Ergonomically it is definitely an improvement: the ArrayData struct is small and should be cheap to clone, and this avoids quite some boilerplate.
I remember that we had a similar issue some time ago where we were using […]. My general approach here is to write simpler code in the hope that it helps the compiler. In practice, I humbly make these offerings to the gods of LLVM and pray for a sign.
@jorgecarleitao that's an interesting idea; simplification makes sense, especially if cloning […]. I ran the filter benchmarks with SIMD enabled and got the following results (in comparison to master):

filter u8 time: [556.89 us 562.29 us 568.54 us]
filter u8 high selectivity
filter u8 low selectivity
filter context u8 time: [205.24 us 209.06 us 213.16 us]
filter context u8 high selectivity
filter context u8 low selectivity
filter context u8 w NULLs
filter context u8 w NULLs high selectivity
filter context u8 w NULLs low selectivity
filter f32 time: [779.40 us 789.32 us 801.26 us]
filter context f32 time: [485.95 us 495.16 us 506.07 us]
filter context f32 high selectivity
filter context f32 low selectivity
filter context string time: [639.51 us 646.56 us 655.15 us]
filter context string high selectivity
filter context string low selectivity
rust/arrow/src/array/array.rs
If both data() and data_ref() return &ArrayData, maybe data_ref() alone is enough and data() should be removed; or possibly change data() to return ArrayData.
I understand why we wanted to avoid Array::data().clone() to get ArrayData, but with these two functions returning the same reference, I'd opt to make Array::data() return ArrayData.
@yordan-pavlov I think it's safer to deprecate data() and then remove it later. We still use it a lot in the codebase, but it's often more performant to use data_ref(), so returning ArrayData wouldn't guide users toward the faster path.
Yes, deprecating data() is what I meant; immediate removal is not great for backwards compatibility.
How cheap cloning of the ArrayData struct is probably depends a lot on the datatype. The removed indirection seems to be beneficial, but instead we now need to clone all fields, which include the DataType and the vectors of buffers and child data. So for primitive types, removing the Arc should be a benefit because of less indirection when accessing it, though it would still need to allocate vectors and clone the Arc which is inside the Buffer. For more complex types, cloning the DataType enum itself and multiple buffers could be slower than the speedup gained by less indirection.
I am sorry, I am not sure I follow the arguments: […]
You are right, I wasn't looking closely enough at the code, or thought only about […].
Codecov Report
@@ Coverage Diff @@
## master #9329 +/- ##
==========================================
- Coverage 82.32% 82.03% -0.29%
==========================================
Files 245 245
Lines 56277 55927 -350
==========================================
- Hits 46328 45879 -449
- Misses 9949 10048 +99
Continue to review full report at Codecov.
alamb
left a comment
I went over this PR carefully and it makes sense to me. I am happy with the performance numbers reported by others, though if we want another opinion I can re-run stuff locally.
My opinion is that if this PR slows anything down, there are other optimizations we should pursue to get the performance back (Arc-ing compound datatypes and special-casing small numbers of Buffers in ArrayData) that would also help the overall system.
> I still think there might be some overhead there and it might be worth experimenting with an Arc or replacing some of the Box in the DataType enum with Arc in a separate PR. For nested types like List<Dictionary<Int32, Utf8>> I think cloning the type currently involves more allocations than cloning the vec of buffers/child data.
I agree with this @jhorstmann -- I think a lot of the arrow codebase assumes that cloning DataType is cheap (which is true for primitives but not for compound types).
Also, given that ArrayData has several Vecs in it, I suspect an optimization like @jhorstmann was trying here #9330 would make a significant difference too (as most arrays have 1 or 0 items in that Vec)
> In practice, I humbly make these offerings to the gods of LLVM and pray for a sign.
I think this is a good sacrifice. 😂
rust/arrow/src/array/array_union.rs
What was the reasoning for removing this size test rather than updating it for the new values?
nevi-me
left a comment
LGTM, just one question
@jorgecarleitao shall we merge this PR? Or is it still "another of those experiments to gather feedback and share results", as the description says?
Merging #9329 did not go as well as I planned; fixing error in master.

Closes #9511 from jorgecarleitao/fix_error

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Jorge C. Leitao <[email protected]>
I think this one needs a rebase (Jorge fixed the real error in #9511)
nevi-me
left a comment
@jorgecarleitao @alamb I've rebased this and fixed some errors in simd.
I have capacity to look at this in more detail, including running benchmarks; so I'll start doing that.
There are three questions that Jorge asked which I haven't answered, so I'm marking this as "changes requested" to avoid us prematurely merging it.
I'd also like to hear from @sunchao and @vertexclique around what the change could mean from a memory-usage perspective. I don't have enough experience to tell off-hand.
I'll keep this PR rebased if merge conflicts arise, so I can take that burden off your shoulders, Jorge.
Thanks @nevi-me. Let me know if there is anything I can do to help.
Benchmarks on an ARM machine: only the equality benches are worse, and not by much.
I think it comes down just to memory size, as cloning […]
I personally don't. The underlying bytes are still backed by […]
I attached my set of benchmark results, but I think the DataFusion benchmarks would be more meaningful. Thanks
nevi-me
left a comment
I'm fine with continuing with this change. I've rebased the PR, and added comments on the questions that @jorgecarleitao asked.
Any thoughts @alamb, Jorge, @andygrove?
I've been out of the loop, so I'm not sure if an RFC is retrospectively required for the change, as the PR predates the RFC suggestion.
@nevi-me I think we should merge this PR: […]
I concur, @alamb; I think the PR has been open long enough for those interested to comment. I already fixed the clippy lint in master, so we need not worry about it.
Thanks -- you beat me to it! (#9754)
@nevi-me performance on some of the queries in h2oai/db-benchmark#182 improves by ~5%.
Thanks @nevi-me for pushing this over the line. 🎉
These tests were all commented out in #9329. Given the PR made changes to the commented-out code, I'm inclined to think this was an accidental omission? If not, happy for this to be closed 😀 FYI @jorgecarleitao

Signed-off-by: Raphael Taylor-Davies <[email protected]>

Closes #10048 from tustvold/enable-transform-tests

Authored-by: Raphael Taylor-Davies <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
Rationale

Currently, all our arrays use an `Arc<ArrayData>`, which they expose via `Array::data` and `Array::data_ref`. This adds a level of indirection. Now, it happens that, afaik, in the current code base the `Arc<>` is not needed. On #9271 we are observing some performance issues with small arrays, and one of the ideas that came up was to get rid of the `Arc` and see what happens.

This PR

Well, this PR replaces all `Arc<ArrayData>` by `ArrayData`. On the one hand, this means that cloning an array is a tad more expensive (`Arc` vs `ArrayData`). On the other hand, it means that often the compiler can optimize it out. The gist of the benchmarks below is: `take` […]

There is some noise, as there are benches that are not expected to be affected and are being affected.

Personally, I like this PR because it makes working with `ArrayData` and arrays simpler: no need for `Arc::new` or `as_ref` and company (besides the speed).

Questions

- […] `Arc<ArrayData>` in all arrays?
- […] `Arc`?

Benchmarks