-
Notifications
You must be signed in to change notification settings - Fork 264
perf: Optimize startsWith and endsWith string functions #3000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Implements a hybrid optimization strategy for string primitives. - Uses Arrow compute kernels for startsWith with pre-allocated pattern arrays to avoid per-batch allocation overhead. - Uses direct buffer access and manual suffix calculation for endsWith to bypass iterator overhead and match JVM intrinsic performance. - Achieves 1.1X speedup for startsWith and parity (1.0X) for endsWith compared to Spark. Closes apache#2973
|
Thank you for the PR @Mazen-Ghanaym . Any reason why we can't make the Datafusion's version faster here? |
I spent around 4 days trying to optimize, and here is the short story of the journey Day 2: Tried unsafe Rust with direct byte access Attempted raw pointer manipulation for maximum speed. Failed due to compatibility issues. Day 3: Tried safe Rust with stdlib slices. Used Day 3-4: Arrow compute kernels + pre-allocated pattern The breakthrough: I realized the overhead came from repeatedly processing the pattern each batch. Pre-allocating the pattern as a For I tried to optimize further but I don't know any optimizations that can beat Java in these direct, simple operations. |
|
I will start reviewing this PR today. There was a PR merged into DataFusion last week to speed up |
| let offsets = string_array.value_offsets(); | ||
| let values = string_array.value_data(); | ||
| let pattern_bytes = self.pattern.as_bytes(); | ||
| let p_len = self.pattern_len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just use Arrow's ends_with kernel, similar to how starts_with is handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually tried that first! Used arrow::compute::ends_with with the same Scalar approach as starts_with, but it was still ~0.9X vs Spark at the time. The direct buffer access is what got it to 1.0X parity.
I saw the DataFusion PR you linked (apache/datafusion#19516), looks like it adds the same Scalar optimization. Once Comet picks up that version, happy to switch this to the standard kernel and remove the custom code!
| // Use Arrow's highly optimized SIMD kernel | ||
| let result = compute::starts_with(&array, &scalar)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3000 +/- ##
============================================
+ Coverage 56.12% 59.62% +3.50%
- Complexity 976 1377 +401
============================================
Files 119 167 +48
Lines 11743 15509 +3766
Branches 2251 2569 +318
============================================
+ Hits 6591 9248 +2657
- Misses 4012 4962 +950
- Partials 1140 1299 +159 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Which issue does this PR close?
Closes #2973.
Rationale for this change
The
startsWithandendsWithstring functions were previously delegated to DataFusion's built-in scalar functions, which introduced unnecessary overhead and did not fully leverage Comet's native execution capabilities. This PR implements optimized native expressions to improve performance.What changes are included in this PR?
This PR introduces custom StartsWithExpr and EndsWithExpr physical expressions with the following optimizations:
startsWith:compute::starts_withkernel with a pre-allocated pattern array to avoid per-batch allocations.endsWith:StringArraydata, bypassing iterator overhead.memcmp).Files Changed:
How are these changes tested?
CometStringExpressionBenchmark:startsWith: 1.1X faster than Spark (Comet 1887ms vs Spark 2028ms)endsWith: 1.0X parity with Spark (Comet 3389ms vs Spark 3354ms)Benchmark Results
Environment: OpenJDK 64-Bit Server VM 11.0.29+7-LTS on Linux 6.11.0-1018-azure
Processor: AMD EPYC 7763 64-Core Processor
startsWith
endsWith
Summary:
startsWith: 1.1X faster than Spark (1546ms vs 1657ms)endsWith: 1.0X parity with Spark (1562ms vs 1625ms)