@Mazen-Ghanaym

Which issue does this PR close?

Closes #2973.

Rationale for this change

The startsWith and endsWith string functions were previously delegated to DataFusion's built-in scalar functions, which introduced unnecessary overhead and did not fully leverage Comet's native execution capabilities. This PR implements optimized native expressions to improve performance.

What changes are included in this PR?

This PR introduces custom StartsWithExpr and EndsWithExpr physical expressions with the following optimizations:

startsWith:

  • Uses Arrow's compute::starts_with kernel with a pre-allocated pattern array to avoid per-batch allocations.
  • Achieves 1.1X speedup over Spark.
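
The pre-allocation idea can be sketched without Arrow. The following is a minimal, dependency-free model and not the PR's code: `PrefixMatcher` is a hypothetical stand-in for `StartsWithExpr`, plain slices stand in for Arrow arrays, and `Option` models SQL NULL. The point it illustrates is only that the pattern operand is built once at construction time and reused for every batch.

```rust
/// Hypothetical stand-in for the PR's `StartsWithExpr`. The real expression
/// works on Arrow arrays; plain slices are used here to stay dependency-free.
pub struct PrefixMatcher {
    /// Pattern bytes, materialized once when the expression is created.
    pattern: Vec<u8>,
}

impl PrefixMatcher {
    /// Build the matcher once per expression, not once per batch.
    pub fn new(pattern: &str) -> Self {
        Self {
            pattern: pattern.as_bytes().to_vec(),
        }
    }

    /// Evaluate one batch. `None` models a NULL row and stays `None`.
    pub fn evaluate(&self, batch: &[Option<&str>]) -> Vec<Option<bool>> {
        batch
            .iter()
            .map(|row| row.map(|s| s.as_bytes().starts_with(&self.pattern)))
            .collect()
    }
}
```

A caller would construct `PrefixMatcher::new("ab")` once and then call `evaluate` on each incoming batch, so no pattern allocation happens on the per-batch path.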

endsWith:

  • Uses direct buffer access to the underlying StringArray data, bypassing iterator overhead.
  • Manually calculates suffix offsets and performs raw byte slice comparison (memcmp).
  • Achieves 1.0X parity with Spark (improved from 0.9X regression).
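
As a rough illustration of the suffix-offset approach, here is a stdlib-only sketch. It is not the code from this PR: `offsets` and `values` only mimic the offsets and data buffers of an Arrow `StringArray`, `ends_with_raw` and `to_buffers` are hypothetical names, and null handling is left out on the assumption that no row is NULL.

```rust
/// Returns one bool per row: does row `i` end with `pattern`?
///
/// Row `i` occupies `values[offsets[i]..offsets[i + 1]]`, which mirrors the
/// layout of an Arrow `StringArray`. Assumption: there are no NULL rows.
pub fn ends_with_raw(offsets: &[i32], values: &[u8], pattern: &[u8]) -> Vec<bool> {
    let p_len = pattern.len();
    offsets
        .windows(2)
        .map(|w| {
            let (start, end) = (w[0] as usize, w[1] as usize);
            // A row shorter than the pattern can never match.
            if end - start < p_len {
                return false;
            }
            // Compare only the trailing `p_len` bytes of the row. Equality on
            // byte slices is typically compiled to a memcmp-style comparison.
            let tail = &values[end - p_len..end];
            tail == pattern
        })
        .collect()
}

/// Hypothetical helper: pack string rows into an offsets/values buffer pair.
pub fn to_buffers(rows: &[&str]) -> (Vec<i32>, Vec<u8>) {
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    for row in rows {
        values.extend_from_slice(row.as_bytes());
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}
```

For example, packing `["hello", "lo", "o"]` with `to_buffers` and calling `ends_with_raw` with `b"lo"` marks the first two rows as matches and rejects the third because it is shorter than the pattern.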

Files Changed:

  • native/spark-expr/src/string_funcs/starts_ends_with.rs (NEW)
  • native/spark-expr/src/string_funcs/mod.rs
  • native/core/src/execution/expressions/strings.rs
  • native/core/src/execution/planner/expression_registry.rs
  • spark/src/main/scala/org/apache/comet/serde/strings.scala
  • spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
  • spark/src/test/scala/org/apache/spark/sql/benchmark/CometStringExpressionBenchmark.scala

How are these changes tested?

  1. Existing Tests: The implementation passes all existing Comet tests, including TPC-DS and TPC-H correctness suites which exercise string functions.
  2. Benchmark Verification: Performance was verified using CometStringExpressionBenchmark:
    • startsWith: 1.1X faster than Spark (Comet 1887ms vs Spark 2028ms)
    • endsWith: 1.0X parity with Spark (Comet 3389ms vs Spark 3354ms)
  3. CI Verification: A temporary workflow was used to verify that the benchmark executes correctly in the GitHub Actions CI environment; the results are in the Benchmark Results section below.

Benchmark Results

Environment: OpenJDK 64-Bit Server VM 11.0.29+7-LTS on Linux 6.11.0-1018-azure
Processor: AMD EPYC 7763 64-Core Processor

startsWith

| Case | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|---|---|
| Spark | 1657 | 1669 | 17 | 0.6 | 1580.2 | 1.0X |
| Comet (Scan) | 1740 | 1755 | 20 | 0.6 | 1659.6 | 1.0X |
| Comet (Scan + Exec) | 1546 | 1546 | 1 | 0.7 | 1474.1 | 1.1X |

endsWith

| Case | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|---|---|
| Spark | 1625 | 1632 | 10 | 0.6 | 1549.9 | 1.0X |
| Comet (Scan) | 1731 | 1732 | 1 | 0.6 | 1651.2 | 0.9X |
| Comet (Scan + Exec) | 1562 | 1563 | 0 | 0.7 | 1490.0 | 1.0X |

Summary:

  • startsWith: 1.1X faster than Spark (1546ms vs 1657ms)
  • endsWith: 1.0X parity with Spark (1562ms vs 1625ms)
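
The Relative figures are consistent with dividing the Spark best time by each case's best time and printing one decimal place. That derivation is an assumption inferred from the numbers in the tables, and `relative_label` is a hypothetical helper used only to show the arithmetic:

```rust
/// Formats a speedup factor the way the "Relative" column appears to be
/// derived (assumption): baseline best time divided by the case's best time,
/// printed with one decimal place.
pub fn relative_label(baseline_best_ms: f64, case_best_ms: f64) -> String {
    format!("{:.1}X", baseline_best_ms / case_best_ms)
}
```

With the best times reported above, 1657 / 1546 ≈ 1.07 prints as 1.1X for startsWith, and 1625 / 1562 ≈ 1.04 prints as 1.0X for endsWith, so the measured endsWith gain is about 4% even though it rounds to parity.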

Mazen-Ghanaym and others added 6 commits December 27, 2025 03:32
- Implements a hybrid optimization strategy for string primitives.
- Uses Arrow compute kernels for startsWith with pre-allocated pattern arrays to avoid per-batch allocation overhead.
- Uses direct buffer access and manual suffix calculation for endsWith to bypass iterator overhead and match JVM intrinsic performance.
- Achieves 1.1X speedup for startsWith and parity (1.0X) for endsWith compared to Spark.

Closes apache#2973
@Mazen-Ghanaym changed the title from "Feat/optimize strings 2973" to "feat: Optimize startsWith and endsWith string functions" on Dec 27, 2025
@coderfender
Contributor

Thank you for the PR, @Mazen-Ghanaym. Any reason why we can't make DataFusion's version faster here?

@Mazen-Ghanaym
Author

> Thank you for the PR, @Mazen-Ghanaym. Any reason why we can't make DataFusion's version faster here?

I spent around 4 days trying to optimize this. Here is the short version of the journey:

Day 1-2: Tried DataFusion built-ins. Started with ScalarFunctionExpr using DataFusion's native starts_with/ends_with. Result: 1.0X, which only matched Spark with no improvement.

Day 2: Tried unsafe Rust with direct byte access. Attempted raw pointer manipulation for maximum speed. Failed due to compatibility issues.

Day 3: Tried safe Rust with stdlib slices. Used .starts_with() and .ends_with() on string slices with manual iteration. Result: 0.9X, which is actually slower than Spark. I think this was due to iterator overhead.

Day 3-4: Arrow compute kernels + pre-allocated pattern. The breakthrough: I realized the overhead came from repeatedly processing the pattern for each batch. Pre-allocating the pattern as a StringArray once and calling Arrow's compute::starts_with directly gave us 1.1X for startsWith.

For endsWith, Arrow's kernel was still slightly slower (0.9X), so I went with direct buffer access and manual suffix calculation to reach 1.0X parity.

I tried to optimize further but I don't know any optimizations that can beat Java in these direct, simple operations.

@Mazen-Ghanaym changed the title from "feat: Optimize startsWith and endsWith string functions" to "perf: Optimize startsWith and endsWith string functions" on Dec 27, 2025
@andygrove
Member

I will start reviewing this PR today.

There was a PR merged into DataFusion last week to speed up starts_with / ends_with there, so we may want to switch to that in the future.

apache/datafusion#19516

Comment on lines +203 to +206
let offsets = string_array.value_offsets();
let values = string_array.value_data();
let pattern_bytes = self.pattern.as_bytes();
let p_len = self.pattern_len;
Member

Why can't we just use Arrow's ends_with kernel, similar to how starts_with is handled?

Author

I actually tried that first! Used arrow::compute::ends_with with the same Scalar approach as starts_with, but it was still ~0.9X vs Spark at the time. The direct buffer access is what got it to 1.0X parity.

I saw the DataFusion PR you linked (apache/datafusion#19516); it looks like it adds the same Scalar optimization. Once Comet picks up that version, I'm happy to switch this to the standard kernel and remove the custom code!

Comment on lines +99 to +100
// Use Arrow's highly optimized SIMD kernel
let result = compute::starts_with(&array, &scalar)?;
Member

👍

@codecov-commenter

codecov-commenter commented Jan 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.62%. Comparing base (f09f8af) to head (d2f0aea).
⚠️ Report is 811 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3000      +/-   ##
============================================
+ Coverage     56.12%   59.62%   +3.50%     
- Complexity      976     1377     +401     
============================================
  Files           119      167      +48     
  Lines         11743    15509    +3766     
  Branches       2251     2569     +318     
============================================
+ Hits           6591     9248    +2657     
- Misses         4012     4962     +950     
- Partials       1140     1299     +159     


Development

Successfully merging this pull request may close these issues.

Improve performance of startsWith and endsWith expressions

4 participants