Conversation

@kumarUjjawal (Contributor) commented Dec 15, 2025

Which issue does this PR close?

Rationale for this change

There are several Spark functions that have equivalent DataFusion functions; our goal is to reduce this duplication.

What changes are included in this PR?

  • Spark LIKE keeps its wrapper but now uses DataFusion's like_coercion for type coercion and delegates execution to arrow::compute::like (see the sketch below)
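
A minimal sketch of that delegation, written as a free function over concrete string arrays rather than the PR's actual UDF plumbing (the name spark_like and the example data are illustrative):

use arrow::array::{BooleanArray, StringArray};
use arrow::compute::like;
use datafusion_common::Result;

/// Evaluate Spark `like` semantics by delegating pattern matching to the arrow kernel.
fn spark_like(haystack: &StringArray, pattern: &StringArray) -> Result<BooleanArray> {
    // arrow::compute::like performs the %/_ matching; `?` converts any
    // ArrowError into a DataFusionError.
    Ok(like(haystack, pattern)?)
}

fn main() -> Result<()> {
    let data = StringArray::from(vec!["Spark", "DataFusion"]);
    let patterns = StringArray::from(vec!["Sp%", "Sp%"]);
    let matched = spark_like(&data, &patterns)?;
    assert!(matched.value(0));  // "Spark" matches 'Sp%'
    assert!(!matched.value(1)); // "DataFusion" does not match 'Sp%'
    Ok(())
}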

Are these changes tested?

All previous tests pass.

Are there any user-facing changes?

github-actions bot added the spark label Dec 15, 2025
@kumarUjjawal (Contributor Author)

@Jefffrey I need some clarification about this change. According to your issue #17964, you wanted to use DataFusion's equivalent function in place of the Spark one, but DataFusion doesn't expose LIKE as a reusable function in datafusion_functions; LIKE is implemented as a built-in operator/physical expression. Is the goal to create a shared UDF for LIKE?


fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
    match (arg_types.first(), arg_types.get(1)) {
        (Some(lhs), Some(rhs)) => {
Review comment from a Member

This arm will match even if there are more than two arguments

@kumarUjjawal (Contributor Author)

Good catch. Thanks!
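
A sketch of the tightened coercion, written as a free function for brevity (the error wording is illustrative, not the merged code); the slice pattern only matches exactly two arguments, so anything else falls into the error arm instead of being silently accepted:

use arrow::datatypes::DataType;
use datafusion_common::{plan_err, Result};
use datafusion_expr::type_coercion::binary::like_coercion;

fn coerce_types(arg_types: &[DataType]) -> Result<Vec<DataType>> {
    match arg_types {
        // Exactly two arguments: find a common string type for both sides.
        [lhs, rhs] => match like_coercion(lhs, rhs) {
            Some(coerced) => Ok(vec![coerced.clone(), coerced]),
            None => plan_err!("like cannot coerce {lhs} and {rhs} to a common string type"),
        },
        // Zero, one, or three-plus arguments are rejected explicitly.
        other => plan_err!("like expects exactly two arguments, got {}", other.len()),
    }
}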

@Jefffrey (Contributor)

In this case it seems like there is quite minimal duplication, since the Spark version uses the arrow like kernel directly; though I wonder if there would be any benefit to implementing simplify for the Spark like function here, to reduce it to the DataFusion LIKE operator? 🤔
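
Purely as an illustration of that suggestion (not code from this PR), a simplify hook on a hypothetical SparkLike UDF could rewrite the call into the planner's native LIKE expression roughly like this, setting aside the nullability and escape-character differences discussed below:

use datafusion_common::Result;
use datafusion_expr::simplify::{ExprSimplifyResult, SimplifyInfo};
use datafusion_expr::Expr;

// Inside `impl ScalarUDFImpl for SparkLike { ... }`:
fn simplify(&self, mut args: Vec<Expr>, _info: &dyn SimplifyInfo) -> Result<ExprSimplifyResult> {
    if args.len() == 2 {
        let pattern = args.pop().expect("length checked");
        let expr = args.pop().expect("length checked");
        // Rewrite `like(expr, pattern)` into the built-in `expr LIKE pattern`
        // so the optimizer and physical planner handle it like any other LIKE.
        Ok(ExprSimplifyResult::Simplified(expr.like(pattern)))
    } else {
        // Unexpected arity: leave the original call untouched.
        Ok(ExprSimplifyResult::Original(args))
    }
}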

@kumarUjjawal (Contributor Author)

I didn't use simplify because DataFusion's LIKE is non-nullable when both inputs are non-null (#19257), and the DataFusion planner rejects escape characters other than backslash, while the Spark UDF currently calls Arrow directly without that check. So we would have to handle both of these, or create a shared LIKE function, before we could use simplify the way DataFusion does.

@Jefffrey (Contributor)

If that's the case then I think we can leave the Spark Like code on main as is without any changes 👍

@kumarUjjawal (Contributor Author)

Good call.
