Conversation

@kumarUjjawal (Contributor) commented Dec 15, 2025

Which issue does this PR close?

Rationale for this change

There are several Spark functions that have equivalent DataFusion functions; our goal is to reduce this duplication.

What changes are included in this PR?

  • Spark LIKE keeps its wrapper but now uses DataFusion's like_coercion for type coercion and delegates execution to arrow::compute::like (see the sketch below)
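
A minimal sketch of that delegation, written as a free function over concrete string arrays rather than the PR's actual UDF plumbing (the name spark_like and the example data are illustrative):

use arrow::array::{BooleanArray, StringArray};
use arrow::compute::like;
use datafusion_common::Result;

/// Evaluate Spark `like` semantics by delegating pattern matching to the arrow kernel.
fn spark_like(haystack: &StringArray, pattern: &StringArray) -> Result<BooleanArray> {
    // arrow::compute::like performs the %/_ matching; `?` converts any
    // ArrowError into a DataFusionError.
    Ok(like(haystack, pattern)?)
}

fn main() -> Result<()> {
    let data = StringArray::from(vec!["Spark", "DataFusion"]);
    let patterns = StringArray::from(vec!["Sp%", "Sp%"]);
    let matched = spark_like(&data, &patterns)?;
    assert!(matched.value(0));  // "Spark" matches 'Sp%'
    assert!(!matched.value(1)); // "DataFusion" does not match 'Sp%'
    Ok(())
}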

Are these changes tested?

All previous tests pass.

Are there any user-facing changes?

github-actions bot added the spark label Dec 15, 2025
@kumarUjjawal (Contributor Author)

@Jefffrey I need some clarification about this change. According to your issue #17964, you wanted to use DataFusion's equivalent function in place of the Spark one, but DataFusion doesn't expose LIKE as a reusable function in datafusion_functions; LIKE is implemented as a built-in operator/physical expression. Is the goal to create a shared UDF for LIKE?


fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
    match (arg_types.first(), arg_types.get(1)) {
        (Some(lhs), Some(rhs)) => {
Review comment from a Member

This arm will match even if there are more than two arguments

@kumarUjjawal (Contributor Author)

Good catch. Thanks!
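
A sketch of the tightened coercion, written as a free function for brevity (the error wording is illustrative, not the merged code); the slice pattern only matches exactly two arguments, so anything else falls into the error arm instead of being silently accepted:

use arrow::datatypes::DataType;
use datafusion_common::{plan_err, Result};
use datafusion_expr::type_coercion::binary::like_coercion;

fn coerce_types(arg_types: &[DataType]) -> Result<Vec<DataType>> {
    match arg_types {
        // Exactly two arguments: find a common string type for both sides.
        [lhs, rhs] => match like_coercion(lhs, rhs) {
            Some(coerced) => Ok(vec![coerced.clone(), coerced]),
            None => plan_err!("like cannot coerce {lhs} and {rhs} to a common string type"),
        },
        // Zero, one, or three-plus arguments are rejected explicitly.
        other => plan_err!("like expects exactly two arguments, got {}", other.len()),
    }
}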

@Jefffrey (Contributor)

In this case it seems like there is quite minimal duplication, since the Spark version uses the arrow like kernel directly; though I wonder if there would be any benefit to implementing simplify for the Spark like function here, to reduce it to the DataFusion LIKE operator? 🤔
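
Purely as an illustration of that suggestion (not code from this PR), a simplify hook on a hypothetical SparkLike UDF could rewrite the call into the planner's native LIKE expression roughly like this, setting aside the nullability and escape-character differences discussed below:

use datafusion_common::Result;
use datafusion_expr::simplify::{ExprSimplifyResult, SimplifyInfo};
use datafusion_expr::Expr;

// Inside `impl ScalarUDFImpl for SparkLike { ... }`:
fn simplify(&self, mut args: Vec<Expr>, _info: &dyn SimplifyInfo) -> Result<ExprSimplifyResult> {
    if args.len() == 2 {
        let pattern = args.pop().expect("length checked");
        let expr = args.pop().expect("length checked");
        // Rewrite `like(expr, pattern)` into the built-in `expr LIKE pattern`
        // so the optimizer and physical planner handle it like any other LIKE.
        Ok(ExprSimplifyResult::Simplified(expr.like(pattern)))
    } else {
        // Unexpected arity: leave the original call untouched.
        Ok(ExprSimplifyResult::Original(args))
    }
}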

@kumarUjjawal (Contributor Author)

I didn't use simplify because DataFusion's LIKE is non-nullable when both inputs are non-null (#19257), and the DataFusion planner rejects escape characters other than backslash, while the Spark UDF currently calls Arrow directly without that check. So we would have to handle both of these, or create a shared LIKE function, before we could use simplify the way DataFusion does.

@Jefffrey (Contributor)

If that's the case then I think we can leave the Spark Like code on main as is without any changes 👍

@kumarUjjawal (Contributor Author)

Good call.
