Skip to content

Conversation

@Flyangz
Copy link
Contributor

@Flyangz Flyangz commented Jan 8, 2026

Which issue does this PR close?

Closes #1845

Rationale for this change

before: ProjectExec [UDFWrapper(split(col1#1, ,, -1)[0]) AS #5], schema=[#5:Utf8;N]
after: ProjectExec [(spark_ext_function_Spark_StringSplit(#1@0, ,)).[1] AS #5], schema=[#5:Utf8;N]

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

@github-actions github-actions bot added the spark label Jan 8, 2026
@richox
Copy link
Contributor

richox commented Jan 13, 2026

one concern: UDFWrapper(split, [1]) fallbacks one item from the result array. but UDFWrapper(split)[1] fallbacks the whole array, increasing a lot of fallback costs.

@Flyangz Flyangz force-pushed the feature/open-opt-split branch from b5ad7b2 to 496e7a3 Compare January 15, 2026 09:24
@Flyangz Flyangz force-pushed the feature/open-opt-split branch from 496e7a3 to 91f0f18 Compare January 15, 2026 09:41
@Flyangz Flyangz changed the title [AURON #1845] Support expression caching for multiple GetArrayItem on same array [AURON #1845] Fix StringSplit conversion Jan 16, 2026
@Flyangz Flyangz force-pushed the feature/open-opt-split branch from 91f0f18 to d56f165 Compare January 16, 2026 04:10
@github-actions github-actions bot removed the native label Jan 16, 2026
@Flyangz
Copy link
Contributor Author

Flyangz commented Jan 16, 2026

@richox Yes. I'll now break down the original issue into two steps. This PR fixes the problem of StringSplit failing to convert to Native. The other issue #1902 is the ext function caching problem. Completing these two steps will better resolve the original problem.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the StringSplit conversion to correctly specify that it returns an array of strings rather than a single string. The issue was that the native function conversion was incorrectly declaring the return type as StringType when it should be ArrayType(StringType).

Changes:

  • Added ArrayType to imports and consolidated type imports
  • Added defensive .toString calls when accessing Literal.value
  • Fixed return type from StringType to ArrayType(StringType) for the native StringSplit function

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@richox richox merged commit ac1774f into apache:master Jan 17, 2026
104 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix StringSplit conversion failure

2 participants