[AutoSparkUT] Migrate DataFrameNaFunctionsSuite tests to RAPIDS #13777

wjxiz1992 · 2025-11-13T04:08:02Z

Description

This PR migrates DataFrameNaFunctionsSuite tests from Apache Spark to RAPIDS for GPU execution testing.

Changes

Created RapidsDataFrameNaFunctionsSuite extending DataFrameNaFunctionsSuite with RapidsSQLTestsBaseTrait
Registered test suite in RapidsTestSettings.scala with no exclusions

Test Results

Total tests: 27
Passing: 27 (100%)
Excluded: 0
Perfect GPU compatibility: 100% pass rate ✨

Test Coverage

Drop Operations (5 tests)

✅ drop() - Drop rows with null values in specified columns
✅ drop() with how - Drop using "any" or "all" strategy
✅ drop() with threshold - Drop when minimum non-null count not met
✅ drop() with col(*) - Error handling for wildcard columns
✅ drop() with nested columns - Drop based on nested struct fields

Fill Operations (8 tests)

✅ fill() numeric - Fill null values with numeric defaults
✅ fill() string - Fill null values with string defaults
✅ fill() boolean - Fill null values with boolean defaults
✅ fill() with map - Fill multiple columns with different values
✅ fill() with subset columns - Fill only specified columns
✅ fill() with col(*) - Wildcard column handling
✅ fill() with nested columns - Nested column fill behavior
✅ fillMap() with dotted names - SPARK-34417: Column names containing dots

Replace Operations (8 tests)

✅ replace() numeric - Replace specific numeric values
✅ replace() with null - Replace values with null
✅ replace() NaN with float - Replace NaN values in float columns
✅ replace() NaN with double - Replace NaN values in double columns
✅ replace() float with NaN - Replace float values with NaN
✅ replace() double with NaN - Replace double values with NaN
✅ replace() with dotted names - SPARK-34649: Column names containing dots
✅ replace() nested columns - Expected UnsupportedOperationException

Ambiguity Handling (2 tests)

✅ fill() ambiguous columns - AnalysisException for ambiguous columns after join
✅ drop() ambiguous columns - AnalysisException for ambiguous columns after join

Duplicate Columns (2 tests)

✅ SPARK-29890 - fill() with duplicate column names (no column specification)
✅ SPARK-30065 - drop() with duplicate column names (no column specification)

Edge Cases (2 tests)

✅ Column name validation - Error for non-existent columns
✅ Nested column operations - Proper exception handling

Key Observations

✅ 100% pass rate - Perfect GPU compatibility
✅ No modifications needed - All tests pass as-is
✅ Complete NA handling - All null/NaN operations work correctly
✅ Join operations - Ambiguity detection works on GPU
✅ Dotted column names - SPARK-34417 & SPARK-34649 support
✅ NaN handling - Float and double NaN replacement works correctly
✅ Type casting - Automatic type casting in fill() operations
✅ Error handling - All expected exceptions properly thrown

Files Changed

tests/src/test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameNaFunctionsSuite.scala (new)
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala (updated)

Related Issues

Closes part of #11297 (Spark UT migration tracking issue)

Signed-off-by: Allen Xu [email protected]

Split the single long import statement into individual import lines for better readability and easier git diff tracking. Each RapidsSuite is now imported on a separate line, making it clearer when new test suites are added or removed. The existing 'scalastyle:off line.size.limit' comment ensures no style check violations. Signed-off-by: Allen Xu <[email protected]>

Add RapidsDataFrameNaFunctionsSuite extending DataFrameNaFunctionsSuite with RapidsSQLTestsBaseTrait for GPU execution testing. Test Results: - Total tests: 27 - Passing: 27 (100%) - Excluded: 0 - Perfect GPU compatibility: 100% pass rate Test Coverage: - Drop operations (5 tests) - drop() with column names - drop() with how parameter (any/all) - drop() with threshold - drop() with col(*) - drop() with nested columns - Fill operations (8 tests) - fill() with numeric values - fill() with string values - fill() with boolean values - fill() with map of values - fill() with subset columns - fill() with col(*) - fill() with nested columns - fillMap() with dotted column names - Replace operations (8 tests) - replace() numeric values - replace() with null values - replace() with NaN values (float/double) - replace() with dotted column names - replace() with qualified column names - replace() with nested columns (expected exception) - Ambiguity handling (2 tests) - fill() with ambiguous columns after join - drop() with ambiguous columns after join - Duplicate columns (2 tests) - SPARK-29890: fill() with duplicate column names - SPARK-30065: drop() with duplicate column names - Edge cases (2 tests) - Column name validation - Nested column operations All DataFrame NA functions (null/NaN handling) work correctly on GPU with perfect compatibility. Related: NVIDIA#11297 Signed-off-by: Allen Xu <[email protected]>

greptile-apps · 2025-11-13T04:10:17Z

Greptile Overview

Greptile Summary

This PR migrates DataFrameNaFunctionsSuite from Apache Spark to RAPIDS for GPU execution testing, achieving a perfect 100% pass rate across all 27 tests covering null/NaN handling operations (drop(), fill(), replace()).

Key changes:

Created minimal RapidsDataFrameNaFunctionsSuite extending the parent suite with RapidsSQLTestsTrait
Registered suite in RapidsTestSettings.scala with no test exclusions
All 27 tests pass without modification, demonstrating complete GPU compatibility for DataFrame NA operations

The implementation follows the established pattern used by other migrated test suites in the codebase.

Confidence Score: 5/5

Safe to merge - straightforward test migration with perfect pass rate
Clean implementation following established patterns, 100% test pass rate, minimal code changes with proper imports and trait usage
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameNaFunctionsSuite.scala	5/5	New test suite that extends `DataFrameNaFunctionsSuite` with `RapidsSQLTestsTrait` - minimal implementation with no custom tests
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala	5/5	Registered `RapidsDataFrameNaFunctionsSuite` with no test exclusions, removed extra blank line

Sequence Diagram

sequenceDiagram
    participant Developer
    participant RapidsDataFrameNaFunctionsSuite
    participant DataFrameNaFunctionsSuite
    participant RapidsSQLTestsTrait
    participant GPU

    Developer->>RapidsDataFrameNaFunctionsSuite: Run test suite
    RapidsDataFrameNaFunctionsSuite->>DataFrameNaFunctionsSuite: Inherit 27 tests
    RapidsDataFrameNaFunctionsSuite->>RapidsSQLTestsTrait: Mix in GPU testing infrastructure
    RapidsSQLTestsTrait->>GPU: Configure Spark session for GPU execution
    GPU-->>RapidsSQLTestsTrait: GPU plugin enabled
    DataFrameNaFunctionsSuite->>GPU: Execute drop(), fill(), replace() tests
    GPU-->>DataFrameNaFunctionsSuite: All 27 tests pass
    RapidsDataFrameNaFunctionsSuite-->>Developer: 100% pass rate

greptile-apps

_{2 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps

_{2 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

...test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameNaFunctionsSuite.scala

- Change from RapidsSQLTestsBaseTrait to RapidsSQLTestsTrait - Original Spark test extends QueryTest which requires RapidsSQLTestsTrait - RapidsSQLTestsTrait provides QueryTest functionality (checkAnswer, etc.) - All 27 tests pass after fix - Same issue as RapidsDataFrameComplexTypeSuite

wjxiz1992 added 2 commits November 12, 2025 16:44

greptile-apps bot reviewed Nov 13, 2025

View reviewed changes

Merge branch 'main' into dataframe-na-functions-suite-migration

25698c7

greptile-apps bot reviewed Nov 13, 2025

View reviewed changes

...test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameNaFunctionsSuite.scala Outdated Show resolved Hide resolved

thirtiseven approved these changes Nov 14, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AutoSparkUT] Migrate DataFrameNaFunctionsSuite tests to RAPIDS #13777

[AutoSparkUT] Migrate DataFrameNaFunctionsSuite tests to RAPIDS #13777

wjxiz1992 commented Nov 13, 2025

Uh oh!

greptile-apps bot commented Nov 13, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[AutoSparkUT] Migrate DataFrameNaFunctionsSuite tests to RAPIDS #13777

Are you sure you want to change the base?

[AutoSparkUT] Migrate DataFrameNaFunctionsSuite tests to RAPIDS #13777

Conversation

wjxiz1992 commented Nov 13, 2025

Description

Changes

Test Results

Test Coverage

Drop Operations (5 tests)

Fill Operations (8 tests)

Replace Operations (8 tests)

Ambiguity Handling (2 tests)

Duplicate Columns (2 tests)

Edge Cases (2 tests)

Key Observations

Files Changed

Related Issues

Uh oh!

greptile-apps bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps bot commented Nov 13, 2025 •

edited

Loading