[AutoSparkUT] Migrate DataFrameNaFunctionsSuite tests to RAPIDS #13777
base: main
Split the single long import statement into individual import lines for better readability and easier git diff tracking. Each RapidsSuite is now imported on its own line, making it clearer when test suites are added or removed. The existing `scalastyle:off line.size.limit` comment ensures there are no style-check violations.
Signed-off-by: Allen Xu <[email protected]>
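A minimal sketch of the import style this commit adopts. Standard-library classes stand in for the RAPIDS suites here, because the actual import lines in `RapidsTestSettings.scala` are not reproduced on this page; `ImportStyleDemo` is a hypothetical name used only for illustration.

```scala
// Before: one grouped selector import. Adding or removing an entry rewrites this whole line.
// import scala.collection.mutable.{ArrayBuffer, HashMap, ListBuffer}

// After: one import per line. Adding or removing an entry is a one-line diff.
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.HashMap
import scala.collection.mutable.ListBuffer

// Hypothetical demo object: both styles bring exactly the same names into scope.
object ImportStyleDemo {
  val names: Seq[String] =
    Seq(classOf[ArrayBuffer[_]], classOf[HashMap[_, _]], classOf[ListBuffer[_]])
      .map(_.getSimpleName)
}
```

The two forms compile to the same bindings, so the change is purely about reviewability of future diffs.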
Add RapidsDataFrameNaFunctionsSuite extending DataFrameNaFunctionsSuite with RapidsSQLTestsBaseTrait for GPU execution testing.

Test Results:
- Total tests: 27
- Passing: 27 (100%)
- Excluded: 0

Test Coverage:
- Drop operations (5 tests)
  - drop() with column names
  - drop() with how parameter (any/all)
  - drop() with threshold
  - drop() with col(*)
  - drop() with nested columns
- Fill operations (8 tests)
  - fill() with numeric values
  - fill() with string values
  - fill() with boolean values
  - fill() with map of values
  - fill() with subset columns
  - fill() with col(*)
  - fill() with nested columns
  - fillMap() with dotted column names
- Replace operations (8 tests)
  - replace() with numeric values
  - replace() with null values
  - replace() with NaN values (float/double)
  - replace() with dotted column names
  - replace() with qualified column names
  - replace() with nested columns (expected exception)
- Ambiguity handling (2 tests)
  - fill() with ambiguous columns after join
  - drop() with ambiguous columns after join
- Duplicate columns (2 tests)
  - SPARK-29890: fill() with duplicate column names
  - SPARK-30065: drop() with duplicate column names
- Edge cases (2 tests)
  - Column name validation
  - Nested column operations

All 27 inherited DataFrame NA-function tests (null/NaN handling) pass on the GPU.

Related: NVIDIA#11297
Signed-off-by: Allen Xu <[email protected]>
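To make the drop and fill cases above concrete, here is a plain-Scala model of the null/NaN rules they exercise. It is a sketch, not Spark code: `NaFunctionsModel` and its methods are hypothetical, rows are `Seq[Any]`, and columns are addressed by position. Real `DataFrame.na.fill` looks columns up by name and casts the replacement to the column type, which this model skips. As in Spark, `null` and floating-point `NaN` both count as missing.

```scala
// Hypothetical plain-Scala model of the rules behind DataFrame.na.drop / na.fill.
// Rows are Seq[Any]; columns are addressed by position, not by name.
object NaFunctionsModel {
  type Row = Seq[Any]

  // A cell is "missing" if it is null or a floating-point NaN.
  def isMissing(v: Any): Boolean = v match {
    case null      => true
    case d: Double => d.isNaN
    case f: Float  => f.isNaN
    case _         => false
  }

  // drop(minNonNulls): keep rows with at least minNonNulls non-missing cells.
  def dropThreshold(rows: Seq[Row], minNonNulls: Int): Seq[Row] =
    rows.filter(row => row.count(v => !isMissing(v)) >= minNonNulls)

  // drop(how): "any" drops a row with any missing cell;
  // "all" drops a row only when every cell is missing.
  def drop(rows: Seq[Row], how: String = "any"): Seq[Row] = how.toLowerCase match {
    case "any" => rows.filter(row => !row.exists(isMissing))
    case "all" => rows.filter(row => !row.forall(isMissing))
    case other => throw new IllegalArgumentException(s"how must be 'any' or 'all', got '$other'")
  }

  // fill(values): replace missing cells in the listed column positions;
  // columns without an entry are left untouched (no type casting in this model).
  def fill(rows: Seq[Row], values: Map[Int, Any]): Seq[Row] =
    rows.map(_.zipWithIndex.map { case (v, i) =>
      if (isMissing(v)) values.getOrElse(i, v) else v
    })
}
```

For example, with rows `("Bob", 16, 176.5)`, `("Alice", null, 164.3)` and `(null, null, NaN)`, `drop(rows, "any")` keeps only Bob's row, while `drop(rows, "all")` and `dropThreshold(rows, 2)` both keep Bob and Alice. `fill(rows, Map(1 -> 0, 2 -> 0.0))` replaces Alice's missing age with `0` and the third row's `NaN` with `0.0`, but leaves that row's first cell `null` because column 0 has no fill value.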
Greptile Overview
Greptile Summary
This PR migrates `DataFrameNaFunctionsSuite` tests to RAPIDS.
Key changes:
The implementation follows the established pattern used by other migrated test suites in the codebase.
Confidence Score: 5/5
Sequence Diagram
sequenceDiagram
participant Developer
participant RapidsDataFrameNaFunctionsSuite
participant DataFrameNaFunctionsSuite
participant RapidsSQLTestsTrait
participant GPU
Developer->>RapidsDataFrameNaFunctionsSuite: Run test suite
RapidsDataFrameNaFunctionsSuite->>DataFrameNaFunctionsSuite: Inherit 27 tests
RapidsDataFrameNaFunctionsSuite->>RapidsSQLTestsTrait: Mix in GPU testing infrastructure
RapidsSQLTestsTrait->>GPU: Configure Spark session for GPU execution
GPU-->>RapidsSQLTestsTrait: GPU plugin enabled
DataFrameNaFunctionsSuite->>GPU: Execute drop(), fill(), replace() tests
GPU-->>DataFrameNaFunctionsSuite: All 27 tests pass
RapidsDataFrameNaFunctionsSuite-->>Developer: 100% pass rate
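The diagram's "mix in GPU testing infrastructure" step relies on Scala trait linearization: the RAPIDS suite inherits every upstream test and only changes how the session is configured. The sketch below models that layering with hypothetical stand-ins (`SessionSettings`, `UpstreamNaSuite`, `GpuSessionSettings`, `GpuNaSuite`), not the real Spark or spark-rapids classes. It assumes, without reproducing the real `RapidsSQLTestsTrait`, that the mixin works by adding settings on top of the inherited ones. `spark.plugins=com.nvidia.spark.SQLPlugin` and `spark.rapids.sql.enabled` are standard RAPIDS Accelerator settings, but whether the test trait sets exactly these is an assumption.

```scala
// Hypothetical stand-ins for the real Spark / spark-rapids test classes.
trait SessionSettings {
  // Baseline session settings shared by every suite.
  def sparkConf: Map[String, String] = Map("spark.sql.shuffle.partitions" -> "2")
}

// Stand-in for the upstream Spark suite: it owns the test cases.
class UpstreamNaSuite extends SessionSettings {
  def testNames: Seq[String] = Seq("drop", "fill", "replace")
}

// Stand-in for the RAPIDS mixin: layers GPU settings over the inherited conf.
trait GpuSessionSettings extends SessionSettings {
  override def sparkConf: Map[String, String] =
    super.sparkConf ++ Map(
      "spark.plugins" -> "com.nvidia.spark.SQLPlugin",
      "spark.rapids.sql.enabled" -> "true")
}

// The migrated suite adds no tests of its own; it only changes the session.
class GpuNaSuite extends UpstreamNaSuite with GpuSessionSettings
```

Because `GpuSessionSettings` comes last in the linearization of `GpuNaSuite`, its `sparkConf` takes precedence while `super.sparkConf` still reaches the baseline settings, so every inherited test runs unchanged against a GPU-enabled session.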
2 files reviewed, no comments
2 files reviewed, 1 comment
...test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameNaFunctionsSuite.scala
Outdated
- Change from RapidsSQLTestsBaseTrait to RapidsSQLTestsTrait
- The original Spark test extends QueryTest, which requires RapidsSQLTestsTrait
- RapidsSQLTestsTrait provides QueryTest functionality (checkAnswer, etc.)
- All 27 tests pass after the fix
- Same issue as RapidsDataFrameComplexTypeSuite
Description
This PR migrates `DataFrameNaFunctionsSuite` tests from Apache Spark to RAPIDS for GPU execution testing.
Changes
- Added `RapidsDataFrameNaFunctionsSuite`, extending `DataFrameNaFunctionsSuite` with `RapidsSQLTestsTrait`
- Updated `RapidsTestSettings.scala` with no exclusions
Test Results
Test Coverage
Drop Operations (5 tests)
Fill Operations (8 tests)
Replace Operations (8 tests)
Ambiguity Handling (2 tests)
Duplicate Columns (2 tests)
Edge Cases (2 tests)
Key Observations
Files Changed
- `tests/src/test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsDataFrameNaFunctionsSuite.scala` (new)
- `tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala` (updated)
Related Issues
Partially addresses #11297 (Spark UT migration tracking issue)
Signed-off-by: Allen Xu [email protected]