Skip to content

Conversation

@jihoonson
Copy link
Collaborator

@jihoonson jihoonson commented Nov 13, 2025

Fixes #13779.

Description

GpuOptimizeWriteExchangeExec computes the actualNumPartitions based on the partition count of its input. The computed actualNumPartitions is used to dynamically optimize the partitioning. This logic is currently missing handling of the case when the input partition count is 0, which can cause the ArithmeticException error. This PR adds a proper handling of the case.

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Nov 13, 2025

Greptile Overview

Greptile Summary

This PR fixes a divide-by-zero ArithmeticException in GpuOptimizeWriteExchangeExec that occurred when Delta Lake optimized writes processed empty datasets (0 input partitions).

Changes:

  • Added zero-partition guard in actualNumPartitions calculation that returns 0 when childNumPartitions is 0, preventing division by zero
  • Moved childNumPartitions declaration before mapOutputStatisticsFuture to ensure proper lazy initialization order
  • Applied fix consistently to both delta-33x and delta-40x versions
  • Added comprehensive test that validates the fix by filtering out all rows before writing with optimized writes enabled

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix is straightforward, well-tested, and addresses a clear bug. The change adds proper null/zero handling without modifying any core logic. Both Delta Lake versions are updated consistently, and a new test validates the edge case.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
delta-lake/common/src/main/delta-33x/scala/org/apache/spark/sql/delta/rapids/GpuOptimizeWriteExchangeExec.scala 5/5 Added zero-partition guard in actualNumPartitions calculation to prevent divide-by-zero error; moved childNumPartitions declaration for proper initialization order
delta-lake/common/src/main/delta-40x/scala/org/apache/spark/sql/delta/rapids/GpuOptimizeWriteExchangeExec.scala 5/5 Applied identical zero-partition guard fix for delta-40x version to maintain consistency across Delta Lake versions
integration_tests/src/main/python/delta_lake_write_test.py 5/5 Added test_delta_write_optimized_empty_output test that filters all rows before writing to validate the zero-partition handling

Sequence Diagram

sequenceDiagram
    participant Client as Write Operation
    participant GOWE as GpuOptimizeWriteExchangeExec
    participant RDD as inputRDD
    participant Shuffle as Shuffle/Partition Logic

    Client->>GOWE: executeColumnar()
    GOWE->>RDD: getNumPartitions()
    RDD-->>GOWE: childNumPartitions (could be 0)
    
    alt childNumPartitions == 0
        GOWE->>GOWE: actualNumPartitions = 0
        GOWE->>GOWE: mapOutputStatisticsFuture = Future.successful(null)
        GOWE->>Shuffle: Create shuffle with 0 partitions
        Shuffle-->>Client: Empty result (no divide-by-zero)
    else childNumPartitions > 0
        GOWE->>GOWE: Calculate actualNumPartitions<br/>(targetShuffleBlocks / childNumPartitions)
        GOWE->>Shuffle: submitMapStage(shuffleDependency)
        GOWE->>Shuffle: rebalancePartitions(stats)
        Shuffle-->>Client: Optimized partitioned result
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@sameerz sameerz added the bug Something isn't working label Nov 13, 2025
@sameerz
Copy link
Collaborator

sameerz commented Nov 13, 2025

build


@transient lazy val inputRDD: RDD[ColumnarBatch] = child.executeColumnar()

private lazy val childNumPartitions = inputRDD.getNumPartitions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is potential source of inconsistency to serialize an attribute or computation where the underlying input might change after deserialization.

Suggested change
private lazy val childNumPartitions = inputRDD.getNumPartitions
@transient private lazy val childNumPartitions = inputRDD.getNumPartitions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. How can this code be executed before serialization? inputRDD should never be used outside the task.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be correct but looking at this code I would always have to doubt given that one thing is transient and the other thing depending on it is not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Divide-by-0 exception in Delta optimized write

5 participants