@gerashegalov
Collaborator

Fixes #13718

Description

The previous assumption that pyspark[connect] is the smallest pip package for Spark Connect was wrong: it ships a full SPARK_HOME, including the jars.
The smoke test should be a pure Python app.

This PR proposes using pyspark-client instead, as in spark-rapids-examples.
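
One way to confirm the difference between the two packages is to scan the virtual environment for bundled jars after installation. The sketch below is only an assumption about how such a check could look, not something this PR adds; `find_jars`, `is_pure_python_install`, and the directory passed to them are hypothetical names.

```python
from pathlib import Path


def find_jars(root):
    """Return all *.jar files found anywhere under root, sorted by path."""
    return sorted(p for p in Path(root).rglob("*.jar") if p.is_file())


def is_pure_python_install(site_packages):
    """True when no jar is bundled under the given site-packages directory."""
    return not find_jars(site_packages)
```

Pointed at the site-packages directory of a venv with pyspark[connect] installed, `find_jars` would be expected to list the jars of the bundled SPARK_HOME; for pyspark-client it should come back empty.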

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

NvTimLiu and others added 3 commits October 31, 2025 17:35
1. Retrieve commit hashes for a release after enabling the new branch model.

2. Create a query to fetch pull request (PR) information from GitHub using commit hashes.

3. Support retrieving PR ChangeLogs for both the old and new branch models.

---------

Signed-off-by: timl <[email protected]>
@gerashegalov gerashegalov self-assigned this Nov 13, 2025
@gerashegalov gerashegalov added bug Something isn't working test Only impacts tests labels Nov 13, 2025
@gerashegalov gerashegalov requested a review from nartal1 November 13, 2025 15:39
@greptile-apps
Contributor

greptile-apps bot commented Nov 13, 2025

Greptile Overview

Greptile Summary

This PR fixes the Spark Connect smoke test by replacing pyspark[connect] with pyspark-client package. The previous package incorrectly included a full SPARK_HOME with jars, defeating the purpose of testing a pure Python client.

Changes:

  • Updated pip install command to use pyspark-client instead of pyspark[connect]
  • Added reference comment to official Spark documentation
  • Minor whitespace adjustment in the pip install redirect (added space before >)
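
The install step itself lives in the shell script; as a rough Python equivalent, the sketch below creates a venv, upgrades pip, and installs the pinned client. The helper names, the POSIX `bin/python` layout, and any version string passed in are assumptions for illustration and are not taken from the script.

```python
import subprocess
import venv
from pathlib import Path


def client_requirement(spark_version):
    """Pin the pure Python Connect client to the Spark version under test."""
    return f"pyspark-client=={spark_version}"


def install_client(venv_dir, spark_version):
    """Create a venv, upgrade pip, and install the pinned pyspark-client."""
    venv.create(venv_dir, with_pip=True)
    python = Path(venv_dir) / "bin" / "python"  # POSIX venv layout assumed
    subprocess.run(
        [str(python), "-m", "pip", "install", "--upgrade", "pip"], check=True
    )
    subprocess.run(
        [str(python), "-m", "pip", "install", client_requirement(spark_version)],
        check=True,
    )
    return python
```

Pinning the client to the server's Spark version mirrors the `pyspark-client==${VERSION_STRING}` form that appears in the review's sequence diagram.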

The change is correct and aligns with Apache Spark's official documentation for Connect clients. The pyspark-client package is specifically designed as a lightweight, pure Python package without JVM dependencies, which is exactly what the smoke test requires.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a straightforward package replacement that correctly addresses the reported issue. The new package pyspark-client is the official lightweight client for Spark Connect (as documented by Apache Spark), and all imports used in the test (pyspark.sql.SparkSession) are available in this package. The test logic remains unchanged, only the package installation command was updated. No breaking changes or compatibility issues are expected.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| integration_tests/run_pyspark_from_build.sh | 5/5 | Replaced pyspark[connect] with pyspark-client package to ensure pure Python client without JVM dependencies |

Sequence Diagram

sequenceDiagram
    participant Script as run_pyspark_from_build.sh
    participant Venv as Virtual Environment
    participant Pip as pip installer
    participant PyPI as PyPI Repository
    participant Client as Python Client
    participant Server as Spark Connect Server

    Script->>Script: Check Spark version >= 3.5.6
    Script->>Script: Start Spark Connect server
    Script->>Venv: Create virtual environment
    Script->>Venv: Upgrade pip
    Script->>Pip: Install pyspark-client==${VERSION_STRING}
    Pip->>PyPI: Download pyspark-client (pure Python)
    PyPI-->>Pip: Return lightweight package (no jars)
    Pip-->>Venv: Install package
    Script->>Client: Execute Python test script
    Client->>Client: Import pyspark.sql.SparkSession
    Client->>Server: Connect via CONNECT_URL
    Client->>Server: Execute spark.range(100)
    Server-->>Client: Return GPU-accelerated results
    Client->>Client: Verify SC_RESULT=4950
    Client->>Client: Verify GpuRange in plan
    Client-->>Script: Success
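
The client-side steps in the diagram can be sketched in Python as follows. This is not the test from run_pyspark_from_build.sh: the diagram names only `spark.range(100)`, the value 4950, and `GpuRange`, so the sum aggregation, the stdout capture around `explain()`, and the helper names are assumptions.

```python
import contextlib
import io

EXPECTED_SUM = sum(range(100))  # 0 + 1 + ... + 99 = 4950


def plan_uses_gpu_range(plan_text):
    """True when the plan text mentions the GpuRange operator."""
    return "GpuRange" in plan_text


def run_smoke_test(connect_url):
    """Connect to a Spark Connect server and check both result and plan."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.remote(connect_url).getOrCreate()
    try:
        df = spark.range(100)
        total = df.agg(F.sum("id")).first()[0]
        # DataFrame.explain() prints the plan, so capture stdout to inspect it.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            df.explain()
        assert total == EXPECTED_SUM, f"unexpected sum: {total}"
        assert plan_uses_gpu_range(buf.getvalue()), "GpuRange not in plan"
    finally:
        spark.stop()
```

A caller would pass the server address, for example `run_smoke_test(os.environ["CONNECT_URL"])`, assuming the script exports the URL under that name.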

@greptile-apps greptile-apps bot left a comment

1 file reviewed, no comments

@gerashegalov
Collaborator Author

build

Labels

bug Something isn't working test Only impacts tests

Development

Successfully merging this pull request may close these issues.

[BUG] Spark Connect smoke test uses a pip package pyspark[connect] with jars

2 participants