Use pyspark-client for the Spark Connect smoke test #13778
base: main
1. Retrieve commit hashes for a release after enabling the new branch model.
2. Create a query to fetch pull request (PR) information from GitHub using commit hashes.
3. Support retrieving PR ChangeLogs for both the old and new branch models.

---------

Signed-off-by: timl <[email protected]>
Greptile Overview
Greptile Summary
This PR fixes the Spark Connect smoke test by replacing pyspark[connect] with pyspark-client.
The change is correct and aligns with Apache Spark's official documentation for Connect clients.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Script as run_pyspark_from_build.sh
participant Venv as Virtual Environment
participant Pip as pip installer
participant PyPI as PyPI Repository
participant Client as Python Client
participant Server as Spark Connect Server
Script->>Script: Check Spark version >= 3.5.6
Script->>Script: Start Spark Connect server
Script->>Venv: Create virtual environment
Script->>Venv: Upgrade pip
Script->>Pip: Install pyspark-client==${VERSION_STRING}
Pip->>PyPI: Download pyspark-client (pure Python)
PyPI-->>Pip: Return lightweight package (no jars)
Pip-->>Venv: Install package
Script->>Client: Execute Python test script
Client->>Client: Import pyspark.sql.SparkSession
Client->>Server: Connect via CONNECT_URL
Client->>Server: Execute spark.range(100)
Server-->>Client: Return GPU-accelerated results
Client->>Client: Verify SC_RESULT=4950
Client->>Client: Verify GpuRange in plan
Client-->>Script: Success
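The client-side steps in the diagram can be sketched in Python roughly as follows. This is a minimal sketch, not the actual test script from the PR: the function names, the `sum` aggregation used to obtain 4950, and the approach of capturing `explain()` output from stdout are all assumptions made for illustration.

```python
import contextlib
import io

# Sum of 0..99, the value the diagram calls SC_RESULT.
EXPECTED_SUM = sum(range(100))


def plan_uses_gpu(plan_text, marker="GpuRange"):
    """Return True if the plan text mentions the GPU range operator."""
    return marker in plan_text


def run_smoke_test(connect_url):
    """Hypothetical smoke test against a running Spark Connect server."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.remote(connect_url).getOrCreate()
    try:
        df = spark.range(100)
        sc_result = df.agg(F.sum("id")).collect()[0][0]
        assert sc_result == EXPECTED_SUM, f"SC_RESULT={sc_result}"

        # explain() prints the plan, so capture stdout to inspect it.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            df.explain()
        assert plan_uses_gpu(buf.getvalue()), "GpuRange not found in plan"
    finally:
        spark.stop()
```

With a server running, this would be invoked as `run_smoke_test(os.environ["CONNECT_URL"])`, where the URL is assumed to have the form `sc://host:port`.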
1 file reviewed, no comments
build
Fixes #13718
Description
The previous assumption that pyspark[connect] is the smallest pip package for Connect was wrong.
It ships a full SPARK_HOME, including the jars.
The smoke test should be a pure Python app.
This PR proposes using pyspark-client instead, as in spark-rapids-examples.
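One way to check the "pure Python" claim inside the test virtual environment is to list the files an installed distribution recorded and look for jars. The sketch below is illustrative only: `jar_paths` and `shipped_jars` are hypothetical helper names, and it assumes the installer wrote file metadata that `importlib.metadata` can read.

```python
from importlib import metadata


def jar_paths(paths):
    """Keep only the entries whose name ends with .jar."""
    return [str(p) for p in paths if str(p).endswith(".jar")]


def shipped_jars(dist_name):
    """List jar files recorded for an installed distribution.

    Returns None if the distribution is not installed.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # files is None when the distribution has no file record.
    return jar_paths(files or [])
```

In a venv with pyspark-client installed, `shipped_jars("pyspark-client")` would be expected to return an empty list, whereas `shipped_jars("pyspark")` after installing pyspark[connect] would be expected to list the bundled jars.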
Checklists
(Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)