Extend Wayang with spatial operators #699

Open
MaxSpeer wants to merge 4 commits into apache:main from Spatial-Data-MP:main

@MaxSpeer

Hi everyone,
together with @zkaoudi, we worked on extending Wayang with spatial operations.
Please consider our PR. We are looking forward to your feedback.

Summary

This PR introduces spatial processing capabilities in Wayang by adding two new logical operators: SpatialFilter and SpatialJoin, with platform-specific implementations for Java, Spark, and PostgreSQL/JDBC.
The goal is to enable spatial filter and join operations end-to-end across Wayang’s Java, Spark, and Postgres execution paths while remaining extensible to other spatial operations.

What Changed

  • Added new logical operators in the common layer: SpatialFilterOperator and SpatialJoinOperator.
  • Added spatial abstractions in core: SpatialPredicate and SpatialGeometry.
  • Added a dedicated spatial plugin module: wayang-plugins/wayang-spatial.
  • Added mappings and operators for: Java, Spark, and PostgreSQL/JDBC.
  • Extended Java API with: spatialFilter(...) and spatialJoin(...).
  • Added GeoJsonFileSource support in the spatial plugin (logical + Java implementation).
  • Extended JDBC SQL planning/execution path to include spatial filter/join operators in SQL query construction.

Design Choices

  1. Core only defines the spatial contracts (SpatialGeometry, SpatialPredicate) and has no direct JTS/Sedona/PostGIS dependency.
  2. Backend-specific dependencies are isolated in the spatial plugin: JTS/Sedona/PostGIS specifics are implemented in wayang-spatial, keeping the API/core clean and modular.

Platform Implementation Notes

  • Java: Uses JTS-based predicate evaluation and spatial index support for joins.
  • Spark: Uses Apache Sedona for distributed spatial filtering and joining.
  • PostgreSQL/JDBC: Generates spatial SQL clauses, enabling DB-side execution of filter/join. (Limitation: JDBCExecutor does not support multiple sources at the moment)
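
To illustrate what "generating spatial SQL clauses" can look like, here is a minimal sketch, not the code from this PR. The class `SpatialSqlSketch`, its `Predicate` enum, and the `filterClause`/`joinClause` helpers are hypothetical names chosen for this example. Only the PostGIS functions `ST_Intersects`, `ST_Contains`, `ST_Within`, and `ST_GeomFromText` are real. The table names, column names, and SRID are made up.

```java
import java.util.Locale;

class SpatialSqlSketch {

    /** Hypothetical predicate set; the SpatialPredicate in the PR may differ. */
    enum Predicate {
        INTERSECTS("ST_Intersects"),
        CONTAINS("ST_Contains"),
        WITHIN("ST_Within");

        final String sqlFunction;

        Predicate(String sqlFunction) {
            this.sqlFunction = sqlFunction;
        }
    }

    /** Builds a filter condition comparing a geometry column with a WKT literal. */
    static String filterClause(Predicate predicate, String column, String wkt, int srid) {
        // Double any single quotes so the WKT cannot terminate the SQL string literal.
        String literal = wkt.replace("'", "''");
        return String.format(Locale.ROOT, "%s(%s, ST_GeomFromText('%s', %d))",
                predicate.sqlFunction, column, literal, srid);
    }

    /** Builds a join condition comparing the geometry columns of two tables. */
    static String joinClause(Predicate predicate, String leftColumn, String rightColumn) {
        return String.format(Locale.ROOT, "%s(%s, %s)",
                predicate.sqlFunction, leftColumn, rightColumn);
    }

    public static void main(String[] args) {
        // Assumed table and column names, for illustration only.
        System.out.println("SELECT * FROM parcels WHERE "
                + filterClause(Predicate.INTERSECTS, "geom", "POINT(1 2)", 4326));
        System.out.println("SELECT * FROM parcels p JOIN zones z ON "
                + joinClause(Predicate.WITHIN, "p.geom", "z.geom"));
    }
}
```

Pushing the predicate into the WHERE or ON clause is what lets the database evaluate it, possibly with its own spatial index, instead of shipping all rows to Wayang first.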

Testing

  • Unit tests for: WayangGeometry, Java operators, Spark operators, JDBC SQL generation.
  • Java API tests for: spatial filter/join usage and chained operations.
  • Postgres integration tests are present but disabled by default due to external DB dependency.

---------
Co-authored-by: treyfel <58570309+treyfel@users.noreply.github.com>
Co-authored-by: maxspeer <82819900+MaxSpeer@users.noreply.github.com>
@mspruc
Contributor

mspruc commented Feb 26, 2026

Hi thanks for your contribution,
I will have a deeper look at your PR later but in the meantime, can you clean up the various System.out.println(...) statements?

@MaxSpeer
Author

MaxSpeer commented Mar 3, 2026

> Hi thanks for your contribution, I will have a deeper look at your PR later but in the meantime, can you clean up the various System.out.println(...) statements?

Hi @mspruc, the print statements we introduced are only in benchmark classes and in tests. We think this is fine. Could you please point to the statements that should be removed?

import org.apache.wayang.core.plan.wayangplan.BinaryToUnaryOperator;
import org.apache.wayang.core.types.DataSetType;

public class SpatialJoinOperator<InputType0, InputType1> extends BinaryToUnaryOperator<InputType0, InputType1, Tuple2<InputType0, InputType1>> {
Contributor

Please add a comment describing this operator, similar to the one you added for the SpatialFilterOperator.

filterTasks.add(filterOperator);
} else if (nextTask.getOperator() instanceof JdbcProjectionOperator projectionOperator) {
final var operator = nextTask.getOperator();
if (operator instanceof JdbcFilterOperator || operator instanceof SpatialFilterOperator) {
Contributor

Shouldn't this be the JDBC spatial filter operator? Why does it reference the logical operator?

Contributor

And maybe then we do not need the extra condition in the if?

Author

The JdbcSpatialOperator is located in the spatial plugin, and we don't want to introduce this dependency. The findJdbcExecutionOperatorTaskInStage() function ensures that only JDBC operators are selected. For consistency, we could change the other JDBC operators to their Wayang operators (e.g. JdbcFilterOperator -> FilterOperator).

Contributor

Yes, I think having the Wayang operators is a good idea. In any case, we wouldn't have an execution operator from another platform here, so it's more about the type of operator and not about its execution platform.

So, if we say e.g. FilterOperator, would that also catch the SpatialFilterOperator case? It should, right?
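
Whether a single instanceof check on the logical type also covers the spatial operator depends on the class hierarchy, which this thread does not state. The sketch below shows both possibilities using hypothetical stand-in classes (`FilterOp`, `JdbcFilterOp`, `SpatialFilterSubclass`, `SpatialFilterStandalone`). These are not the Wayang classes and make no claim about how SpatialFilterOperator is actually declared.

```java
class OperatorMatchSketch {

    // Hypothetical stand-ins for illustration; NOT the real Wayang operator classes.
    static class FilterOp {}
    static class JdbcFilterOp extends FilterOp {}
    // Case A: the spatial filter is modeled as a subclass of the logical filter.
    static class SpatialFilterSubclass extends FilterOp {}
    // Case B: the spatial filter has an unrelated base type.
    static class SpatialFilterStandalone {}

    /** Returns true if the operator is a FilterOp or any subclass of it. */
    static boolean isFilter(Object operator) {
        return operator instanceof FilterOp;
    }

    public static void main(String[] args) {
        System.out.println(isFilter(new JdbcFilterOp()));            // true
        System.out.println(isFilter(new SpatialFilterSubclass()));   // true
        System.out.println(isFilter(new SpatialFilterStandalone())); // false
    }
}
```

In case A the extra `||` condition becomes redundant. In case B it is still needed unless both operators share a common supertype to test against.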

} else if (nextTask.getOperator() instanceof JdbcJoinOperator joinOperator) {
joinTasks.add(joinOperator);
projectionTask = (JdbcProjectionOperator) operator;
} else if (operator instanceof JdbcJoinOperator || (operator instanceof SpatialJoinOperator)) {
Contributor

Same here: why is a special case needed for the spatial Wayang join operator?


import static org.junit.jupiter.api.Assertions.*;

@Disabled("Requires local Postgres test database.")
Contributor

Can you check whether the HSQLDB library we use for other JDBC tests supports spatial operators? Then we could run the tests without requiring a Postgres instance to be set up and running.

Author

HSQLDB does not support spatial operations. In JdbcSpatialFilterOperatorTest, however, we do test whether the generated spatial SQL query is correct.

@zkaoudi
Contributor

zkaoudi commented Mar 15, 2026

Regarding the printouts, I agree with @mspruc. You can either remove them or make them consistent, as some benchmarks print at the top and others print the count. But that's a minor point.

@instant-sky

We updated the spatial benchmarks with consistent print statements across the different classes. The benchmark scripts serve as examples of how to use the spatial operations, and we use them to compare operator performance on different platforms.
