[SEDONA-750] add sedona deserialization code #460

Imbruced · 2025-12-15T22:33:07Z

Thanks to SedonaDB's great interoperability with Arrow, we could modify the Apache Spark vectorized UDFs to use SedonaDB for spatial attributes. We can start with scalar functions and later add more complex features like table functions. One obstacle is converting the Apache Sedona internal serialization format to WKB, which SedonaDB uses. This ticket aims to add a scalar function that casts the Sedona internal serde format to WKB, as outlined in the attached diagram.

jiayuasu · 2025-12-16T18:43:34Z

@Imbruced Hey Pawel, do you mind adding a bit of explanation on what this PR is about?

Imbruced · 2025-12-16T18:50:47Z

@jiayuasu, yeah, it's a WIP MR in draft mode, so it's not ready yet. I am verifying which steps are tested for an MR in SedonaDB.

Basically, the idea is to run SedonaDB inside Sedona vectorized UDFs.

So I need to transform the Sedona serialized geometries to WKB, which is the input format for SedonaDB.

It would be better to keep it in SedonaDB rather than Sedona (at least based on my thinking), but I am open to any suggestions. Another way would be to call the WKB functions in Sedona, but I am not sure if that's doable. I would like to have an internal function that is not exposed to users and is only used by the vectorized UDF worker.

paleolimbot · 2025-12-17T02:54:30Z

Very cool!

I'll let you finish the proof-of-concept on the Sedona side and we can workshop the best way to connect all the dots here. I think working with the Sedona Spark serialization on the SedonaDB side is going to be the right way to go...we could potentially integrate it to the point where we don't need to convert to WKB (i.e., we can work with the Sedona Spark serialization in place for some functions).

jiayuasu · 2025-12-17T05:58:41Z

@Kontinuation SedonaSpark internal serialization is very similar to WKB. Is there a way that we can avoid unnecessary SerDe in the vectorized UDF?

Imbruced · 2025-12-17T07:19:44Z

> @Kontinuation SedonaSpark internal serialization is very similar to WKB. Is there a way that we can avoid unnecessary SerDe in the vectorized UDF?

Yeah, it's similar, but it differs in a few places, and that's what this MR does: it shuffles bytes to turn the Sedona SerDe format into WKB. For instance, I avoid parsing coordinates into numbers and instead copy the bytes into a new array. If there is a simpler way, that would be great. The differences I've seen are in multipolygon, polygon, and multilinestring, where the Sedona format puts the metadata bytes (number of geometries, rings, etc.) at the end, whereas in WKB each linestring inside a multilinestring carries its own complete WKB header, such as the byte order and WKB type.
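For illustration, here is a minimal Python sketch of the WKB side of that difference: each linestring inside a multilinestring gets its own header (byte order, geometry type, point count), and the coordinate bytes are copied as-is rather than decoded into numbers. The helper names are hypothetical, and the input is assumed to be raw little-endian XY float64 bytes per linestring; the actual Sedona layout and the code in this MR are not modeled here.

```python
import struct

# Standard WKB constants for 2D geometries.
WKB_LITTLE_ENDIAN = 1
WKB_LINESTRING = 2
WKB_MULTILINESTRING = 5


def linestring_wkb(coord_bytes: bytes) -> bytes:
    """Wrap raw little-endian XY coordinate bytes in a WKB LineString header.

    Hypothetical helper: the coordinates are never decoded, only copied.
    """
    if len(coord_bytes) % 16 != 0:
        raise ValueError("expected 16 bytes (two float64 values) per XY point")
    num_points = len(coord_bytes) // 16
    # "<BII" = byte order flag (1 byte), geometry type (4), point count (4).
    header = struct.pack("<BII", WKB_LITTLE_ENDIAN, WKB_LINESTRING, num_points)
    return header + coord_bytes


def multilinestring_wkb(parts: list[bytes]) -> bytes:
    """Build a WKB MultiLineString; every part repeats a full WKB header."""
    header = struct.pack(
        "<BII", WKB_LITTLE_ENDIAN, WKB_MULTILINESTRING, len(parts)
    )
    return header + b"".join(linestring_wkb(part) for part in parts)


# Usage: two linestrings supplied as already-encoded coordinate bytes.
line_a = struct.pack("<4d", 0.0, 0.0, 1.0, 1.0)
line_b = struct.pack("<6d", 2.0, 2.0, 3.0, 3.0, 4.0, 4.0)
wkb = multilinestring_wkb([line_a, line_b])
```

The point of the sketch is that the conversion cost is mostly header bookkeeping plus byte copies, which is why moving trailing metadata to the front does not require parsing any coordinates.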

Kontinuation · 2025-12-17T07:34:19Z

There is a way to make SedonaDB work directly with custom serialization formats without converting to WKB first. Below is a high-level non-exhaustive roadmap of the required changes.

We implement geo-traits and geo-traits-ext for the serialization format, similar to how Wkb implements geo-traits and geo-traits-ext. This allows generic geo algorithms to work directly with the serialized buffers without deserializing them into intermediate formats.
Extend SedonaType to natively support that custom format.
Support something like Into<geos::Geometry> and From<geos::Geometry> for decoding directly to, and encoding directly from, geometry objects defined by third-party libraries. This reduces the performance overhead of ST functions based on GEOS.
Some existing code assumes that the data format is WKB and works with it directly. We should refactor some of that code to be generic over geo-traits values.
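The first roadmap item can be sketched in Python as an analogy (the real work would be Rust traits from geo-traits, which this does not reproduce). One accessor interface is implemented by two buffer views, and a generic algorithm runs on either without converting one format into the other. `TrailingCountLineString` is a purely hypothetical layout invented for this sketch and is not the actual Sedona serialization; all class and function names here are assumptions.

```python
import math
import struct
from typing import Iterator, Protocol, Tuple


class LineStringLike(Protocol):
    """Accessor interface a generic algorithm relies on (trait analogue)."""

    def num_coords(self) -> int: ...

    def coords(self) -> Iterator[Tuple[float, float]]: ...


class WkbLineString:
    """Read-only view over a little-endian 2D WKB LineString buffer."""

    def __init__(self, buf: bytes) -> None:
        byte_order, geom_type, count = struct.unpack_from("<BII", buf, 0)
        if byte_order != 1 or geom_type != 2:
            raise ValueError("expected a little-endian WKB LineString")
        self._buf, self._count = buf, count

    def num_coords(self) -> int:
        return self._count

    def coords(self) -> Iterator[Tuple[float, float]]:
        for i in range(self._count):
            yield struct.unpack_from("<dd", self._buf, 9 + 16 * i)


class TrailingCountLineString:
    """HYPOTHETICAL layout: XY coordinates first, uint32 point count last."""

    def __init__(self, buf: bytes) -> None:
        (self._count,) = struct.unpack_from("<I", buf, len(buf) - 4)
        self._buf = buf

    def num_coords(self) -> int:
        return self._count

    def coords(self) -> Iterator[Tuple[float, float]]:
        for i in range(self._count):
            yield struct.unpack_from("<dd", self._buf, 16 * i)


def length(line: LineStringLike) -> float:
    """Generic algorithm: works on any view implementing the interface."""
    total, prev = 0.0, None
    for x, y in line.coords():
        if prev is not None:
            total += math.hypot(x - prev[0], y - prev[1])
        prev = (x, y)
    return total
```

Because `length` only touches the accessor methods, supporting a new serialization format means adding a new view class rather than a conversion step, which is the motivation behind refactoring WKB-specific code into generic code.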

Imbruced · 2025-12-17T09:49:24Z

@Kontinuation This sounds good. What do you think about what @paleolimbot proposed? As a PoC, we go with the Sedona serde to WKB transformation, and then we add the native serialization support? I can handle all of this, but I'll need some time as I am not super familiar with SedonaDB and Rust.

Kontinuation · 2025-12-17T10:16:37Z

> @Kontinuation This sounds good. What do you think about what @paleolimbot proposed? As PoC, we go with the Sedona serde to the WKB transformation, and then we add the additional native serialization method? I can handle all of this, I'll need some time as I am not super familiar with SedonaDB and Rust.

Yes. The PoC plan sounds good to me. It lets us build something useful without a giant refactoring.

Imbruced · 2025-12-19T23:57:04Z

Hmm, not sure why the step is failing :/ I included the serialization, using my messy code in Apache Sedona Spark.

I can run the following UDF using Apache Sedona, which runs SedonaDB:

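import pyarrow as pa
import shapely
import geoarrow.pyarrow as ga
from sedonadb import udf

# Returns WKB geometries; takes a geometry column and a numeric column.
@udf.arrow_udf(ga.wkb(), [udf.GEOMETRY, udf.NUMERIC])
def shapely_udf(geom, distance):
    # Unwrap the incoming Arrow data into plain pyarrow arrays.
    geom_wkb = pa.array(geom.storage.to_array())
    distance = pa.array(distance.to_array())
    # Decode the WKB, buffer each geometry, and re-encode the result as WKB.
    geom = shapely.from_wkb(geom_wkb)
    result_shapely = shapely.buffer(geom, distance)

    return pa.array(shapely.to_wkb(result_shapely))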

which gives the following result on my test data:

+--------------------+
|                geom|
+--------------------+
|POLYGON ((14.3093...|
|POLYGON ((14.3177...|
|POLYGON ((14.3891...|
|POLYGON ((14.2185...|
|POLYGON ((14.3595...|
|POLYGON ((14.3855...|
|POLYGON ((14.2739...|
|POLYGON ((14.4047...|
|POLYGON ((14.3120...|
|POLYGON ((14.3630...|
+--------------------+

I think the current MR is ready, and I'll continue working on the worker and on the Apache Sedona / SedonaDB strategy.

Let me know what you think.

paleolimbot · 2025-12-21T03:01:05Z

Cool! When you have a PR on the Sedona Spark side feel free to link it and I will have a look.

> Hmm not sure why the step is failing

R on macOS CI is failing pretty much everywhere on GitHub Actions at the moment (not related to this PR).

Imbruced · 2025-12-21T22:34:18Z

@paleolimbot definitely! Most likely closer to the end of the year, as now it's Christmas week :D

Imbruced force-pushed the add-sedona-deserializer branch from 493e070 to d36bf02 Compare December 16, 2025 16:24

Imbruced force-pushed the add-sedona-deserializer branch from 409bf54 to 82fa010 Compare December 16, 2025 21:22

Imbruced changed the title ~~add sedona deserialization code~~ [SEDONA-750] add sedona deserialization code Dec 16, 2025

Imbruced force-pushed the add-sedona-deserializer branch from d00a875 to 7179aa7 Compare December 17, 2025 18:06

add sedona SerDe

026413c

Imbruced force-pushed the add-sedona-deserializer branch from c46b6bf to 026413c Compare December 19, 2025 21:54

adjust

34362a1

Imbruced marked this pull request as ready for review December 19, 2025 23:57
