@MrPowers
Collaborator

This PR shows how to read PostGIS tables into SedonaDB DataFrames with SQLAlchemy.

Feel free to comment if there are other ways to do this. I assume reading PostGIS tables is possible via ADBC (Arrow Database Connectivity) as well.

Member

@paleolimbot left a comment

Great!

If you want to add an ADBC version, I put together a demo. In theory this is much faster: adbc_ingest() and fetch_arrow() move the data as Arrow streams, so neither direction iterates over rows or goes through a Pandas DataFrame.

import sedona.db
import adbc_driver_postgresql.dbapi

sd = sedona.db.connect()

conn = adbc_driver_postgresql.dbapi.connect(
    "postgresql://localhost:5432/postgres?user=postgres&password=password"
)

# SedonaDB DataFrame to PostGIS
with conn.cursor() as cur:
    # Create a SedonaDB DataFrame that represents a view of the data you'd like to import
    url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-point_geo.parquet"
    sd.read_parquet(url).to_view("ns_water_point", overwrite=True)
    df = sd.sql("""SELECT "OBJECTID", ST_AsBinary(geometry) AS geometry FROM ns_water_point""")

    # Use adbc_ingest() to create a temporary table
    cur.adbc_ingest("ns_water_point_temp", df, temporary=True)

with conn.cursor() as cur:
    # Use a CREATE TABLE AS SELECT statement to recreate the geometry type
    cur.executescript("""CREATE TABLE ns_water_point AS (
        SELECT "OBJECTID", ST_GeomFromWKB(geometry) AS geometry FROM ns_water_point_temp
    )""")

# ADBC DBAPI connections do not autocommit by default; commit so the new table persists
conn.commit()

# PostGIS query to SedonaDB
with conn.cursor() as cur:
    # Use ST_AsBinary to export the geometry
    cur.execute("""SELECT "OBJECTID", ST_AsBinary(geometry) AS geom_wkb FROM ns_water_point""")

    # cur.fetch_arrow() and to_view() are lazy: no results have been pulled yet
    sd.create_data_frame(cur.fetch_arrow()).to_view("postgis_result", overwrite=True)

    # Use ST_GeomFromWKB() to import the geometry; ensure the result is collected
    # before the cursor is closed. This could also stream to a file (e.g., to_parquet())
    # instead of collecting in memory first.
    df = sd.sql("SELECT ST_GeomFromWKB(geom_wkb) as geometry FROM postgis_result").to_memtable()

df.head(5).show()
#> ┌──────────────────────────────────────────────────────────────────┐
#> │                             geometry                             │
#> │                             geometry                             │
#> ╞══════════════════════════════════════════════════════════════════╡
#> │ POINT Z(258976.3273 4820275.6807 -0.5)                           │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ POINT Z(258340.72730000038 4819923.080700001 0.6000000000058208) │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ POINT Z(258338.4263000004 4819908.080700001 0.5)                 │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ POINT Z(258526.62729999982 4819583.580700001 0)                  │
#> ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
#> │ POINT Z(258498.92729999963 4819652.080700001 1.8999999999941792) │
#> └──────────────────────────────────────────────────────────────────┘
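
For anyone curious what actually crosses the wire in the ST_AsBinary() / ST_GeomFromWKB() round trip above, here is a minimal standard-library sketch of a WKB Point Z. It assumes the ISO WKB layout (byte-order flag, uint32 type code 1001 for Point Z, then three doubles); the helper names `encode_point_z` and `decode_point_z` are made up for illustration and are not part of SedonaDB, PostGIS, or ADBC.

```python
import struct

# ISO WKB geometry type code for a 3D point (Point Z); a 2D Point is 1.
WKB_POINT_Z = 1001


def encode_point_z(x, y, z):
    """Encode a Point Z as little-endian ISO WKB (hypothetical helper)."""
    # "<" = little-endian, no padding; "B" = byte-order flag (1 = little-endian);
    # "I" = uint32 geometry type code; "3d" = x, y, z as float64.
    return struct.pack("<BI3d", 1, WKB_POINT_Z, x, y, z)


def decode_point_z(wkb):
    """Decode a Point Z from ISO WKB in either byte order (hypothetical helper)."""
    byte_order = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(byte_order + "I", wkb, 1)
    if geom_type != WKB_POINT_Z:
        raise ValueError(f"expected Point Z ({WKB_POINT_Z}), got type code {geom_type}")
    # The coordinates start after the 1-byte flag and the 4-byte type code.
    return struct.unpack_from(byte_order + "3d", wkb, 5)


# First point from the output above
wkb = encode_point_z(258976.3273, 4820275.6807, -0.5)
print(len(wkb), decode_point_z(wkb))
```

Note that this plain WKB carries no SRID, so the CRS has to be set separately on whichever side needs it.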
