Skip to content

Conversation

@jesspav
Copy link
Collaborator

@jesspav jesspav commented Dec 2, 2025

This PR introduces foundational support for reading and writing InDB raster data in Sedona-DB, providing the groundwork for future enhancements and expanded functionality.

Key points:

  • Integrates GDAL into the project architecture, establishing a path for robust geospatial raster processing.
  • Adds initial reader and writer components for raster formats, making it possible to interact with real raster data for experimentation and development (though additional enhancements will be needed for full real-world use).

I used this code to generate the 3-band noise raster .tif in #300.

Further iterations and feature expansion are planned to improve usability and extend raster data handling capabilities.

@jesspav jesspav changed the title [WIP] Raster readwrite Very basic reading and writing of a raster Dec 4, 2025
@jesspav jesspav changed the title Very basic reading and writing of a raster Very basic reading and writing of a raster for InDB Dec 4, 2025
@jesspav jesspav marked this pull request as ready for review December 4, 2025 19:30
@jesspav jesspav requested a review from Copilot December 4, 2025 19:30
@jesspav jesspav marked this pull request as draft December 4, 2025 19:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces foundational support for reading and writing InDB raster data by integrating GDAL into the Sedona-DB project architecture. It provides basic functionality for converting raster files (particularly GeoTIFFs) to and from Sedona's internal raster format.

Key changes:

  • Added GDAL dependencies and build system integration
  • Implemented raster reading and writing functions for InDB storage
  • Created utility functions for data type conversion between GDAL and Sedona formats

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
rust/sedona-raster/src/utils.rs Added utility functions for byte conversion and data type sizing
rust/sedona-raster/src/lib.rs Exposed the new utils module
rust/sedona-gdal/src/raster_io.rs Implemented core raster I/O functionality with GDAL integration
rust/sedona-gdal/src/lib.rs Created library entry point for GDAL module
rust/sedona-gdal/src/dataset.rs Added dataset metadata extraction utilities
rust/sedona-gdal/Cargo.toml Configured dependencies for GDAL integration
Cargo.toml Added GDAL workspace dependencies and sedona-gdal member
.github/workflows/rust.yml Updated CI to install GDAL development libraries

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

BandDataType::UInt16 => {
let mut data = Vec::with_capacity(pixel_count);
for chunk in band_data.chunks_exact(2) {
data.push(u16::from_ne_bytes([chunk[0], chunk[1]]));
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The byte-to-type conversion pattern is duplicated across multiple data types (lines 456, 464, 472, 480, 488, 496). Consider extracting this into a helper function to reduce duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
}
match data_type {
BandDataType::UInt8 => Some(bytes[0] as f64),
BandDataType::UInt16 => Some(u16::from_ne_bytes([bytes[0], bytes[1]]) as f64),
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bytes-to-type conversion pattern in bytes_to_f64 duplicates logic from write_band_data. Consider extracting a common conversion function to reduce duplication.

Copilot uses AI. Check for mistakes.
@jesspav jesspav marked this pull request as ready for review December 8, 2025 22:35
// specific language governing permissions and limitations
// under the License.

use arrow_schema::ArrowError;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors in this module are not from Arrow. Is ExecutionError or ExternalError a better alternative?

Comment on lines +193 to +202
/// Helper function to read band data from GDAL directly into a pre-allocated byte buffer
/// Casts the buffer to the appropriate type and reads directly onto `output`
fn read_band_data_into(
band: &RasterBand,
window_origin: (isize, isize),
window_size: (usize, usize),
data_type: &BandDataType,
output: &mut [u8],
) -> Result<(), ArrowError> {
let pixel_count = window_size.0 * window_size.1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check the size of output before proceeding, otherwise we should mark this function as unsafe and document that the caller should pass in large enough output buffer.

Comment on lines +74 to +85
/// Reads a raster file using GDAL and converts it into a StructArray of rasters.
///
/// Currently only supports reading rasters into InDb storage type.
/// OutDb storage types are not yet implemented.
///
/// # Arguments
/// * `filepath` - Path to the raster file
/// * `tile_size_opt` - Optional tile size to override dataset metadata (uses dataset metadata if None)
pub fn read_raster(
filepath: &str,
tile_size_opt: Option<(usize, usize)>,
) -> Result<Arc<StructArray>, ArrowError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to implement an iterator style reader? We may not want to load all the tiles into Arrow arrays when the source GeoTIFF file is large. We can leave it as-is if this function is only intended to handle small GeoTIFF files.

Comment on lines +46 to +47
sedona-raster = { path = "../sedona-raster" }
sedona-schema = { path = "../sedona-schema" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sedona-raster = { path = "../sedona-raster" }
sedona-schema = { path = "../sedona-schema" }
sedona-raster = { workspace = true }
sedona-schema = { workspace = true }

[dev-dependencies]
approx = { workspace = true }
rstest = { workspace = true }
sedona-testing = { path = "../../rust/sedona-testing", features = ["criterion"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sedona-testing = { path = "../../rust/sedona-testing", features = ["criterion"] }
sedona-testing = { workspace = true, features = ["criterion"] }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants