-
Notifications
You must be signed in to change notification settings - Fork 39
Very basic reading and writing of a raster for InDB #406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces foundational support for reading and writing InDB raster data by integrating GDAL into the Sedona-DB project architecture. It provides basic functionality for converting raster files (particularly GeoTIFFs) to and from Sedona's internal raster format.
Key changes:
- Added GDAL dependencies and build system integration
- Implemented raster reading and writing functions for InDB storage
- Created utility functions for data type conversion between GDAL and Sedona formats
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona-raster/src/utils.rs | Added utility functions for byte conversion and data type sizing |
| rust/sedona-raster/src/lib.rs | Exposed the new utils module |
| rust/sedona-gdal/src/raster_io.rs | Implemented core raster I/O functionality with GDAL integration |
| rust/sedona-gdal/src/lib.rs | Created library entry point for GDAL module |
| rust/sedona-gdal/src/dataset.rs | Added dataset metadata extraction utilities |
| rust/sedona-gdal/Cargo.toml | Configured dependencies for GDAL integration |
| Cargo.toml | Added GDAL workspace dependencies and sedona-gdal member |
| .github/workflows/rust.yml | Updated CI to install GDAL development libraries |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| BandDataType::UInt16 => { | ||
| let mut data = Vec::with_capacity(pixel_count); | ||
| for chunk in band_data.chunks_exact(2) { | ||
| data.push(u16::from_ne_bytes([chunk[0], chunk[1]])); |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The byte-to-type conversion pattern is duplicated across multiple data types (lines 456, 464, 472, 480, 488, 496). Consider extracting this into a helper function to reduce duplication and improve maintainability.
| } | ||
| match data_type { | ||
| BandDataType::UInt8 => Some(bytes[0] as f64), | ||
| BandDataType::UInt16 => Some(u16::from_ne_bytes([bytes[0], bytes[1]]) as f64), |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bytes-to-type conversion pattern in bytes_to_f64 duplicates logic from write_band_data. Consider extracting a common conversion function to reduce duplication.
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| use arrow_schema::ArrowError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The errors in this module are not from Arrow. Is ExecutionError or ExternalError a better alternative?
| /// Helper function to read band data from GDAL directly into a pre-allocated byte buffer | ||
| /// Casts the buffer to the appropriate type and reads directly onto `output` | ||
| fn read_band_data_into( | ||
| band: &RasterBand, | ||
| window_origin: (isize, isize), | ||
| window_size: (usize, usize), | ||
| data_type: &BandDataType, | ||
| output: &mut [u8], | ||
| ) -> Result<(), ArrowError> { | ||
| let pixel_count = window_size.0 * window_size.1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check the size of output before proceeding, otherwise we should mark this function as unsafe and document that the caller should pass in large enough output buffer.
| /// Reads a raster file using GDAL and converts it into a StructArray of rasters. | ||
| /// | ||
| /// Currently only supports reading rasters into InDb storage type. | ||
| /// OutDb storage types are not yet implemented. | ||
| /// | ||
| /// # Arguments | ||
| /// * `filepath` - Path to the raster file | ||
| /// * `tile_size_opt` - Optional tile size to override dataset metadata (uses dataset metadata if None) | ||
| pub fn read_raster( | ||
| filepath: &str, | ||
| tile_size_opt: Option<(usize, usize)>, | ||
| ) -> Result<Arc<StructArray>, ArrowError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to implement an iterator style reader? We may not want to load all the tiles into Arrow arrays when the source GeoTIFF file is large. We can leave it as-is if this function is only intended to handle small GeoTIFF files.
| sedona-raster = { path = "../sedona-raster" } | ||
| sedona-schema = { path = "../sedona-schema" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| sedona-raster = { path = "../sedona-raster" } | |
| sedona-schema = { path = "../sedona-schema" } | |
| sedona-raster = { workspace = true } | |
| sedona-schema = { workspace = true } |
| [dev-dependencies] | ||
| approx = { workspace = true } | ||
| rstest = { workspace = true } | ||
| sedona-testing = { path = "../../rust/sedona-testing", features = ["criterion"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| sedona-testing = { path = "../../rust/sedona-testing", features = ["criterion"] } | |
| sedona-testing = { workspace = true, features = ["criterion"] } |
This PR introduces foundational support for reading and writing InDB raster data in Sedona-DB, providing the groundwork for future enhancements and expanded functionality.
Key points:
I used this code to generate the 3-band noise raster .tif in #300.
Further iterations and feature expansion are planned to improve usability and extend raster data handling capabilities.