Adding the image crop dataset implementation, with refactor of serialization logic to reduce complexity and redundancy #22
Merged
…which internally serializes and deserializes the manifest, so for dataset to and from config the only source of truth would be the file_state attribute.
…implified serialization logic
40025ed to 31dd8c9
d770f89 to 37bcfb7
… refresh ci check marking Tuple import as missing
d33bs approved these changes on Dec 16, 2025
…ual_stain_flow into dev-add-patch-dataset
…et constructors for clarity on file_index requirements and schema.
…asetManifest. Adjust tests and fixtures to move from monkey patching to using real ds_engine backends.
- Added `input_validation.py` module to validate file index DataFrame structure and path-like entries.
- Integrated input validation into `DatasetManifest` to ensure file index integrity during initialization.
- Updated tests to create temporary TIFF files for realistic file I/O testing.
- Refactored existing tests to utilize new validation and file handling mechanisms.
- Adjusted crop dimensions in tests to ensure consistency with new validation logic.
…adding parameter descriptions and validation for missing or incorrect crop_collection keys.
…tions; Add tests for cropimage dataset.
… and functionality tests
…nhance error raising in manifest and crop manifest modules for better clarity and adjust tests accordingly.
…atures and removed obsolete dataset classes.
…ke_file_index_schema
…ing image dimensions
…from BaseImageDataset
…atasets and updated the plotting callback.
…mage preprocessing
- Replaced the `_load_single_channel` and `load_jump_bf_hoechst` functions with a new `build_file_index` function to streamline image loading and indexing.
- Introduced `BaseImageDataset` for dataset management, allowing for easier handling of input-target pairs.
- Added `CropImageDataset` to facilitate cropping of images during training.
- Updated visualization code to reflect changes in dataset structure and ensure compatibility with new dataset classes.
- Adjusted DataLoader to use the cropped dataset for training.
Thanks @d33bs. Addressed all of your comments that I could, and updated the example and tests as well.
Overview
This PR adds the new image dataset class that returns user-specified crops dynamically extracted from full-resolution image files. The backend engine was introduced in #21.
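As a rough illustration of what "crops dynamically extracted from full-resolution image files" can mean, here is a minimal sketch. It is not the code in `crop_dataset.py`: `CropSpec`, `CropSketchDataset`, and the `loader` callable are hypothetical names, and images are plain nested lists standing in for arrays read from disk.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Stand-in for a 2D image array; a real dataset would likely use numpy or torch.
Image = List[List[int]]


@dataclass(frozen=True)
class CropSpec:
    """Hypothetical crop record: source file plus a top-left corner and size."""
    path: str
    x: int
    y: int
    width: int
    height: int


class CropSketchDataset:
    """Hypothetical dataset that cuts each crop out of its image on access."""

    def __init__(self, loader: Callable[[str], Image], crops: Sequence[CropSpec]):
        # `loader` is an assumed callable that reads one full-resolution image.
        self._loader = loader
        self._crops = list(crops)

    def __len__(self) -> int:
        # One item per crop, not per image file.
        return len(self._crops)

    def __getitem__(self, idx: int) -> Image:
        crop = self._crops[idx]
        # The full image is only loaded when a crop is actually requested.
        image = self._loader(crop.path)
        rows = image[crop.y : crop.y + crop.height]
        if len(rows) != crop.height or any(
            len(row) < crop.x + crop.width for row in rows
        ):
            raise ValueError(f"crop {crop} falls outside the image bounds")
        return [row[crop.x : crop.x + crop.width] for row in rows]
```

The point of the sketch is only that the dataset length follows the number of crops and that no image is read until an item is requested.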
Adds:
- `src/virtual_stain_flow/datasets/crop_dataset.py`: the new crop dataset

Refactors:
This PR also includes a small refactor to simplify backend serialization logic, addressing design issues introduced in #11.
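The general shape of such a refactor, where each layer serializes itself and the dataset only nests the result, might look like the sketch below. All class names, the `to_config`/`from_config` method names, and the parameters (`image_mode`, `cache_capacity`, `crop_size`) are assumptions for illustration, not the project's actual API.

```python
from typing import Any, Dict, List, Optional


class ManifestSketch:
    """Hypothetical stand-in for a manifest that owns the file index."""

    def __init__(self, file_index: List[Dict[str, str]], image_mode: str = "I;16"):
        self.file_index = file_index
        self.image_mode = image_mode  # assumed configuration option

    def to_config(self) -> Dict[str, Any]:
        return {"file_index": self.file_index, "image_mode": self.image_mode}

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ManifestSketch":
        return cls(config["file_index"], config["image_mode"])


class FileStateSketch:
    """Hypothetical stand-in for the caching layer; it serializes its manifest."""

    def __init__(self, manifest: ManifestSketch, cache_capacity: Optional[int] = None):
        self.manifest = manifest
        self.cache_capacity = cache_capacity  # assumed configuration option

    def to_config(self) -> Dict[str, Any]:
        return {
            "manifest": self.manifest.to_config(),
            "cache_capacity": self.cache_capacity,
        }

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "FileStateSketch":
        manifest = ManifestSketch.from_config(config["manifest"])
        return cls(manifest, config["cache_capacity"])


class DatasetSketch:
    """Hypothetical dataset: it knows its own parameters and nothing deeper."""

    def __init__(self, file_state: FileStateSketch, crop_size: int):
        self.file_state = file_state
        self.crop_size = crop_size

    def to_config(self) -> Dict[str, Any]:
        # The nested file_state config is treated as an opaque blob here.
        return {"file_state": self.file_state.to_config(), "crop_size": self.crop_size}

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "DatasetSketch":
        file_state = FileStateSketch.from_config(config["file_state"])
        return cls(file_state, config["crop_size"])
```

With this layout, adding a new option to the manifest or file-state layer changes only that layer's `to_config` and `from_config`; the dataset code stays untouched.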
Why the Refactor?
The dataset is based on:
- `DatasetManifest` (validation + image access), and
- `FileState` layer that handles caching and lazy loading.

In #11, reproducibility was implemented at the `Dataset` level by serializing the file index and manually reconstructing `DatasetManifest` → `FileState`. This tightly coupled `Dataset` with internal parameters of those classes. As `DatasetManifest` and `FileState` gained configuration options, `Dataset` had to know and serialize all of them, creating unnecessary complexity.

This PR moves serialization and deserialization logic into `DatasetManifest` and `FileState` themselves. The dataset now only manages its own parameters: