Releases: AmadeusITGroup/dataio-framework
v1.0.1-spark3.5.0
This release improves the flexibility and clarity of processor instantiation during tests:
- Refactored the `createProcessor[T]` method as a type-safe generic method: `createProcessor[T <: Processor](PipelineConfig): T`
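The pattern behind this signature can be illustrated with a minimal, self-contained sketch. Note that `PipelineConfig`, `Processor`, and the reflection-based construction below are stand-ins for illustration, not the framework's actual implementation:

```scala
import scala.reflect.ClassTag

// Stand-ins for the framework types (assumptions, not the real API).
final case class PipelineConfig(settings: Map[String, String])

trait Processor { def config: PipelineConfig }

// A hypothetical processor under test, receiving its config via the constructor.
final class WordCountProcessor(val config: PipelineConfig) extends Processor

// Type-safe generic factory: the ClassTag lets us instantiate T reflectively,
// and the caller gets back a properly typed T rather than a bare Processor.
def createProcessor[T <: Processor](config: PipelineConfig)(implicit ct: ClassTag[T]): T =
  ct.runtimeClass
    .getConstructor(classOf[PipelineConfig])
    .newInstance(config)
    .asInstanceOf[T]

val processor = createProcessor[WordCountProcessor](PipelineConfig(Map("name" -> "wc")))
```

The benefit over an untyped factory is that no downcast is needed at the call site; the compiler knows `processor` is a `WordCountProcessor`.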
v1.0.0-spark3.5.0
We're happy to release v1.0.0 with support for Spark 3.5.0.
It introduces major (breaking) changes to configuration, better modularization, and a brand-new testing utility library, and lays the groundwork for cleaner, more maintainable pipelines.
Highlights
Modular Architecture
Data I/O is now split into dedicated modules:
- dataio-core
- dataio-test
- dataio-kafka
- dataio-snowflake
This makes the framework easier to use, maintain, and extend.
Configuration Overhaul
- Switched from UpperCamelCase to snake_case in configuration files
- Removed redundant fields from pipe definitions: Spark options (e.g. Kafka bootstrap servers) are now set through the standard `options {}` block
- Enforced the presence of the name field in all pipes for better traceability
- Renamed fields to match Spark function conventions (e.g. `repartition.exprs` instead of `repartition.columns`)
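Taken together, a pipe definition in the new format might look like the following HOCON sketch. The exact schema is an assumption for illustration; only the snake_case style, the mandatory name field, the `options {}` pass-through, and `repartition.exprs` come from the notes above:

```hocon
pipes: [
  {
    name: "orders_ingest"        // name is now mandatory in every pipe
    format: "kafka"
    options {                    // Spark options pass through unchanged
      "kafka.bootstrap.servers": "broker-1:9092"
      "subscribe": "orders"
    }
    repartition {
      exprs: ["order_date"]      // matches Spark's repartition() naming
    }
  }
]
```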
New Testing Utilities
A new dataio-test module introduces:
- In-memory input/output via TestInput, TestOutput, and TestDataStore
- High-level test helpers with config-driven validation (assertProcessorResult, etc.)
- Cleaner patterns for unit, functional, and end-to-end tests
Examples are available in the documentation.
Old generic Spark/Scala testing utilities were removed to keep Data I/O focused and lean.
Streaming Improvements
- Added AvailableNow and Continuous trigger types for streaming pipes
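These trigger types mirror Spark's Trigger.AvailableNow() and Trigger.Continuous(...) for Structured Streaming. A hedged sketch of how one might be declared in the snake_case configuration (the key names here are assumptions, not taken from the documentation):

```hocon
trigger {
  type: "available_now"   // process all available data, then stop
  // type: "continuous"   // low-latency continuous processing
}
```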
Removed (for now)
- Elasticsearch support was dropped due to lack of compatibility with Spark 3.5.x
We recommend migrating your configs and test logic to align with the new structure and format. Everything is designed to offer better consistency, usability, and extensibility going forward.