-
Notifications
You must be signed in to change notification settings - Fork 1.4k
feat: Add iceberg partition transform evaluator and utility #15440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Add iceberg partition transform evaluator and utility #15440
Conversation
✅ Deploy Preview for meta-velox canceled.
|
|
@mbasmanova This PR continues from #15035 (review). |
|
Ci is red |
|
|
||
| /// Converts a partition value to its string representation for use in | ||
| /// partition directory path. The format follows the Iceberg specification | ||
| /// for partition path encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a link to the spec? Would be nice to add here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment.
I cannot find an explicit document about this. I will revise the comment.
I followed the Iceberg java code to implement this.
| explicit IcebergPartitionPath(TransformType transformType) | ||
| : transformType_(transformType) {} | ||
|
|
||
| ~IcebergPartitionPath() override = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
|
|
||
| using HivePartitionUtil::toPartitionString; | ||
|
|
||
| std::string toPartitionString(int32_t value, const TypePtr& type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps, document the format of the result
| /// Converts a partition value to its string representation for use in | ||
| /// partition directory path. The format follows the Iceberg specification | ||
| /// for partition path encoding. | ||
| class IcebergPartitionPath : public HivePartitionUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to extract this class into a separate PR and add a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing.
I will separate this to a new PR.
| /// type, and optional parameter (e.g., bucket count, truncate width). | ||
| /// @param inputFieldName Name of the source column in the input RowVector. | ||
| /// @return Typed expression representing the transform. | ||
| static core::TypedExprPtr toExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose this API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I moved it to anonymous namespace.
|
|
||
| namespace { | ||
|
|
||
| constexpr char const* kIcebergFunctionPrefix{"iceberg_"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this next to its only use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
| class TransformTest : public test::IcebergTestBase { | ||
| protected: | ||
| template <typename T> | ||
| VectorPtr createFlatVector( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use VectorMaker or helper methods from VectorTestBase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Changed to VectorMaker
|
|
||
| // Create expected vector and verify results. | ||
| auto expectedVector = createFlatVector<OUT>(expectedValues); | ||
| ASSERT_EQ(resultVector[0]->size(), expectedValues.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use assertEqualVectors or similar from VectorTestBase.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
ff2e9e0 to
ae18463
Compare
| case TransformType::kTruncate: | ||
| return type; | ||
| } | ||
| VELOX_UNREACHABLE("Unknown transform type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova Have to add this to fix the building break on Linux. Macos is fine without this change.
| const std::vector<std::optional<IN>>& inputValues, | ||
| const std::vector<std::optional<OUT>>& expectedValues, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace these arguments with RowVectorPtr and use makeFlatVector and similar in the callers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just updated.
ae18463 to
d007247
Compare
| protected: | ||
| void testTransform( | ||
| const IcebergPartitionSpec::Field& field, | ||
| const RowVectorPtr& inputRowVector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inputRowVector -> input, expectedRowVector -> expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
| auto transformExprs = TransformExprBuilder::toExpressions( | ||
| spec, | ||
| std::vector<column_index_t>{0}, | ||
| asRowType(inputRowVector->type())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input->rowType()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
| const RowVectorPtr& inputRowVector, | ||
| const RowVectorPtr& expectedRowVector) { | ||
| // Build and evaluate transform expressions. | ||
| auto spec = std::make_shared<const IcebergPartitionSpec>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we test single field specs only? Why not test multi-field specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment.
I added those test one by one when I implement the transformation logic one by one, and type by type.
I will refactor it to support test multiple transforms.
| {makeFlatVector<StringView>( | ||
| {StringView("\x01\x02\x03", 3), StringView("", 0)}, | ||
| VARBINARY())}), | ||
| makeRowVector({makeFlatVector<StringView>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop StringView or use std::string
makeFlatVector<std::string>({"foo", "bar"})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
d007247 to
8047d9b
Compare
Introduces infrastructure for evaluating Iceberg partition transforms in Velox.
Added
TransformExprBuilderwhich converts Iceberg partition specifications into Velox expressionsAdded
TransformEvaluatorwhich evaluates multiple transform expressions in a single pass using compiled ExprSet. It automatically register Iceberg function in first call.Added
IcebergPartitionPathwhich converts partition keys to their string representation for directory paths following Iceberg specification (handles dates, timestamps, binary data encoding)Added test cases to cover the transform evaluation result.
Update iceberg CMake file to link Iceberg functions library.