[Java SDK] Infer Beam logical types for JSR-310 and UUID fields#38194
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Java SDK's schema inference capabilities by introducing native support for modern Java date/time types (JSR-310) and UUIDs as Beam logical types. This change streamlines the process of converting Java objects (POJOs and JavaBeans) containing these types into Beam Rows and vice-versa, improving data interoperability and addressing a known issue related to schema assignability with external systems like Iceberg. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
afadcd2 to
a827a25
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements schema inference and ByteBuddy-based code generation support for JSR-310 date/time types (LocalDate, LocalTime, LocalDateTime, java.time.Instant) and UUIDs. It introduces static LogicalType instances for these types and updates the conversion logic to handle them during row-to-object and object-to-row transformations. A critical issue was identified where the new abstract method 'convertJavaTimeLogicalType' lacks implementations in 'SetterTypeConversion' and 'CreatorParameterTypeConversion', which will result in compilation errors.
| * Beam {@link LogicalType}s. Subclasses emit code that round-trips through the corresponding | ||
| * static {@link LogicalType} instance ({@link #JAVA_LOCAL_DATE_LOGICAL_TYPE} etc.). | ||
| */ | ||
| protected abstract T convertJavaTimeLogicalType(TypeDescriptor<?> type); |
There was a problem hiding this comment.
The addition of the abstract method convertJavaTimeLogicalType to the TypeConversion class requires an implementation in all of its subclasses to avoid compilation errors. While you have provided implementations for GetterTypeConversion, GetterPropertyConversion, and SetterPropertyConversion, you appear to have missed SetterTypeConversion and CreatorParameterTypeConversion. Both of these subclasses also extend TypeConversion and must implement this new method.
For SetterTypeConversion and CreatorParameterTypeConversion, returning Object.class (similar to GetterTypeConversion) should be appropriate as the generated code handles the specific casting in the property conversion path.
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
ahmedabu98
left a comment
There was a problem hiding this comment.
Overall LGTM, just left one comment
Would like @kennknowles to also skim through to check for anything I missed
| } else if (javaTimeLogicalTypeFor(typeDescriptor.getRawType()) != null) { | ||
| return convertJavaTimeLogicalType(typeDescriptor); |
There was a problem hiding this comment.
Wondering if we can expand the naming of this logic to include other LogicalTypes in the future? Instead of restricting it to just Time types. We already include UUID, might as well open it up to other LogicalTypes?
There was a problem hiding this comment.
Hey @sachinnn99, you mentioned renaming the new methods from time-type specific to use generic LogicalType naming. Wondering if you're still planning to or if you changed your mind
There was a problem hiding this comment.
Done — renamed to convertLogicalType / inferredLogicalTypeFor / loadLogicalType in the latest push.
|
Thanks for the review @ahmedabu98! Two follow-ups: Naming: Agreed. Planning to rename the helper to CI — Avro regression: The "failure" status on CI hides a real issue I want to flag before you merge: Two ways to fix:
I'm leaning toward (2) since |
|
Thanks for the helpful breakdown @sachinnn99! I think either way works really. I personally prefer having a concrete But there's no wrong solution here, I'll leave it up to your judgement as the PR author. |
a827a25 to
0efc536
Compare
|
@ahmedabu98 Went with approach (1) — added |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #38194 +/- ##
=========================================
Coverage ? 57.65%
Complexity ? 13364
=========================================
Files ? 2604
Lines ? 266801
Branches ? 11014
=========================================
Hits ? 153820
Misses ? 107184
Partials ? 5797
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Assigning reviewers: R: @chamikaramj for label java. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
kennknowles
left a comment
There was a problem hiding this comment.
LGTM. Thanks for adding test coverage!
|
@chamikaramj This has approvals from @ahmedabu98 and @kennknowles — could you merge when you get a chance? Happy to address anything if needed. |
Add schema inference for java.time.LocalDate, LocalTime, LocalDateTime, Instant, and UUID as Beam logical types (SqlTypes.DATE, TIME, DATETIME, NanosInstant, SqlTypes.UUID) in both POJO and JavaBean schemas. This enables Beam Rows produced from POJOs with JSR-310 fields to be schema-assignable to rows from external systems (e.g. IcebergIO) that use the same logical types, fixing the incompatibility reported in apache#37524. The Avro extension overrides the new convertLogicalType hook to preserve existing Joda bridging for java.time.Instant and LocalDate.
9b6ff99 to
3bb125a
Compare
|
Reminder, please take a look at this pr: @chamikaramj |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @ahmedabu98 for label java. Available commands:
|
|
@kennknowles @ahmedabu98 is this ready to merge? |
Summary
StaticSchemaInference.fieldFromType()now mapsjava.time.LocalDate,LocalTime,LocalDateTime,java.time.Instant, andUUIDto their corresponding Beam logical types (SqlTypes.DATE,TIME,DATETIME,NanosInstant,SqlTypes.UUID)ByteBuddyUtilsgainsconvertJavaTimeLogicalType()in both getter and setter codegen paths, callingLogicalType.toBaseType()/toInputType()so POJO/Bean ↔ Row round-trips work correctlyReadableInstantFixes #37524
Test plan
POJOUtilsTest— schema inference for JSR-310 types + Joda regression testJavaFieldSchemaTest— schema, toRow, fromRow, round-trip, nullable toRow/fromRowJavaBeanSchemaTest— schema + round-tripJavaBeanUtilsTest— schema inference for beansConvertTest—Convert.fromRows()with logical-type DateTime (reproducer for [Bug]: JavaFieldSchema does not support for Beam Logical Types #37524) + mixed logical/primitive schemaR: @ahmedabu98