Warn when HiveViewExpander's expanded body row type disagrees with caller's#607
Open
simbadzina wants to merge 3 commits into
Open
Warn when HiveViewExpander's expanded body row type disagrees with caller's#607simbadzina wants to merge 3 commits into
simbadzina wants to merge 3 commits into
Conversation
6d5714a to
55e0f7a
Compare
HiveViewExpander.expandView() previously discarded the rowType argument and returned whatever the re-parsed view body produced. When the caller-recorded row type and the expanded RelNode's row type differed in field order, downstream consumers that rely on positional access could land on the wrong RelNode-output position. Prefix-named sibling columns (one name a prefix of another) are particularly prone to swapping under such mismatches. This change wraps the expanded RelRoot in a LogicalProject that re-aligns fields to the caller's rowType by name (case-insensitive). The wrapper is a no-op when the orderings already match. If a caller-expected field is missing from the expanded body, or if arities differ, the method falls back to returning the original RelRoot unchanged. Adds: - Unit tests for alignToRowType covering aligned, case-only-difference, reorder, prefix-pair, missing-field, and arity-mismatch cases. - An integration test that drives the full expandView path with a deliberately reversed caller-provided rowType.
55e0f7a to
26a65b6
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes Hive view expansion so expanded view outputs can be reordered to match the caller-provided row type, preventing positional column mismatches in downstream view-on-view scenarios.
Changes:
- Adds
HiveViewExpander.alignToRowTypeto reorder expanded view fields by case-insensitive name when needed. - Adds unit coverage for no-op, reorder, prefix-sibling, missing-field, and arity fallback cases.
- Adds an integration fixture and test for the full
expandViewpath.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/HiveViewExpander.java |
Adds row-type alignment after view query conversion. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveViewExpanderTest.java |
Adds unit tests for alignment behavior. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveToRelConverterTest.java |
Adds integration regression test for caller row-type ordering. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/TestUtils.java |
Adds Hive test table/view fixtures used by the integration test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
wmoustafa
previously requested changes
May 13, 2026
The upstream column-ordering condition that motivated this PR has been addressed at the view-producer layer (the upstream view's projection no longer triggers the case-sensitive resolver mismatch in HiveToRelConverter). With that upstream change in place, Coral no longer needs to silently rewrite the expanded view to re-align with the caller's rowType. This commit drops the LogicalProject re-alignment branch added in the previous commit and replaces it with a single warn-only diagnostic: - HiveViewExpander.expandView() now invokes warnIfRowTypeMisaligned and returns the original RelRoot unchanged. - warnIfRowTypeMisaligned logs a single warning when the expanded body's row type disagrees with the caller-provided rowType (either by field count or by case-insensitive field name order). No-op fast path when they agree. - A TODO marks the intent to tighten this from warn to an IllegalStateException once we are confident no live view trips it. Tests: - HiveViewExpanderTest rewritten to verify the warn-emission and no-rewrite contract directly via a Mockito-mocked slf4j Logger swapped in/out via reflection. - The integration test in HiveToRelConverterTest and its realign_view fixture in TestUtils are removed; they only made sense when the expander actively realigned the row type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6fade9e to
38a8a9d
Compare
The top-level fieldNamesAlignedByOrder check missed the same class of silent positional-swap bug at deeper nesting levels: when a top-level field is itself a STRUCT (or an ARRAY/MULTISET of STRUCT) and its nested field order differs between caller and expanded body, a downstream consumer's positional access into the struct could land on the wrong source column. Adds a nestedRowTypesAlignedByOrder helper that recurses through: - STRUCT types: re-runs the case-insensitive field-name-by-order check on the nested row type. - ARRAY / MULTISET types: unwraps the component type and recurses, so ARRAY<STRUCT<...>> reorderings are caught one level past the array. Adds three unit tests covering: - Aligned nested struct (no warn). - Top-level field names agree but inner struct fields are reordered. - ARRAY-of-STRUCT element fields reordered. - Three-level nesting (STRUCT<STRUCT<STRUCT<...>>>) with the innermost struct fields reordered. Sanity-checked by stubbing nestedRowTypesAlignedByOrder to always return true: exactly the three nested-aware tests fail, the rest pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines
+105
to
+120
| private static boolean nestedRowTypesAlignedByOrder(RelDataType a, RelDataType b) { | ||
| if (a.isStruct() != b.isStruct()) { | ||
| return false; | ||
| } | ||
| if (a.isStruct()) { | ||
| return fieldNamesAlignedByOrder(a, b); | ||
| } | ||
| final RelDataType ac = a.getComponentType(); | ||
| final RelDataType bc = b.getComponentType(); | ||
| if ((ac == null) != (bc == null)) { | ||
| return false; | ||
| } | ||
| if (ac != null) { | ||
| return nestedRowTypesAlignedByOrder(ac, bc); | ||
| } | ||
| return true; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
HiveViewExpander.expandView()previously discarded itsrowTypeargument and returned whatever the re-parsed view body produced. TheViewExpandercontract is that the returnedRelRoot's row type should match the caller-providedrowType; when those two disagree in field count or field order, downstream consumers that rely on positional access can silently read values from the wrong source column.Earlier revisions of this PR fixed the symptom inside Coral by wrapping the returned
RelRootin aLogicalProjectthat re-aligned fields to the caller-provided row type by name. The view producers that triggered the original mismatch have since been corrected upstream, so Coral no longer needs to silently rewrite the expanded view.What this PR does
Adds a warn-only diagnostic for the
ViewExpandercontract violation:HiveViewExpander.expandView()invokeswarnIfRowTypeMisaligned(root, rowType)and returns the originalRelRootunchanged.warnIfRowTypeMisalignedlogs a singleWARNwhen the expanded body's row type disagrees with the caller-provided row type. The check is recursive: it compares field count, then case-insensitive field names by order at every nesting level — STRUCT-within-STRUCT and ARRAY/MULTISET-of-STRUCT included — so a nested reorder one or more levels deep still surfaces. No-op fast path when the types already agree.TODOmarks the intent to harden this from a warning into anIllegalStateExceptiononce we are confident no live Hive view trips it. The currentWARNis a transitional safeguard, not the long-term design.flowchart LR E[warnIfRowTypeMisaligned<br/>expected rowType vs<br/>expanded body rowType] --> Q{Field counts match<br/>and names aligned by order<br/>at every nesting level?} Q -->|Yes| K[Return root unchanged] Q -->|No| W[LOG.warn expected vs actual<br/>return root unchanged]Why warn rather than auto-realign
The previous revision's
LogicalProjectrewrite was a correct response to the contract violation but was also a load-bearing rewrite of the user's query plan: it changed which physical columns produce each output position. With the upstream view producers corrected, that rewrite is no longer necessary, and an explicit warning is a less invasive signal — it surfaces the violation without silently mutating the plan. The path remains open to tighten this into a thrown exception (TODOin the source) once we have signal that no live view trips it.Test plan
HiveViewExpanderTestcovering top-level cases: aligned (no warn), case-only differences (no warn), reordered fields (one warn, root unchanged), arity mismatch (one warn, root unchanged), and a contract check that the warning path leaves theRelRootun-rewritten.STRUCT<...>(no warn), reordered fields inside a top-levelSTRUCT, reordered fields insideARRAY<STRUCT<...>>, and reordered fields in a three-levelSTRUCT<STRUCT<STRUCT<...>>>nesting.org.slf4j.Loggervia reflection in@BeforeMethodand restore the original in@AfterMethod, so they assert warn emission without depending on the test classpath's slf4j binding.LOG.warnline insidewarnIfRowTypeMisalignedcauses exactly the two top-level warn-asserting tests to fail; restoring it turns the suite green.nestedRowTypesAlignedByOrdertoreturn truecauses exactly the three nested-aware warn-asserting tests to fail; restoring it turns the suite green.:coral-hive:testand:coral-trino:testsuites pass.🤖 Generated with Claude Code