Skip to content

Warn when HiveViewExpander's expanded body row type disagrees with caller's#607

Open
simbadzina wants to merge 3 commits into
linkedin:masterfrom
simbadzina:fix-hiveviewexpander-rowtype-alignment
Open

Warn when HiveViewExpander's expanded body row type disagrees with caller's#607
simbadzina wants to merge 3 commits into
linkedin:masterfrom
simbadzina:fix-hiveviewexpander-rowtype-alignment

Conversation

@simbadzina
Copy link
Copy Markdown
Collaborator

@simbadzina simbadzina commented May 13, 2026

Summary

HiveViewExpander.expandView() previously discarded its rowType argument and returned whatever the re-parsed view body produced. The ViewExpander contract is that the returned RelRoot's row type should match the caller-provided rowType; when those two disagree in field count or field order, downstream consumers that rely on positional access can silently read values from the wrong source column.

Earlier revisions of this PR fixed the symptom inside Coral by wrapping the returned RelRoot in a LogicalProject that re-aligned fields to the caller-provided row type by name. The view producers that triggered the original mismatch have since been corrected upstream, so Coral no longer needs to silently rewrite the expanded view.

What this PR does

Adds a warn-only diagnostic for the ViewExpander contract violation:

  • HiveViewExpander.expandView() invokes warnIfRowTypeMisaligned(root, rowType) and returns the original RelRoot unchanged.
  • warnIfRowTypeMisaligned logs a single WARN when the expanded body's row type disagrees with the caller-provided row type. The check is recursive: it compares field count, then case-insensitive field names by order at every nesting level — STRUCT-within-STRUCT and ARRAY/MULTISET-of-STRUCT included — so a nested reorder one or more levels deep still surfaces. No-op fast path when the types already agree.
  • A TODO marks the intent to harden this from a warning into an IllegalStateException once we are confident no live Hive view trips it. The current WARN is a transitional safeguard, not the long-term design.
flowchart LR
  E[warnIfRowTypeMisaligned<br/>expected rowType vs<br/>expanded body rowType] --> Q{Field counts match<br/>and names aligned by order<br/>at every nesting level?}
  Q -->|Yes| K[Return root unchanged]
  Q -->|No| W[LOG.warn expected vs actual<br/>return root unchanged]
Loading

Why warn rather than auto-realign

The previous revision's LogicalProject rewrite was a correct response to the contract violation but was also a load-bearing rewrite of the user's query plan: it changed which physical columns produce each output position. With the upstream view producers corrected, that rewrite is no longer necessary, and an explicit warning is a less invasive signal — it surfaces the violation without silently mutating the plan. The path remains open to tighten this into a thrown exception (TODO in the source) once we have signal that no live view trips it.

Test plan

  • Unit tests in HiveViewExpanderTest covering top-level cases: aligned (no warn), case-only differences (no warn), reordered fields (one warn, root unchanged), arity mismatch (one warn, root unchanged), and a contract check that the warning path leaves the RelRoot un-rewritten.
  • Unit tests for nested-row recursion: aligned STRUCT<...> (no warn), reordered fields inside a top-level STRUCT, reordered fields inside ARRAY<STRUCT<...>>, and reordered fields in a three-level STRUCT<STRUCT<STRUCT<...>>> nesting.
  • Tests inject a Mockito-mocked org.slf4j.Logger via reflection in @BeforeMethod and restore the original in @AfterMethod, so they assert warn emission without depending on the test classpath's slf4j binding.
  • Sanity-check, top-level: removing the LOG.warn line inside warnIfRowTypeMisaligned causes exactly the two top-level warn-asserting tests to fail; restoring it turns the suite green.
  • Sanity-check, recursion: stubbing nestedRowTypesAlignedByOrder to return true causes exactly the three nested-aware warn-asserting tests to fail; restoring it turns the suite green.
  • Full :coral-hive:test and :coral-trino:test suites pass.

🤖 Generated with Claude Code

@simbadzina simbadzina marked this pull request as draft May 13, 2026 20:45
@simbadzina simbadzina force-pushed the fix-hiveviewexpander-rowtype-alignment branch 6 times, most recently from 6d5714a to 55e0f7a Compare May 13, 2026 21:46
@simbadzina simbadzina requested a review from Copilot May 13, 2026 21:46
HiveViewExpander.expandView() previously discarded the rowType argument and
returned whatever the re-parsed view body produced. When the caller-recorded
row type and the expanded RelNode's row type differed in field order,
downstream consumers that rely on positional access could land on the wrong
RelNode-output position. Prefix-named sibling columns (one name a prefix of
another) are particularly prone to swapping under such mismatches.

This change wraps the expanded RelRoot in a LogicalProject that re-aligns
fields to the caller's rowType by name (case-insensitive). The wrapper is a
no-op when the orderings already match. If a caller-expected field is missing
from the expanded body, or if arities differ, the method falls back to
returning the original RelRoot unchanged.

Adds:
- Unit tests for alignToRowType covering aligned, case-only-difference,
  reorder, prefix-pair, missing-field, and arity-mismatch cases.
- An integration test that drives the full expandView path with a
  deliberately reversed caller-provided rowType.
@simbadzina simbadzina force-pushed the fix-hiveviewexpander-rowtype-alignment branch from 55e0f7a to 26a65b6 Compare May 13, 2026 21:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Hive view expansion so expanded view outputs can be reordered to match the caller-provided row type, preventing positional column mismatches in downstream view-on-view scenarios.

Changes:

  • Adds HiveViewExpander.alignToRowType to reorder expanded view fields by case-insensitive name when needed.
  • Adds unit coverage for no-op, reorder, prefix-sibling, missing-field, and arity fallback cases.
  • Adds an integration fixture and test for the full expandView path.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/HiveViewExpander.java Adds row-type alignment after view query conversion.
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveViewExpanderTest.java Adds unit tests for alignment behavior.
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveToRelConverterTest.java Adds integration regression test for caller row-type ordering.
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/TestUtils.java Adds Hive test table/view fixtures used by the integration test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@simbadzina simbadzina marked this pull request as ready for review May 13, 2026 21:53
@simbadzina simbadzina marked this pull request as draft May 18, 2026 19:00
The upstream column-ordering condition that motivated this PR has been
addressed at the view-producer layer (the upstream view's projection no
longer triggers the case-sensitive resolver mismatch in
HiveToRelConverter). With that upstream change in place, Coral no longer
needs to silently rewrite the expanded view to re-align with the caller's
rowType.

This commit drops the LogicalProject re-alignment branch added in the
previous commit and replaces it with a single warn-only diagnostic:

- HiveViewExpander.expandView() now invokes warnIfRowTypeMisaligned and
  returns the original RelRoot unchanged.
- warnIfRowTypeMisaligned logs a single warning when the expanded body's
  row type disagrees with the caller-provided rowType (either by field
  count or by case-insensitive field name order). No-op fast path when
  they agree.
- A TODO marks the intent to tighten this from warn to an
  IllegalStateException once we are confident no live view trips it.

Tests:
- HiveViewExpanderTest rewritten to verify the warn-emission and
  no-rewrite contract directly via a Mockito-mocked slf4j Logger
  swapped in/out via reflection.
- The integration test in HiveToRelConverterTest and its realign_view
  fixture in TestUtils are removed; they only made sense when the
  expander actively realigned the row type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simbadzina simbadzina force-pushed the fix-hiveviewexpander-rowtype-alignment branch from 6fade9e to 38a8a9d Compare May 28, 2026 20:39
@simbadzina simbadzina changed the title Honor caller-provided rowType in HiveViewExpander.expandView Warn when HiveViewExpander's expanded body row type disagrees with caller's May 28, 2026
The top-level fieldNamesAlignedByOrder check missed the same class of
silent positional-swap bug at deeper nesting levels: when a top-level
field is itself a STRUCT (or an ARRAY/MULTISET of STRUCT) and its
nested field order differs between caller and expanded body, a
downstream consumer's positional access into the struct could land on
the wrong source column.

Adds a nestedRowTypesAlignedByOrder helper that recurses through:
- STRUCT types: re-runs the case-insensitive field-name-by-order check
  on the nested row type.
- ARRAY / MULTISET types: unwraps the component type and recurses, so
  ARRAY<STRUCT<...>> reorderings are caught one level past the array.

Adds three unit tests covering:
- Aligned nested struct (no warn).
- Top-level field names agree but inner struct fields are reordered.
- ARRAY-of-STRUCT element fields reordered.
- Three-level nesting (STRUCT<STRUCT<STRUCT<...>>>) with the innermost
  struct fields reordered.

Sanity-checked by stubbing nestedRowTypesAlignedByOrder to always
return true: exactly the three nested-aware tests fail, the rest pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simbadzina simbadzina marked this pull request as ready for review May 28, 2026 21:15
@simbadzina simbadzina requested review from Copilot and wmoustafa May 28, 2026 21:15
@simbadzina simbadzina dismissed wmoustafa’s stale review May 28, 2026 21:16

Shifted to logging a warning now.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +105 to +120
private static boolean nestedRowTypesAlignedByOrder(RelDataType a, RelDataType b) {
if (a.isStruct() != b.isStruct()) {
return false;
}
if (a.isStruct()) {
return fieldNamesAlignedByOrder(a, b);
}
final RelDataType ac = a.getComponentType();
final RelDataType bc = b.getComponentType();
if ((ac == null) != (bc == null)) {
return false;
}
if (ac != null) {
return nestedRowTypesAlignedByOrder(ac, bc);
}
return true;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants