Skip to content

Hive 29574 merge join poc#6456

Open
illiabarbashov-sketch wants to merge 1 commit into
apache:masterfrom
illiabarbashov-sketch:HIVE-29574_merge_join_poc
Open

Hive 29574 merge join poc#6456
illiabarbashov-sketch wants to merge 1 commit into
apache:masterfrom
illiabarbashov-sketch:HIVE-29574_merge_join_poc

Conversation

@illiabarbashov-sketch

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

  1. New Hive configurations: hive.merge.join.skew.threshold and hive.merge.join.skew.abort
  2. A new SkewedMergeJoinMonitor class to manage those configurations and log skew join event or abort it.
  3. Unit tests to cover positive and negative cases
  4. Query tests to cover clientpositive and clientnegative test cases

Why are the changes needed?

  1. This feature adds observability to merge join operator and flags the skewed keys
  2. Clients have problem with Skewed Merge Join when the job is stuck and there are no progress and no information about the reasons behind this issue. This feature adds a configuration to abort the stuck job if the threshold is hit.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Unit tests
  • Query tests with clientpositive and clientnegative cases

@sonarqubecloud

Copy link
Copy Markdown

@abstractdog abstractdog left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice patch @illiabarbashov-sketch so far, let some comments

Comment thread ql/src/test/queries/clientnegative/mergejoin_skew_abort.q Outdated
Comment thread ql/src/test/queries/clientpositive/mergejoin_skew_warn.q Outdated
Comment thread ql/src/test/queries/clientpositive/mergejoin_skew_warn.q Outdated
Comment thread ql/src/test/queries/clientnegative/mergejoin_skew_abort.q Outdated
Comment thread ql/src/test/queries/clientpositive/mergejoin_skew_warn.q Outdated
Comment thread ql/src/test/queries/clientpositive/mergejoin_skew_warn.q Outdated
Comment thread ql/src/test/queries/clientnegative/mergejoin_skew_abort.q Outdated
Comment thread common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Outdated
Comment thread common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Outdated

@abstractdog abstractdog left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left minor comments, also the last run has a related failure:
https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-6456/12/tests/

other than that, this is quite close! 🤞

Comment on lines +21 to +22
import org.apache.hadoop.hive.ql.exec.*;
import org.apache.hadoop.hive.ql.plan.*;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we tend to extract wildcard imports to one-by-one

ReduceSinkDesc rsConf = mock(ReduceSinkDesc.class);
when(rsConf.getKeyCols()).thenReturn(Collections.singletonList(keyExpr));

@SuppressWarnings("unchecked") ReduceSinkOperator rs = mock(ReduceSinkOperator.class);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most style guides and formatters (including Google Java Format) will place the annotation on its own line for readability, I recommend doing the same here

String[] descTableAliases = conf.getSkewJoinTableAliases();

for (int pos = 0; pos < maxAlias; pos++) {
joinSkewKeyColumns[pos] = (descKeyNames != null && pos < descKeyNames.length

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can still see the same pattern here, can you double-check, please?

      joinSkewKeyColumns[pos]  = (descKeyNames    != null && pos < descKeyNames.length
          && descKeyNames[pos] != null)    ? descKeyNames[pos]    : "unknown";

@abstractdog

abstractdog commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

there are some warnings in sonarqube report:
https://sonarcloud.io/project/issues?sinceLeakPeriod=true&issueStatuses=OPEN%2CCONFIRMED&pullRequest=6456&id=apache_hive&open=AZ6On0YSJoVnC2vLVdC3

some of them can be fixed easily, like Line is longer than 120 characters (found 152). , '{' is followed by whitespace., or Using the '.*' form of import should be avoided, try to fix them where it makes sense to you, sorry that I haven't caught them earlier

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants