HIVE-29281: Make proactive cache eviction work with catalog #6379

Open
Neer393 wants to merge 1 commit into apache:master from Neer393:HIVE-29281

Neer393 commented Mar 19, 2026

Contributor

What changes were proposed in this pull request?

Made proactive cache eviction catalog-aware by changing the ProactiveEviction and CacheTag files.

Why are the changes needed?

Proactive cache eviction should be catalog-aware; otherwise, tables with the same name under different catalogs may cause false cache hits or misses. To avoid this, cache eviction must take the catalog into account.
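
To make the collision concrete, here is a minimal sketch. It is not Hive's actual `CacheTag` code: the class and method names, the dot-separated key format, and the catalog names `hive` and `analytics` are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Hive's CacheTag class.
class TagCollisionDemo {

    // Key built from database and table only (assumed pre-change behaviour).
    static String dbTableKey(String db, String table) {
        return db + "." + table;
    }

    // Key that also carries the catalog (assumed catalog-aware behaviour).
    static String catalogKey(String catalog, String db, String table) {
        return catalog + "." + db + "." + table;
    }

    public static void main(String[] args) {
        Map<String, String> byDbTable = new HashMap<>();
        byDbTable.put(dbTableKey("sales", "orders"), "data cached for catalog hive");
        // Same db/table name under another catalog maps to the same key and replaces the entry.
        byDbTable.put(dbTableKey("sales", "orders"), "data cached for catalog analytics");
        System.out.println("db.table keys: " + byDbTable.size());

        Map<String, String> byCatalog = new HashMap<>();
        byCatalog.put(catalogKey("hive", "sales", "orders"), "data cached for catalog hive");
        byCatalog.put(catalogKey("analytics", "sales", "orders"), "data cached for catalog analytics");
        System.out.println("catalog.db.table keys: " + byCatalog.size());
    }
}
```

With keys that omit the catalog, the two tables share one entry, so an eviction aimed at one would also hit the other. With catalog-qualified keys the entries stay separate.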

Does this PR introduce any user-facing change?

No user-facing changes, since users are not exposed to proactive cache eviction.

How was this patch tested?

Added unit tests for cases with and without a catalog; all of them pass. I am not sure how to test proactive cache eviction manually, so the change is verified only through unit tests.

Neer393 commented Mar 20, 2026

Contributor Author

@zhangbutao I need a review here. I looked at all the merged PRs under HIVE-22820 and made the corresponding changes to make this catalog-aware. Please let me know if I missed anything. Thanks.

Copilot AI left a comment

Pull request overview

This PR makes LLAP proactive cache eviction catalog-aware by propagating catalog names through cache tags and eviction requests, preventing collisions when identical db/table names exist across catalogs.

Changes:

  • Extend LLAP proactive eviction request structure to include catalog scoping (catalog → db → table → partitions).
  • Introduce catalog tracking on TableDesc/PartitionDesc and update cache-tag generation to include catalog-qualified names.
  • Update LLAP cache metadata serialization and unit tests to reflect catalog-qualified cache tags.
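
The nesting described in the first bullet (catalog → db → table → partitions) can be pictured with the sketch below. It is a hypothetical stand-in, not the PR's `ProactiveEviction.Request` class: the class name, `addTable`, `hasTable`, and the convention that an empty partition set means the whole table are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a catalog-scoped eviction request; names are illustrative.
class EvictionRequestSketch {

    // catalog -> db -> table -> partition specs
    // (assumption: an empty set stands for the whole table)
    private final Map<String, Map<String, Map<String, Set<Map<String, String>>>>> entities =
        new LinkedHashMap<>();

    EvictionRequestSketch addTable(String catalog, String db, String table) {
        tableEntry(catalog, db, table);
        return this;
    }

    EvictionRequestSketch addPartitionOfATable(String catalog, String db, String table,
                                               LinkedHashMap<String, String> partSpec) {
        tableEntry(catalog, db, table).add(partSpec);
        return this;
    }

    // Creates the nested levels on demand and returns the partition set of the table.
    private Set<Map<String, String>> tableEntry(String catalog, String db, String table) {
        return entities
            .computeIfAbsent(catalog, c -> new LinkedHashMap<>())
            .computeIfAbsent(db, d -> new LinkedHashMap<>())
            .computeIfAbsent(table, t -> new LinkedHashSet<>());
    }

    // True only if this exact catalog, db and table combination was registered.
    boolean hasTable(String catalog, String db, String table) {
        Map<String, Map<String, Set<Map<String, String>>>> dbs = entities.get(catalog);
        if (dbs == null) {
            return false;
        }
        Map<String, Set<Map<String, String>>> tables = dbs.get(db);
        return tables != null && tables.containsKey(table);
    }
}
```

Because the catalog is the outermost key, a request for `sales.orders` in one catalog does not match a table of the same name in another.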

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java` | Updates cache tag semantics/docs and parent-tag derivation to preserve catalog prefix. |
| `ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java` | Adds catalogName field and updates constructors/clone to carry catalog without polluting EXPLAIN. |
| `ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java` | Exposes catalog name via PartitionDesc based on TableDesc, with default fallback. |
| `ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java` | Makes eviction requests catalog-scoped and includes catalog in proto requests and tag matching. |
| `llap-common/src/protobuf/LlapDaemonProtocol.proto` | Adds catalog name field to EvictEntityRequestProto. |
| `ql/src/java/org/apache/hadoop/hive/llap/LlapHiveUtils.java` | Prefixes cache tags with catalog when deriving metrics tags. |
| `llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java` | Adjusts eviction debug logging for catalog+db structure. |
| `llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapCacheMetadataSerializer.java` | Adds backward-ish handling for cache tags missing catalog during decode. |
| `llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java` | Updates synthetic tag to include default catalog prefix. |
| `ql/src/java/org/apache/hadoop/hive/ql/ddl/**` | Ensures eviction builders are invoked with catalog where available. |
| `ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java` | Ensures TableDesc created from Table carries catalog name. |
| `ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java` | Updates TableDesc construction call sites for new signature. |
| Various test files | Update existing tests and add new coverage for catalog-aware eviction and proto round-trips. |
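
One way to picture the handling of cache tags that lack a catalog during decode is sketched below. The serializer's real tag format is not shown on this page, so the helper name `qualify`, the dot-separated layout, and the default catalog value `hive` are assumptions. The sketch also treats any three-part name as already catalog-qualified, which would not hold for names that carry an extra suffix.

```java
// Hypothetical helper; not the code in LlapCacheMetadataSerializer.
class CacheTagNames {

    // Assumed to match the value of Warehouse.DEFAULT_CATALOG_NAME.
    static final String DEFAULT_CATALOG = "hive";

    // Returns a catalog-qualified name: "db.table" gains the default catalog,
    // while "catalog.db.table" is returned unchanged.
    static String qualify(String tableName) {
        String[] parts = tableName.split("\\.");
        if (parts.length == 2) {
            return DEFAULT_CATALOG + "." + tableName;
        }
        if (parts.length == 3) {
            return tableName;
        }
        throw new IllegalArgumentException("Unexpected table name: " + tableName);
    }
}
```

Under these assumptions, a tag written before the change as `sales.orders` would be read back as `hive.sales.orders`, so it lines up with tags created after the change.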

Comment thread llap-common/src/protobuf/LlapDaemonProtocol.proto Outdated
Comment on lines +337 to 344
/**
* Add a partition of a table scoped to the given catalog.
*/
public Builder addPartitionOfATable(String catalog, String db, String tableName,
LinkedHashMap<String, String> partSpec) {
ensureTable(catalog, db, tableName);
entities.get(catalog).get(db).get(tableName).add(partSpec);
return this;

Copilot AI Mar 23, 2026

Request.Builder claims the catalog key defaults to Warehouse.DEFAULT_CATALOG_NAME, but the builder currently stores the catalog parameter as-is. If a caller passes null (or an empty string), this will create a null key and later NPE in toProtoRequests() when calling toLowerCase(). Normalize catalog (and arguably db/table) at the builder boundary, e.g. default null/blank catalog to the default catalog and enforce non-null keys.
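
The normalization suggested in this comment might look like the sketch below. It is not code from the PR: `CatalogNames`, `normalizeCatalog`, and `requireName` are hypothetical names, and the constant is assumed to mirror `Warehouse.DEFAULT_CATALOG_NAME`.

```java
import java.util.Locale;

// Hypothetical helper illustrating normalization at the builder boundary.
class CatalogNames {

    // Assumed to match the value of Warehouse.DEFAULT_CATALOG_NAME.
    static final String DEFAULT_CATALOG = "hive";

    // A null or blank catalog falls back to the default; other values are trimmed and lower-cased.
    static String normalizeCatalog(String catalog) {
        if (catalog == null || catalog.trim().isEmpty()) {
            return DEFAULT_CATALOG;
        }
        return catalog.trim().toLowerCase(Locale.ROOT);
    }

    // Database and table names have no sensible default, so null or blank values are rejected early.
    static String requireName(String name, String what) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException(what + " must not be null or blank");
        }
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```

A builder method could call `normalizeCatalog(catalog)` before using the value as a map key, so that no null key can reach a later `toLowerCase()` call.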

Contributor Author

None of the callers ever passes null here.

Comment thread ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
Comment thread ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java Outdated

Neer393 commented Mar 25, 2026

Contributor Author

@zhangbutao resolved the copilot's reviews

@zhangbutao

Contributor

> @zhangbutao resolved the copilot's reviews

Thanks for pinging me. I will do the code review later.

Comment thread itests/hive-jmh/src/main/java/org/apache/hive/benchmark/ql/exec/KryoBench.java Outdated
Comment thread llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.


Comment on lines +222 to +228
public String getCatalogName() {
return catalogName;
}

public void setCatalogName(String catalogName) {
this.catalogName = catalogName == null ? Warehouse.DEFAULT_CATALOG_NAME : catalogName;
}

Copilot AI Apr 7, 2026

TableDesc.catalogName can remain null for instances built via the no-arg constructor + setters, even though the new constructor/setter normalize null to Warehouse.DEFAULT_CATALOG_NAME. Consider initializing catalogName eagerly (field initializer or in TableDesc()) so getCatalogName() never returns null. Also, equals()/hashCode() currently ignore catalogName, which can cause different-catalog descriptors to compare equal and collide in hash-based collections; include catalogName in both (or document why it must be excluded).
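
Both suggestions in this comment can be illustrated with a trimmed-down sketch. `TableDescSketch` is a hypothetical class that models only two fields; it is not the real `TableDesc`, and the default catalog value `hive` is assumed to mirror `Warehouse.DEFAULT_CATALOG_NAME`.

```java
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for TableDesc; only two fields are modelled.
class TableDescSketch {

    // Assumed to match the value of Warehouse.DEFAULT_CATALOG_NAME.
    static final String DEFAULT_CATALOG = "hive";

    // Field initializer: the no-arg constructor path also starts with a non-null catalog.
    private String catalogName = DEFAULT_CATALOG;
    private String tableName;

    TableDescSketch() {
    }

    TableDescSketch(String catalogName, String tableName) {
        setCatalogName(catalogName);
        this.tableName = tableName;
    }

    String getCatalogName() {
        return catalogName;
    }

    void setCatalogName(String catalogName) {
        this.catalogName = catalogName == null ? DEFAULT_CATALOG : catalogName;
    }

    void setTableName(String tableName) {
        this.tableName = tableName;
    }

    // The catalog takes part in equality, so same-named tables in different catalogs differ.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TableDescSketch)) {
            return false;
        }
        TableDescSketch other = (TableDescSketch) o;
        return Objects.equals(catalogName, other.catalogName)
            && Objects.equals(tableName, other.tableName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(catalogName, tableName);
    }
}
```

With the field initializer in place, `getCatalogName()` cannot return null on any construction path, and descriptors that differ only in catalog no longer collide in hash-based collections.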

Comment on lines +169 to +170
* the same DB name, and that getSingleCatalogName/getSingleDbName return null when multiple
* catalog-DB pairs are present.

Copilot AI Apr 7, 2026

This test’s Javadoc mentions getSingleCatalogName/getSingleDbName, but those methods no longer exist (they were replaced by hasCatalogName/hasDatabaseName). Update the comment to reflect the current API/behavior to avoid confusion for future maintainers.

Suggested change

    - * the same DB name, and that getSingleCatalogName/getSingleDbName return null when multiple
    - * catalog-DB pairs are present.
    + * the same DB name, and that requests spanning multiple catalog-DB pairs are not treated as
    + * having a single catalog or database; callers should use hasCatalogName/hasDatabaseName with
    + * explicit values instead.

Comment thread ql/src/java/org/apache/hadoop/hive/llap/LlapHiveUtils.java
Comment thread ql/src/java/org/apache/hadoop/hive/llap/LlapHiveUtils.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java Outdated
@sonarqubecloud

Comment thread ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java

Neer393 commented May 8, 2026

Contributor Author

@zhangbutao @deniskuzZ any help on this?
How do we proceed?

Comment thread ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java

deniskuzZ commented Jun 5, 2026

Member

LGTM. @Neer393, please check whether the last 2 comments make sense.

Neer393 commented Jun 13, 2026

Contributor Author

Hi @deniskuzZ,
I have made the changes as per your comments. They also handle proactive eviction for 4-part metatables in Iceberg.
Once CI succeeds, we are good to merge 👍

@sonarqubecloud
