Skip to content

HIVE-29241: Distinguish the default location of the database by catalog#6267

Open
zhangbutao wants to merge 3 commits into
apache:masterfrom
zhangbutao:HIVE-29241_LOCATION
Open

HIVE-29241: Distinguish the default location of the database by catalog#6267
zhangbutao wants to merge 3 commits into
apache:masterfrom
zhangbutao:HIVE-29241_LOCATION

Conversation

@zhangbutao

@zhangbutao zhangbutao commented Jan 14, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

The main objectives of this PR are:

  1. Allow location to be omitted when creating a catalog. At the same time, it is necessary to determine catalogs that do not require a location, such as JDBC catalogs.

  2. For newly created catalogs, the location of their databases will be determined by the two newly added parameters: metastore.warehouse.catalog.dir and metastore.warehouse.catalog.external.dir. This helps better ensure that the location of the default Hive catalog's databases and tables remains unaffected.

For example, if metastore.warehouse.catalog.dir is hdfs://ns1/testdir, then the location for a newly created catalog named testcat would be hdfs://ns1/testdir/testcat. Consequently, the default path for a database like testdb created under this catalog would be hdfs://ns1/testdir/testcat/testdb.

  1. Modify the semantics of catalog creation. The type parameter must be specified when creating a catalog to distinguish its type. Based on this type, it will be determined whether the catalog's databases and tables require a location, or whether the catalog type supports creating databases and tables.
    CREATE CATALOG test_cat COMMENT 'Hive test catalog' PROPERTIES('type'='hive');

BTW, While working on this PR, some tasks need to be carried forward. For these follow-up tasks, I have created sub-tickets. You can also search for the TODO catalog comments in the catalog for more details on these follow-up tasks.
HIVE-29564
HIVE-29562
HIVE-29561
HIVE-29278

Why are the changes needed?

The paths of databases and tables created under a non-default catalog may interfere with those under the default Hive catalog. We need to introduce parameters to distinguish the paths for databases and tables under non-default catalogs, ensuring that the paths of the default Hive catalog remain unaffected.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces catalog-aware warehouse paths for Hive metastore, allowing databases and tables under non-default catalogs to have distinct storage locations. The changes ensure that catalog locations are properly isolated while maintaining backward compatibility for the default Hive catalog.

Changes:

  • Added new configuration parameters (WAREHOUSE_CATALOG and WAREHOUSE_CATALOG_EXTERNAL) for catalog-specific warehouse directories
  • Modified catalog creation to require a 'type' parameter and made location optional with automatic defaults
  • Updated database path resolution throughout the codebase to use catalog-aware logic via new Warehouse methods that accept catalog names

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
MetastoreConf.java Added WAREHOUSE_CATALOG and WAREHOUSE_CATALOG_EXTERNAL configuration variables
HiveConf.java Added corresponding Hive configuration variables for catalog warehouses
Warehouse.java Enhanced path resolution methods to accept catalog names; added deprecated markers for old methods
CatalogUtil.java New utility class for catalog type validation (NATIVE, ICEBERG)
CreateCatalogAnalyzer.java Updated to validate catalog type property and make location optional
CreateCatalogOperation.java Modified to generate default location from WAREHOUSE_CATALOG config
CreateDatabaseAnalyzer.java Added validation to check if catalog supports database creation
CreateDatabaseOperation.java Updated path building logic to include catalog names for non-default catalogs
DDLUtils.java Added helper method to check catalog support for database creation
HiveParser.g Modified grammar to make catalog location optional and properties required
EximUtil.java Updated to use catalog-aware warehouse paths for replication
Multiple test files Updated test queries to include required 'type' property in CREATE CATALOG statements
HMSHandler.java Updated default catalog and database creation to use catalog-aware paths
MetastoreDefaultTransformer.java Updated to pass Database object instead of just name
Migration/Authorization files Updated to use new Database-based path methods

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLUtils.java Outdated
Comment thread common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java Outdated
Comment thread parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g Outdated
Comment thread common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Outdated
Comment thread common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/parse/EximUtil.java
Comment thread ql/src/test/queries/clientnegative/lockneg_try_lock_cat_db_in_use.q Outdated
Comment thread ql/src/test/queries/clientpositive/catalog.q Outdated
Comment thread ql/src/test/queries/clientpositive/catalog.q Outdated
}

whRootString = getRequiredVar(conf, ConfVars.WAREHOUSE);
whCatRootString = getRequiredVar(conf, ConfVars.WAREHOUSE_CATALOG);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see the comment on default FS structure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already repied above.

return getWhRoot();
}
return new Path(getWhRoot(), dbName.toLowerCase() + DATABASE_WAREHOUSE_SUFFIX);
Database db = new Database();

@deniskuzZ deniskuzZ Apr 11, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you using Database only as a DTO object? instead you could use a record CatalogDb as mentioned earlier

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work well. Regarding the changes made here, could you help me point out the more prominent advantages of using record CatalogDb?
Thank you.

@deniskuzZ deniskuzZ Apr 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CatalogDb is a self-describing, single-purpose DTO. It clearly communicates intent, keeps the data model focused, and avoids overloading more generic structures.
Note that was only a suggestion, and if you don’t like it, that’s totally fine.


// Supported Catalog types
public enum CatalogType {
HIVE,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this CatalogType for what not where? if it's for what, maybe named as CatalogConnector or something else

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what you mean. Could you explain it in more detail? Thank you.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of CatalogType? if it's used to describe the source, CatalogConnector or CatalogScope would be better?

public void createCatalog(Catalog cat) throws MetaException {
LOG.debug("Creating catalog {}", cat);

ensureCatalogType(cat);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we uplift the check to HMSHandler#create_catalog?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HMSHandler#create_catalog may not be the only entry point for creating a catalog. So I think it would be more appropriate to place it in ObjectStore.

What negative impacts do you think putting it here might have?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For user input check, it's better to perform at the hander, ObjectStore is where we manage the metadata.
For negative impacts, we need to prepare an ObjectStore though it's invalid and might take a connection from the connection pool.

@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants