Make ExtendedGcsFileSystem the default implementation #773
Merged: ankitaluthra1 merged 14 commits into fsspec:main from ankitaluthra1:default-to-extended-gcsfs on Mar 13, 2026 (+142 −15)
Commits:
cf68701 makes ExtendedGcsFileSystem as default (ankitaluthra1)
258f227 fixes lint errors (ankitaluthra1)
bb990b8 minor comment fix (ankitaluthra1)
9c2a32b adds more tests (ankitaluthra1)
a2ceb4b fixes coverage report (ankitaluthra1)
cfb98f5 fixes documentation (ankitaluthra1)
2dd9b2d adds step to run Full Test Suite (with Extended tests (Zonal & HNS) E… (ankitaluthra1)
be1f871 updates step names to run tests on ci (ankitaluthra1)
b2c3e78 minor fix (ankitaluthra1)
99b4ea2 optimise test runs on ci (ankitaluthra1)
7b39ebc minor name fix (ankitaluthra1)
a9f8c29 updates hns_buckets.rst (ankitaluthra1)
fd5fde5 updates index.rst with default implementation (ankitaluthra1)
a99e655 updates hns.rst with rename benchmarks (ankitaluthra1)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| Hierarchical Namespace (HNS) | ||
| ============================================== | ||
|
|
||
| To train, checkpoint, and serve AI models at peak efficiency, Google Cloud Storage (GCS) offers **Hierarchical Namespace (HNS)**. | ||
|
|
||
| ``gcsfs`` provides full support for all data and metadata operations on HNS buckets. | ||
|
|
||
| What is a Hierarchical Namespace (HNS)? | ||
| --------------------------------------- | ||
|
|
||
| Historically, GCS buckets have utilized a **flat namespace**. In a flat | ||
| namespace, directories do not exist as distinct physical entities; they are | ||
| simulated by 0-byte objects ending in a slash (``/``) or by filtering object | ||
| prefixes during list operations. | ||
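To make the flat-namespace behaviour concrete, here is a minimal, self-contained sketch of how a directory listing can be inferred from object names alone. It is an illustration only and not how ``gcsfs`` is implemented; ``list_dir`` and the sample object names are hypothetical.

```python
def list_dir(object_names, prefix):
    """Return the immediate children of `prefix` in a flat namespace.

    There is no directory resource to consult: children are inferred
    purely from the object names that share the prefix.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    children = set()
    for name in object_names:
        # Skip unrelated objects and the 0-byte placeholder itself.
        if not name.startswith(prefix) or name == prefix:
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition("/")
        # A remaining "/" means `head` is a simulated subdirectory.
        children.add(prefix + head + sep)
    return sorted(children)


objects = [
    "data/",             # 0-byte placeholder object marking a "directory"
    "data/train/a.bin",
    "data/train/b.bin",
    "data/eval/c.bin",
    "data/readme.txt",
]
listing = list_dir(objects, "data")
print(listing)
```

Note that ``data/train/`` and ``data/eval/`` appear in the listing even though no object with those names exists; they are derived from the prefixes of their children.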
|
|
||
| A `Hierarchical Namespace (HNS) <https://cloud.google.com/storage/docs/hns-overview>`_ introduces true, logical directories as first-class resources to GCS. | ||
|
|
||
| Under the Hood: The ``ExtendedFileSystem`` | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| ``gcsfs`` utilizes the ``ExtendedFileSystem`` class under the hood (implemented in `gcsfs/extended_gcsfs.py <https://github.com/fsspec/gcsfs/blob/main/gcsfs/extended_gcsfs.py>`_). | ||
|
|
||
| Importantly, ``ExtendedFileSystem`` is designed to be fully backward-compatible. Before executing directory operations, it automatically identifies the underlying bucket type. If it detects a standard flat-namespace bucket, it routes the request back to standard object-level operations, ensuring your existing buckets continue to work without issue. | ||
|
|
||
| The fundamental architectural shift is that ``ExtendedFileSystem`` actively routes directory-level operations to the **GCS Folders gRPC API** instead of relying solely on the Objects API. | ||
|
|
||
| .. list-table:: **Operation Semantics: Flat Namespace vs. HNS** | ||
| :widths: 15 40 45 | ||
| :header-rows: 1 | ||
|
|
||
| * - Operation | ||
| - Flat Namespace (Standard ``gcsfs``) | ||
| - HNS Namespace (``ExtendedFileSystem``) | ||
| * - ``mkdir`` | ||
| - Only used for creating buckets, since a flat GCS namespace has no real directories. | ||
| - Calls the native GCS Folders API, creating a physical GCS folder resource instead of simulating one with a 0-byte object or an object prefix. | ||
| * - ``rmdir`` | ||
| - Primarily used to delete buckets, as directories do not exist as distinct physical entities. | ||
| - Deletes empty folders natively via the GCS Folders API, in addition to deleting buckets. | ||
| * - ``rm`` | ||
| - Paginates through the objects matching the prefix and issues an individual delete request for each one. | ||
| - Deletes the folder resource and its contents, issuing a folder delete request or an object delete request depending on the type of each entry. | ||
| * - ``rename`` / ``mv`` | ||
| - Issues a ``Copy`` request for each object under the prefix, followed by ``Delete``. Non-atomic, ``O(N)``. | ||
| - Triggers a single native metadata-only rename on the folder. **Atomic** and more performant, ``O(1)``, which is helpful for checkpointing. | ||
| * - ``info`` | ||
| - Infers directory existence by checking for child objects, returning mocked 0-byte metadata. | ||
| - Uses ``get_folder_metadata`` to explicitly query the Folders API, returning accurate metadata (creation time, resource IDs). | ||
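The ``rename`` row above can be illustrated with a toy in-memory model that counts simulated requests. This is a sketch under stated assumptions, not the ``gcsfs`` implementation: ``FlatBucket``, ``HnsBucket``, and the ``requests`` counter are hypothetical names used only to show why one approach is ``O(N)`` and the other ``O(1)``.

```python
class FlatBucket:
    """Toy flat namespace: only object names exist."""

    def __init__(self, names):
        self.objects = {name: b"" for name in names}
        self.requests = 0  # counts simulated API calls

    def rename_dir(self, src, dst):
        # One copy plus one delete per object under the prefix: O(N), non-atomic.
        for name in [n for n in self.objects if n.startswith(src + "/")]:
            self.objects[dst + name[len(src):]] = self.objects[name]
            self.requests += 1  # simulated copy
            del self.objects[name]
            self.requests += 1  # simulated delete


class HnsBucket:
    """Toy hierarchical namespace: folders are resources holding their children."""

    def __init__(self):
        self.folders = {}  # folder name -> {child name: content}
        self.requests = 0

    def rename_dir(self, src, dst):
        # A single metadata update relabels the folder; children are untouched.
        self.folders[dst] = self.folders.pop(src)
        self.requests += 1


names = [f"ckpt/tmp/shard-{i}.bin" for i in range(1000)]
flat = FlatBucket(names)
flat.rename_dir("ckpt/tmp", "ckpt/step-100")

hns = HnsBucket()
hns.folders["ckpt/tmp"] = {f"shard-{i}.bin": b"" for i in range(1000)}
hns.rename_dir("ckpt/tmp", "ckpt/step-100")

print(flat.requests, hns.requests)
```

In the flat model, an interruption halfway through the loop leaves objects under both prefixes, which is why the table calls that path non-atomic.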
|
|
||
| Important Differences to Keep in Mind | ||
| ------------------------------------- | ||
|
|
||
| While ``gcsfs`` aims to abstract the differences via the ``fsspec`` API, you should be aware of standard HNS limitations imposed by the Google Cloud Storage API: | ||
|
|
||
| 1. **Implicit directories:** In standard GCS, you can create an object ``a/b/c.txt`` without the directories ``a/`` or ``a/b/`` physically existing. In HNS, the parent folder resources must exist (or be created) before the object can be written. ``gcsfs`` handles parent folder creation natively under the hood. | ||
| 2. **mkdir behavior:** In a flat namespace, calling ``mkdir`` on a path can only ensure that the underlying bucket exists. With HNS enabled, ``mkdir`` creates an actual folder resource in GCS. To create nested folders (e.g. ``bucket/a/b/c/d``), pass ``create_parents=True``; ``gcsfs`` then physically creates every intermediate folder resource along the specified path. | ||
| 3. **No mixing or toggling:** You cannot toggle HNS on an existing flat-namespace bucket. You must create a new HNS bucket and migrate your data. | ||
| 4. **Object naming:** Object names in HNS buckets cannot end with a slash (``/``) without the creation of a physical folder resource. | ||
| 5. **Rename Operation Benchmarks** | ||
|
|
||
| The following benchmarks show the time taken (in seconds) to rename a directory containing a large number of files (spread across 256 folders and 8 levels) in a standard Regional bucket versus an HNS bucket. They can be reproduced using ``gcsfs/tests/perf/microbenchmarks/rename``: | ||
|
|
||
| .. list-table:: | ||
| :header-rows: 1 | ||
|
|
||
| * - File Count | ||
| - Standard Regional (seconds) | ||
| - HNS (seconds) | ||
| * - 65K Files | ||
| - 75.69 | ||
| - 15.4 | ||
| * - 100K Files | ||
| - 170.6 | ||
| - 23.2 | ||
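For a quick read of the table, the relative speedup can be computed directly from the reported timings. The snippet below only does arithmetic on the numbers in the table; it does not run the benchmark.

```python
# Timings in seconds, copied from the table above: (Standard Regional, HNS).
timings = {
    "65K": (75.69, 15.4),
    "100K": (170.6, 23.2),
}

speedups = {label: regional / hns_time for label, (regional, hns_time) in timings.items()}

for label, ratio in speedups.items():
    print(f"{label} files: HNS rename is about {ratio:.1f}x faster")
```

This works out to roughly 4.9x at 65K files and 7.4x at 100K files.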
|
|
||
| For more details on managing these buckets, refer to the official documentation for `Hierarchical Namespace <https://cloud.google.com/storage/docs/hns-overview>`_. | ||
|
|
||
| Disabling HNS Support | ||
| ------------------------------ | ||
|
|
||
| You can disable these features by explicitly setting the ``GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT`` environment variable to ``false``. | ||
|
|
||
| **Code Example** | ||
|
|
||
| .. code-block:: bash | ||
|
|
||
| export GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT=false | ||
|
|
||
| **Note:** *The choice of which filesystem class to use is made at import time based on the GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT environment variable, and cannot be controlled via constructor arguments passed to GCSFileSystem (but you can still import each class explicitly, if needed).* | ||
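The import-time behaviour described in the note can be sketched with stand-in classes. This is a minimal illustration, not the actual ``gcsfs`` source: ``BaseFS``, ``ExtendedFS``, ``pick_default``, and ``DefaultFS`` are hypothetical names, and the exact parsing of the variable (default enabled, case-insensitive ``false``) is an assumption.

```python
import os

ENV_VAR = "GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT"


class BaseFS:
    """Hypothetical stand-in for the flat-namespace implementation."""


class ExtendedFS(BaseFS):
    """Hypothetical stand-in for the HNS-aware implementation."""


def pick_default(environ=os.environ):
    # Assumption: support is enabled unless the variable is explicitly "false".
    enabled = environ.get(ENV_VAR, "true").strip().lower() != "false"
    return ExtendedFS if enabled else BaseFS


# Evaluated once, when the module is imported; changing the environment
# afterwards does not rebind this name.
DefaultFS = pick_default()
```

Under this model, a Python process that wants the flat-namespace behaviour would need to set the variable (for example via ``os.environ``) before the first import of the package, since constructor arguments play no part in the choice.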