feat: Add suffix chunk key encoding
#28
Open: mkitti wants to merge 7 commits into zarr-developers:main from mkitti:mkitti-chunk-key-encoding-suffix
Changes from 3 commits
Commits (7):
f4e22ea Add README for `suffix` chunk key encoding (mkitti)
254aae2 snake_case and clarify default (mkitti)
08d9470 Make `base-encoding` a mandatory configuration parameter (mkitti)
1a8c680 Apply suggestion from @normanrz (mkitti)
03d0f39 Apply suggestion from @normanrz (mkitti)
cf0d841 Apply suggestion from @normanrz (mkitti)
5558ff6 Apply suggestion from @normanrz (mkitti)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # ZEP: `suffix` Chunk Key Encoding | ||
|
|
||
| ## Summary | ||
|
|
||
| This document proposes a new Zarr v3 chunk-key-encoding extension named `suffix`. This encoding appends a user-defined string (the "suffix") to the key generated by a base chunk key encoding. The primary motivation is to allow chunk keys to have file extensions (e.g., `.tiff`, `.zip`), making them directly usable by operating systems and other software that identify file types by their extension. | ||
|
|
||
| --- | ||
|
|
||
| ## Motivation | ||
|
|
||
| Modern scientific workflows often involve a variety of tools. While Zarr provides excellent chunked, N-dimensional data access, individual chunks can sometimes be valid, standalone files in other formats. A prime example is a Zarr array sharded into TIFF files. Each shard is both a chunk in the Zarr hierarchy and a complete TIFF file. | ||
|
|
||
| Currently, Zarr chunk keys (like `c/0/0`) lack file extensions. This prevents a user or application from easily identifying and opening these chunks with standard tools (e.g., an image viewer). To work around this, data must be duplicated or accessed exclusively through a Zarr library. | ||
|
|
||
| The `suffix` encoding solves this problem by adding a file extension to the chunk key. This creates a dual-access system: | ||
| 1. **Zarr Access**: The data remains a fully compliant Zarr array, accessible via the Zarr protocol. | ||
| 2. **Direct File Access**: The individual chunk files can be directly opened, viewed, or processed by any tool that recognizes their file extension. | ||
|
|
||
| This enhances interoperability and simplifies workflows that bridge Zarr and traditional file-based tools without requiring data duplication. | ||
|
|
||
| --- | ||
|
|
||
| ## Specification | ||
|
|
||
| * **Name**: `suffix` | ||
| * **Version**: `0.1` | ||
| * **Identifier**: (A unique URI to be assigned upon formal adoption) | ||
|
|
||
| ### Configuration | ||
|
|
||
| The configuration for this encoding is a JSON object with two required members. | ||
|
|
||
| * `"suffix"`: **(Required)** A string that will be appended to the encoded chunk key. | ||
| * `"base_encoding"`: **(Required)** A chunk key encoding configuration object. This specifies the "base" encoding to be used *before* the suffix is appended. | ||
|
|
||
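A minimal sketch of how a reader might validate this configuration object in Python. The function name `parse_suffix_config` is hypothetical and not part of any Zarr library. It checks only the two required members listed above, and it assumes that `base_encoding` is given as a JSON object.

```python
from typing import Any


def parse_suffix_config(encoding: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Validate a `suffix` chunk key encoding object.

    Returns the pair (suffix, base_encoding). Hypothetical helper for illustration.
    """
    if encoding.get("name") != "suffix":
        raise ValueError("expected a chunk key encoding named 'suffix'")
    config = encoding.get("configuration")
    if not isinstance(config, dict):
        raise ValueError("'configuration' must be a JSON object")
    # Both members are required by the proposal.
    missing = {"suffix", "base_encoding"} - config.keys()
    if missing:
        raise ValueError(f"missing required members: {sorted(missing)}")
    suffix, base = config["suffix"], config["base_encoding"]
    if not isinstance(suffix, str):
        raise ValueError("'suffix' must be a string")
    # Assumption: the base encoding is written as an object, not a shorthand.
    if not isinstance(base, dict):
        raise ValueError("'base_encoding' must be a JSON object")
    return suffix, base
```

Rejecting a configuration that omits `base_encoding` follows the "required" wording above; an implementation that wanted a fallback would need the proposal to define one.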
| #### Example 1: Simple Suffix | ||
|
|
||
| This configuration appends `.tiff` to the key generated by the `default` chunk key encoding. | ||
|
|
||
| ```json | ||
| { | ||
| "name": "suffix", | ||
| "configuration": { | ||
| "suffix": ".tiff", | ||
| "base_encoding": { | ||
| "name": "default" | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| #### Example 2: Suffix with a Custom Base Encoding | ||
|
|
||
| This configuration first encodes the chunk key using the `v2` naming scheme and then appends `.shard.zip`. | ||
|
|
||
| ```json | ||
| { | ||
| "name": "suffix", | ||
| "configuration": { | ||
| "suffix": ".shard.zip", | ||
| "base_encoding": { | ||
| "name": "v2" | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Encoding and Decoding Logic | ||
|
|
||
| The implementation logic is a simple wrapper around an existing chunk key encoding. | ||
|
|
||
| ### Encoding | ||
|
|
||
| 1. Take the chunk coordinate tuple as input (e.g., `(1, 2)`). | ||
| 2. Encode the coordinates using the specified **`base_encoding`**. For example, `(1, 2)` becomes `"c/1/2"` when the `base_encoding` is `default`. | ||
| 3. Append the `suffix` from the configuration to the result of the base encoding. | ||
|
|
||
| The final key is `base_encoded_key + suffix` (e.g., `"c/1/2.tiff"`). | ||
|
|
||
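The encoding steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: `encode_base` and `encode_suffix` are hypothetical names, and the base encoder covers only the `default` and `v2` encodings with their usual default separators (`/` and `.` respectively).

```python
from typing import Optional, Sequence


def encode_base(coords: Sequence[int], name: str, separator: Optional[str] = None) -> str:
    """Encode chunk coordinates with the `default` or `v2` base encoding."""
    if name == "default":
        sep = "/" if separator is None else separator
        # Keys are prefixed with "c"; a zero-dimensional array yields just "c".
        return sep.join(["c", *map(str, coords)])
    if name == "v2":
        sep = "." if separator is None else separator
        # A zero-dimensional array yields "0".
        return sep.join(map(str, coords)) or "0"
    raise ValueError(f"unsupported base encoding: {name!r}")


def encode_suffix(coords: Sequence[int], suffix: str, base_name: str = "default") -> str:
    """Apply the base encoding, then append the suffix unchanged."""
    return encode_base(coords, base_name) + suffix


print(encode_suffix((1, 2), ".tiff"))             # c/1/2.tiff
print(encode_suffix((1, 2), ".shard.zip", "v2"))  # 1.2.shard.zip
```

The suffix is appended verbatim, so any leading dot has to be part of the configured string.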
| ### Decoding | ||
|
|
||
| 1. Take the full chunk key string as input (e.g., `"c/1/2.tiff"`). | ||
| 2. Verify that the key ends with the configured `suffix`. If not, it is an invalid key for this encoding. | ||
| 3. Remove the `suffix` from the end of the key string to get the base key (e.g., `"c/1/2"`). | ||
| 4. Decode the remaining base key using the specified **`base_encoding`** to retrieve the original chunk coordinate tuple `(1, 2)`. | ||
|
||
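A matching sketch of the decoding steps, again with hypothetical names (`decode_default`, `decode_suffix`) and assuming, for brevity, that the base encoding is `default` with the `/` separator.

```python
def decode_default(key: str, separator: str = "/") -> tuple[int, ...]:
    """Decode a key produced by the `default` encoding, e.g. "c/1/2" -> (1, 2)."""
    prefix, *parts = key.split(separator)
    if prefix != "c":
        raise ValueError(f"not a default-encoded chunk key: {key!r}")
    return tuple(int(part) for part in parts)


def decode_suffix(key: str, suffix: str) -> tuple[int, ...]:
    """Check and strip the suffix, then hand the rest to the base decoder."""
    if not key.endswith(suffix):
        raise ValueError(f"key {key!r} does not end with suffix {suffix!r}")
    # str.removesuffix (Python 3.9+) also handles an empty suffix correctly.
    base_key = key.removesuffix(suffix)
    return decode_default(base_key)


print(decode_suffix("c/1/2.tiff", ".tiff"))  # (1, 2)
```

Because the suffix is removed as an exact trailing string before the base decoder runs, the same stripping step would also work for a suffix that contains the base separator, such as `.shard.zip` on a `.`-separated `v2` key.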