diff --git a/docs/xet/chunking.md b/docs/xet/chunking.md
index 1d501e044f..16db8fcc11 100644
--- a/docs/xet/chunking.md
+++ b/docs/xet/chunking.md
@@ -48,6 +48,24 @@ When a boundary found or taken:
 
 At end-of-file, if `start_offset < len(data)`, emit the final chunk `[start_offset, len(data))`.
 
+### Decision Flowchart
+
+```mermaid
+flowchart TD
+    A["Read next byte b"] --> B["h = (h << 1) + TABLE[b]"]
+    B --> C["size = offset - start + 1"]
+    C --> D{"size < MIN_CHUNK_SIZE\n(8 KiB)?"}
+    D -->|Yes| A
+    D -->|No| E{"size >= MAX_CHUNK_SIZE\n(128 KiB)?"}
+    E -->|Yes| G["Emit chunk, reset h = 0"]
+    E -->|No| F{"(h & MASK) == 0?"}
+    F -->|Yes| G
+    F -->|No| A
+    G --> H{"End of file?"}
+    H -->|No| A
+    H -->|Yes| I["Emit final chunk if data remains"]
+```
+
 ### Pseudocode
 
 ```text
diff --git a/docs/xet/deduplication.md b/docs/xet/deduplication.md
index 1f00cefa10..4c625bdb6f 100644
--- a/docs/xet/deduplication.md
+++ b/docs/xet/deduplication.md
@@ -56,10 +56,10 @@ When a file is processed for upload, it undergoes the following steps:
 
 ```mermaid
 graph TD
-    A[File Input] --> B[Content-Defined Chunking]
-    B --> C[Hash Computation]
-    C --> D[Chunk Creation]
-    D --> E[Deduplication Query]
+    A["File Input"] --> B["Content-Defined Chunking"]
+    B --> C["Hash Computation"]
+    C --> D["Chunk Creation"]
+    D --> E["Deduplication Query"]
 ```
 
 1. **Chunking**: Content-defined chunking using GearHash algorithm creates variable-sized chunks of file data
diff --git a/docs/xet/file-id.md b/docs/xet/file-id.md
index 9e4bae8afd..d6ab8bd70b 100644
--- a/docs/xet/file-id.md
+++ b/docs/xet/file-id.md
@@ -31,3 +31,14 @@ This is the string representation of the hash and can be used directly in the fi
 
 > [!NOTE]
 > The resolve URL will return a 302 redirect http status code, following the redirect will download the content via the old LFS compatible route rather than through the Xet protocol. In order to use the Xet protocol you MUST NOT follow this redirect.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    actor C as Client
+    participant Hub as Hugging Face Hub
+    C->>Hub: GET /namespace/repo/resolve/branch/filepath<br/>Authorization: Bearer
+    Hub-->>C: 302 Redirect + X-Xet-Hash header
+    Note over C: Extract X-Xet-Hash value = Xet File ID<br/>Do NOT follow the 302 redirect
+    C->>C: Use File ID with CAS Reconstruction API
+```
diff --git a/docs/xet/hashing.md b/docs/xet/hashing.md
index 6b2c3ab7c9..db157ba5c5 100644
--- a/docs/xet/hashing.md
+++ b/docs/xet/hashing.md
@@ -9,6 +9,19 @@ The Xet protocol utilizes a few different hashing types.
 
 All hashes referenced are 32 bytes (256 bits) long.
 
+```mermaid
+flowchart LR
+    subgraph Input
+        CD["Chunk Data"]
+        CH["Chunk Hashes"]
+    end
+    CD -->|"blake3(data, DATA_KEY)"| ChunkHash["Chunk Hash"]
+    ChunkHash --> CH
+    CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY"| XorbHash["Xorb Hash"]
+    CH -->|"Merkle Tree\n+ INTERNAL_NODE_KEY\nthen blake3(root, zeros)"| FileHash["File Hash"]
+    CH -->|"blake3(concat hashes,\nVERIFICATION_KEY)"| VerifHash["Term Verification Hash"]
+```
+
 ## Chunk Hashes
 
 After cutting a chunk of data, the chunk hash is computed via a blake3 keyed hash with the following key (DATA_KEY):
diff --git a/docs/xet/index.md b/docs/xet/index.md
index 18faa58822..532852eb62 100644
--- a/docs/xet/index.md
+++ b/docs/xet/index.md
@@ -19,6 +19,43 @@ Implementors can create their own clients, SDKs, and tools that speak the Xet pr
 
 ## Overall Xet Architecture
 
+```mermaid
+block
+    columns 3
+    File["📄 File"]
+    space
+    space
+    CDC["Chunking (CDC)"]
+    space
+    space
+    block:chunks
+        columns 5
+        C0["Chunk 0"] C1["Chunk 1"] C2["Chunk 2"] C3["..."] C4["Chunk N"]
+    end
+    space
+    space
+    space
+    block:xorbs
+        columns 2
+        X0["Xorb A\n(chunks 0–1023)"]
+        X1["Xorb B\n(chunks 1024–N)"]
+    end
+    space
+    Shard["Shard\n(file reconstructions\n+ xorb metadata)"]
+    space
+    space
+    space
+    CAS["CAS Server\n(Content Addressable Storage)"]
+    space
+    space
+    File --> CDC
+    CDC --> chunks
+    chunks --> xorbs
+    xorbs --> Shard
+    xorbs --> CAS
+    Shard --> CAS
+```
+
 - [Content-Defined Chunking](./chunking): Gearhash-based CDC with parameters, boundary rules, and performance optimizations.
 - [Hashing Methods](./hashing): Descriptions and definitions of the different hashing functions used for chunks, xorbs and term verification entries.
 - [File Reconstruction](./file-reconstruction): Defining "term"-based representation of files using xorb hash + chunk ranges.
diff --git a/docs/xet/shard.md b/docs/xet/shard.md
index ab62b370e1..31d3a87e9d 100644
--- a/docs/xet/shard.md
+++ b/docs/xet/shard.md
@@ -116,12 +116,14 @@ struct MDBShardFileHeader {
 
 **Memory Layout**:
 
-```txt
-┌────────────────────────────────────────────────────────────────┬───────────┬───────────┐
-│                         tag (32 bytes)                         │  version  │ footer_sz │
-│                    Magic Number Identifier                     │ (8 bytes) │ (8 bytes) │
-└────────────────────────────────────────────────────────────────┴───────────┴───────────┘
-0                                                                32          40          48
+```mermaid
+---
+title: "MDBShardFileHeader (48 bytes)"
+---
+packet
+    0-31: "tag (32 bytes) — Magic Number Identifier"
+    32-39: "version (u64)"
+    40-47: "footer_size (u64)"
 ```
 
 **Deserialization steps**:
@@ -220,12 +222,15 @@ Given the `file_data_sequence_header.file_flags & MASK` (bitwise AND) operations
 
 **Memory Layout**:
 
-```txt
-┌────────────────────────────────────────────────────────────────┬──────────┬───────────┬────────────┐
-│                      file_hash (32 bytes)                      │file_flags│num_entries│  _unused   │
-│                        File Hash Value                         │(4 bytes) │(4 bytes)  │ (8 bytes)  │
-└────────────────────────────────────────────────────────────────┴──────────┴───────────┴────────────┘
-0                                                                32         36          40           48
+```mermaid
+---
+title: "FileDataSequenceHeader (48 bytes)"
+---
+packet
+    0-31: "file_hash (32 bytes)"
+    32-35: "file_flags (u32)"
+    36-39: "num_entries (u32)"
+    40-47: "_unused (8 bytes)"
 ```
 
 ### FileDataSequenceEntry
@@ -247,13 +252,16 @@ struct FileDataSequenceEntry {
 
 **Memory Layout**:
 
-```txt
-┌────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
-│                      cas_hash (32 bytes)                       │cas_flags│unpacked │chunk_idx│chunk_idx│
-│                         CAS Block Hash                         │(4 bytes)│seg_bytes│start    │end      │
-│                                                                │         │(4 bytes)│(4 bytes)│(4 bytes)│
-└────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┘
-0                                                                32        36        40        44        48
+```mermaid
+---
+title: "FileDataSequenceEntry (48 bytes)"
+---
+packet
+    0-31: "cas_hash (32 bytes) — Xorb Hash"
+    32-35: "cas_flags (u32)"
+    36-39: "unpacked_segment_bytes (u32)"
+    40-43: "chunk_index_start (u32)"
+    44-47: "chunk_index_end (u32)"
 ```
 
 ### FileVerificationEntry (OPTIONAL)
@@ -271,12 +279,13 @@ struct FileVerificationEntry {
 
 **Memory Layout**:
 
-```txt
-┌────────────────────────────────────────────────────────────────┬────────────────────────────────┐
-│                     range_hash (32 bytes)                      │       _unused (16 bytes)       │
-│                       Verification Hash                        │         Reserved Space         │
-└────────────────────────────────────────────────────────────────┴────────────────────────────────┘
-0                                                                32                               48
+```mermaid
+---
+title: "FileVerificationEntry (48 bytes)"
+---
+packet
+    0-31: "range_hash (32 bytes) — Verification Hash"
+    32-47: "_unused (16 bytes)"
 ```
 
 When a shard has verification entries, all file info sections MUST have verification entries.
@@ -302,12 +311,13 @@ struct FileMetadataExt {
 
 **Memory Layout**:
 
-```txt
-┌────────────────────────────────────────────────────────────────┬────────────────────────────────┐
-│                       sha256 (32 bytes)                        │       _unused (16 bytes)       │
-│                          SHA256 Hash                           │         Reserved Space         │
-└────────────────────────────────────────────────────────────────┴────────────────────────────────┘
-0                                                                32                               48
+```mermaid
+---
+title: "FileMetadataExt (48 bytes)"
+---
+packet
+    0-31: "sha256 (32 bytes) — SHA256 Hash"
+    32-47: "_unused (16 bytes)"
 ```
 
 ### File Info Bookend
@@ -381,13 +391,16 @@ struct CASChunkSequenceHeader {
 
 **Memory Layout**:
 
-```txt
-┌────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
-│                      cas_hash (32 bytes)                       │cas_flags│num_     │num_bytes│num_bytes│
-│                         CAS Block Hash                         │(4 bytes)│entries  │in_cas   │on_disk  │
-│                                                                │         │(4 bytes)│(4 bytes)│(4 bytes)│
-└────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┘
-0                                                                32        36        40        44        48
+```mermaid
+---
+title: "CASChunkSequenceHeader (48 bytes)"
+---
+packet
+    0-31: "cas_hash (32 bytes) — Xorb Hash"
+    32-35: "cas_flags (u32)"
+    36-39: "num_entries (u32)"
+    40-43: "num_bytes_in_cas (u32)"
+    44-47: "num_bytes_on_disk (u32)"
 ```
 
 ### CASChunkSequenceEntry
@@ -406,15 +419,15 @@ struct CASChunkSequenceEntry {
 
 **Memory Layout**:
 
-```txt
-┌────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────────────┐
-│                     chunk_hash (32 bytes)                      │chunk_   │unpacked │     _unused     │
-│                           Chunk Hash                           │byte_    │segment_ │    (8 bytes)    │
-│                                                                │range_   │bytes    │                 │
-│                                                                │start    │(4 bytes)│                 │
-│                                                                │(4 bytes)│         │                 │
-└────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────────────┘
-0                                                                32        36        40                48
+```mermaid
+---
+title: "CASChunkSequenceEntry (48 bytes)"
+---
+packet
+    0-31: "chunk_hash (32 bytes)"
+    32-35: "chunk_byte_range_start (u32)"
+    36-39: "unpacked_segment_bytes (u32)"
+    40-47: "_unused (8 bytes)"
 ```
 
 ### CAS Info Bookend
@@ -451,23 +464,20 @@ struct MDBShardFileFooter {
 
 **Memory Layout**:
 
-> [!NOTE]
-> Fields are not exactly to scale
-
-```txt
-┌─────────┬─────────┬─────────┬──────────────────────────────┬─────────────────────────┐
-│ version │file_info│cas_info │      _buffer (reserved)      │   chunk_hash_hmac_key   │
-│(8 bytes)│offset   │offset   │          (48 bytes)          │       (32 bytes)        │
-│         │(8 bytes)│(8 bytes)│                              │                         │
-└─────────┴─────────┴─────────┴──────────────────────────────┴─────────────────────────┘
-0         8         16        24                             72                        104
-
-┌─────────┬──────────┬───────────────────────────────────────────────────────┬─────────┐
-│creation │shard_    │                  _buffer (reserved)                   │footer_  │
-│timestamp│key_expiry│                      (72 bytes)                       │offset   │
-│(8 bytes)│ (8 bytes)│                                                       │(8 bytes)│
-└─────────┴──────────┴───────────────────────────────────────────────────────┴─────────┘
-104       112        120                                                     192       200
+```mermaid
+---
+title: "MDBShardFileFooter (200 bytes)"
+---
+packet
+    0-7: "version (u64)"
+    8-15: "file_info_offset (u64)"
+    16-23: "cas_info_offset (u64)"
+    24-71: "_buffer (48 bytes reserved)"
+    72-103: "chunk_hash_hmac_key (32 bytes)"
+    104-111: "shard_creation_timestamp (u64)"
+    112-119: "shard_key_expiry (u64)"
+    120-191: "_buffer2 (72 bytes reserved)"
+    192-199: "footer_offset (u64)"
 ```
 
 **Deserialization steps**:
diff --git a/docs/xet/xorb.md b/docs/xet/xorb.md
index e043b129ec..b1f399dcb5 100644
--- a/docs/xet/xorb.md
+++ b/docs/xet/xorb.md
@@ -58,13 +58,15 @@ the uncompressed size also being at a maximum of 128KiB.
 
 #### Chunk Header Layout
 
-```txt
-┌─────────┬─────────────────────────────────┬──────────────┬─────────────────────────────────┐
-│ Version │         Compressed Size         │ Compression  │        Uncompressed Size        │
-│ 1 byte  │             3 bytes             │     Type     │             3 bytes             │
-│         │         (little-endian)         │    1 byte    │         (little-endian)         │
-└─────────┴─────────────────────────────────┴──────────────┴─────────────────────────────────┘
-0         1                                 4              5                                 8
+```mermaid
+---
+title: "Chunk Header (8 bytes)"
+---
+packet
+    0-7: "Version (1 byte)"
+    8-31: "Compressed Size (3 bytes, LE)"
+    32-39: "Compression Type (1 byte)"
+    40-63: "Uncompressed Size (3 bytes, LE)"
 ```
 
 ### Chunk Compression Schemes