Subarray dtypes get lost on serialization / cast to void type #3582

@sehoffmann

Zarr version

v3.1.3

Numcodecs version

v0.15.1

Python Version

3.12.10

Operating System

Linux

Installation

uv / pip

Description

Structured dtype fields that carry a subarray shape are not serialized correctly: on serialization the field is cast to a raw bytes / void dtype of the same size. As a result, subsequent reads return items whose subarray field has lost both its element type and its shape.

Output:

zarr/core/dtype/npy/structured.py:318: UnstableSpecificationWarning: The data type (Structured(fields=(('a', Int32(endianness='little')), ('b', RawBytes(length=100))))) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
zarr/core/dtype/npy/bytes.py:785: UnstableSpecificationWarning: The data type (RawBytes(length=100)) does not have a Zarr V3 specification. That means that the representation of arrays saved with this data type may change without warning in a future version of Zarr Python. Arrays stored with this data type may be unreadable by other Zarr libraries. Use this data type at your own risk! Check https://github.com/zarr-developers/zarr-extensions/tree/main/data-types for the status of data type specifications for Zarr V3.
  v3_unstable_dtype_warning(self)
Original dtype: [('a', '<i4'), ('b', '<f4', (5, 5))]
Array created with dtype: [('a', '<i4'), ('b', 'V100')]
Accessed item dtype:  [('a', '<i4'), ('b', 'V100')]
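
For context, the sizes in the output above are consistent with how NumPy itself describes the subarray field. The following is a small NumPy-only sketch (it does not touch zarr, and the variable name `field_b` is just illustrative) showing that the `b` field occupies 100 bytes, which matches `RawBytes(length=100)` / `V100`, while the element type and shape are separate pieces of information that the void dtype cannot carry.

```python
import numpy as np

# Same structured dtype as in the report.
DTYPE = np.dtype([("a", "i4"), ("b", "f4", (5, 5))])

# The dtype of the subarray field 'b'.
field_b = DTYPE["b"]

print("record itemsize:", DTYPE.itemsize)      # 4 bytes for 'a' + 100 bytes for 'b'
print("field 'b' itemsize:", field_b.itemsize)  # 25 float32 values
print("field 'b' shape:", field_b.shape)
print("field 'b' base dtype:", field_b.base)
print("field 'b' kind:", field_b.kind)

# A plain void dtype of the same size keeps only the byte count.
void_b = np.dtype("V100")
print("void itemsize:", void_b.itemsize, "void shape:", void_b.shape)
```

The itemsize survives the round trip, but `shape` and `base` do not, which is exactly the information missing from the `Array created with dtype` line.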

zarr.json:

{
  "shape": [
    10
  ],
  "data_type": {
    "name": "structured",
    "configuration": {
      "fields": [
        [
          "a",
          "int32"
        ],
        [
          "b",
          {
            "name": "raw_bytes",
            "configuration": {
              "length_bytes": 100
            }
          }
        ]
      ]
    }
  },
  "chunk_grid": {
    "name": "regular",
    "configuration": {
      "chunk_shape": [
        10
      ]
    }
  },
  "chunk_key_encoding": {
    "name": "default",
    "configuration": {
      "separator": "/"
    }
  },
  "fill_value": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=",
  "codecs": [
    {
      "name": "bytes"
    },
    {
      "name": "zstd",
      "configuration": {
        "level": 0,
        "checksum": false
      }
    }
  ],
  "attributes": {},
  "zarr_format": 3,
  "node_type": "array",
  "storage_transformers": []
}
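
The `fill_value` string in the metadata looks like base64. Assuming that is the encoding used (an assumption based on the string's appearance, not on the zarr source), the sketch below encodes the 104 zero bytes passed as `fill_value` in the reproducer and decodes them back into a record of the original dtype. It only uses the standard library and NumPy.

```python
import base64

import numpy as np

DTYPE = np.dtype([("a", "i4"), ("b", "f4", (5, 5))])

# The reproducer passes bytes(DTYPE.itemsize), i.e. 104 zero bytes.
fill = bytes(DTYPE.itemsize)

# Assumed encoding: standard base64 of the raw record bytes.
encoded = base64.b64encode(fill).decode("ascii")
decoded = base64.b64decode(encoded)

# Interpreting the decoded bytes with the original dtype recovers the
# (5, 5) float32 subarray, here filled with zeros.
record = np.frombuffer(decoded, dtype=DTYPE)[0]

print(len(decoded), "bytes")
print(record["b"].shape, record["b"].dtype)
```

Under that assumption the stored bytes themselves are intact; only the dtype description in `data_type` has lost the subarray information.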

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
from zarr.storage import LocalStore
import numpy as np
# your reproducer code
# zarr.print_debug_info()

DTYPE = np.dtype([('a', 'i4'), ('b', 'f4', (5,5))])

store = LocalStore('bug.zarr')
arr = zarr.create_array(store, name='test', shape=(10,), dtype=DTYPE, fill_value=bytes(DTYPE.itemsize))

print('Original dtype:', DTYPE)
print('Array created with dtype:', arr.dtype)
print('Accessed item dtype: ', arr[0].dtype)
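
Until this is fixed, one possible workaround is to reinterpret the data on the reader's side, since the void field has the same byte layout as the original subarray field. The following is a minimal NumPy-only sketch: `restore_subarrays` is a hypothetical helper (not part of zarr), and `degraded` merely simulates what a read from the affected array is reported to return. It assumes the caller still knows the original dtype, because that information is not in `zarr.json`.

```python
import numpy as np

ORIGINAL = np.dtype([("a", "<i4"), ("b", "<f4", (5, 5))])
STORED = np.dtype([("a", "<i4"), ("b", "V100")])  # dtype reported after creation


def restore_subarrays(data, original):
    """Reinterpret records read back with a void field as the original dtype.

    Hypothetical helper: relies on both dtypes having the same itemsize
    and field offsets, so no bytes are copied or converted.
    """
    data = np.ascontiguousarray(data)
    if data.dtype.itemsize != original.itemsize:
        raise ValueError("dtypes have different record sizes")
    return data.view(original)


# Build reference data with the original dtype.
source = np.zeros(10, dtype=ORIGINAL)
source["a"] = np.arange(10)
source["b"] = np.arange(25, dtype="<f4").reshape(5, 5)  # broadcast to every record

# Simulate the degraded read: same bytes, subarray field seen as raw bytes.
degraded = source.view(STORED)

restored = restore_subarrays(degraded, ORIGINAL)
print(restored.dtype)
print(restored["b"].shape)
```

In practice `degraded` would be replaced by the result of reading from the zarr array, for example `arr[:]`. This only papers over the problem for readers that know the dtype out of band; other consumers of the store still see `raw_bytes`.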

Additional output

No response
