Skip to content

Conversation

@brokkoli71
Copy link
Member

@brokkoli71 brokkoli71 commented Nov 28, 2025

Further Changes
  • adds store methods getSize and getInputStream
  • fixes S3 store get range
TODO:
  • Implement ZipStore
  • Test ZipStore (merge with MemoryStore tests)
  • buffered implementation

Requirements derived from https://ngff.openmicroscopy.org/rfc/9/index.html

  • ZIP64 format
  • Disable ZIP-level compression
  • Option to add archive comments in the ZIP file header
  • no use of nested or multi-part ZIP archives
  • "The root-level zarr.json file SHOULD be the first ZIP file entry and the first entry in the central directory header; other zarr.json files SHOULD follow immediately afterwards, in breadth-first order."
EDIT:
  • enforcing zip64 on smaller files resulted in header errors when writing with org.apache.commons.compress. For now zip64 will only be used when needed

@normanrz
Copy link
Member

I wonder how writing would be implemented on the ZipStore? Maybe it could have an internal MemoryStore that is used for buffering. And then, there could be a close or flush method that flushes the buffered contents into a zip file.

Reading could use a different implementation that works on the zip file directly. Maybe it even makes sense to have 2 separate ZipStore implementations.

In any case, it would be cool if the ZipStore(s) could wrap around any other store so that zip files can be written to disk, s3 etc.

@brokkoli71 brokkoli71 changed the title add ZipStore tests Zip Store Nov 30, 2025
@brokkoli71
Copy link
Member Author

I am thinking about the default behavior of writing to the buffered zip store. I think it might be confusing if calling array.write(...) does not actually write something to the zip file unless zipstore.flush() is called. Maybe per default flush should be called within every write, or there could be an optional argument when creating a buffered zip store like autoFlush or flushOnWrite that defaults to true but can be turned off.

@brokkoli71 brokkoli71 self-assigned this Dec 12, 2025
@normanrz
Copy link
Member

I think the name BufferedZipStore actually conveys pretty nicely that things are not written directly. But an autoFlush option that is false by default would be an interesting addition.

@Nullable
@Override
public ByteBuffer get(String[] keys, long start, long end) {
ByteBuffer buffer = underlyingStore.read();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReadOnlyZipStore will read the entire ZIP file here, but only parses the required regions.
This could be changed so that it only reads the required bytes from the ZIP file.
One possibility would be to add store.getInputStream() and parse this stream instead. I think that would enable reading from large zip files, that don't fit into memory. Do you think this should be taken care of?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think the primary purpose of the ReadOnlyZipStore is to read large ZIP files directly from the storage.

private final Store.ListableStore bufferStore;
private String archiveComment;

private void writeBuffer() throws IOException{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now, but it creates copies of the entire zip archive in memory, so it could be expensive. Maybe we could add a streaming-write option the the store interface in the future.

@Nullable
@Override
public ByteBuffer get(String[] keys, long start, long end) {
ByteBuffer buffer = underlyingStore.read();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think the primary purpose of the ReadOnlyZipStore is to read large ZIP files directly from the storage.

brokkoli71 and others added 23 commits December 18, 2025 15:46
diff --git c/src/main/java/dev/zarr/zarrjava/store/ZipStore.java i/src/main/java/dev/zarr/zarrjava/store/ZipStore.java
new file mode 100644
index 0000000..054917f
--- /dev/null
+++ i/src/main/java/dev/zarr/zarrjava/store/ZipStore.java
@@ -0,0 +1,72 @@
+package dev.zarr.zarrjava.store;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.stream.Stream;
+
+public class ZipStore implements Store, Store.ListableStore {
+    @nonnull
+    private final Path path;
+
+    public ZipStore(@nonnull Path path) {
+        this.path = path;
+    }
+
+    public ZipStore(@nonnull String path) {
+        this.path = Paths.get(path);
+    }
+
+
+    @OverRide
+    public Stream<String> list(String[] keys) {
+        return Stream.empty();
+    }
+
+    @OverRide
+    public boolean exists(String[] keys) {
+        return false;
+    }
+
+    @nullable
+    @OverRide
+    public ByteBuffer get(String[] keys) {
+        return null;
+    }
+
+    @nullable
+    @OverRide
+    public ByteBuffer get(String[] keys, long start) {
+        return null;
+    }
+
+    @nullable
+    @OverRide
+    public ByteBuffer get(String[] keys, long start, long end) {
+        return null;
+    }
+
+    @OverRide
+    public void set(String[] keys, ByteBuffer bytes) {
+
+    }
+
+    @OverRide
+    public void delete(String[] keys) {
+
+    }
+
+    @nonnull
+    @OverRide
+    public StoreHandle resolve(String... keys) {
+        return new StoreHandle(this, keys);
+    }
+
+    @OverRide
+    public String toString() {
+        return this.path.toUri().toString().replaceAll("\\/$", "");
+    }
+
+}
diff --git c/src/test/java/dev/zarr/zarrjava/Utils.java i/src/test/java/dev/zarr/zarrjava/Utils.java
new file mode 100644
index 0000000..0026200
--- /dev/null
+++ i/src/test/java/dev/zarr/zarrjava/Utils.java
@@ -0,0 +1,40 @@
+package dev.zarr.zarrjava;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipOutputStream;
+
+public class Utils {
+
+    static void zipFile(File fileToZip, String fileName, ZipOutputStream zipOut) throws IOException {
+        if (fileToZip.isHidden()) {
+            return;
+        }
+        if (fileToZip.isDirectory()) {
+            if (fileName.endsWith("/")) {
+                zipOut.putNextEntry(new ZipEntry(fileName));
+                zipOut.closeEntry();
+            } else {
+                zipOut.putNextEntry(new ZipEntry(fileName + "/"));
+                zipOut.closeEntry();
+            }
+            File[] children = fileToZip.listFiles();
+            for (File childFile : children) {
+                zipFile(childFile, fileName + "/" + childFile.getName(), zipOut);
+            }
+            return;
+        }
+        FileInputStream fis = new FileInputStream(fileToZip);
+        ZipEntry zipEntry = new ZipEntry(fileName);
+        zipOut.putNextEntry(zipEntry);
+        byte[] bytes = new byte[1024];
+        int length;
+        while ((length = fis.read(bytes)) >= 0) {
+            zipOut.write(bytes, 0, length);
+        }
+        fis.close();
+    }
+
+}
diff --git c/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java i/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java
index 4a369c9..c7d2ab4 100644
--- c/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java
+++ i/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java
@@ -7,16 +7,22 @@ import dev.zarr.zarrjava.v3.*;
 import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
 import org.junit.jupiter.params.provider.CsvSource;
 import software.amazon.awssdk.auth.credentials.AnonymousCredentialsProvider;
 import software.amazon.awssdk.regions.Region;
 import software.amazon.awssdk.services.s3.S3Client;

+import java.io.File;
+import java.io.FileOutputStream;
 import java.io.IOException;
 import java.nio.file.Files;
 import java.util.Arrays;
 import java.util.stream.Stream;
+import java.nio.file.Path;
+import java.util.zip.ZipOutputStream;

+import static dev.zarr.zarrjava.Utils.zipFile;
 import static dev.zarr.zarrjava.v3.Node.makeObjectMapper;

 public class ZarrStoreTest extends ZarrTest {
@@ -132,4 +138,72 @@ public class ZarrStoreTest extends ZarrTest {
         Assertions.assertEquals("test group", attrs.getString("description"));

     }
+
+    @test
+    public void testZipStore() throws ZarrException, IOException {
+        Path sourceDir = TESTOUTPUT.resolve("testZipStore");
+        Path targetDir = TESTOUTPUT.resolve("testZipStore.zip");
+        FilesystemStore fsStore = new FilesystemStore(sourceDir);
+        writeTestGroupV3(fsStore, true);
+
+        FileOutputStream fos = new FileOutputStream(targetDir.toFile());
+        ZipOutputStream zipOut = new ZipOutputStream(fos);
+
+        File fileToZip = new File(sourceDir.toUri());
+        zipFile(fileToZip, fileToZip.getName(), zipOut);
+        zipOut.close();
+        fos.close();
+
+        ZipStore zipStore = new ZipStore(targetDir);
+        assertIsTestGroupV3(Group.open(zipStore.resolve()), true);
+    }
+
+    static Stream<Store> localStores() {
+        return Stream.of(
+//                new ConcurrentMemoryStore(),
+                new FilesystemStore(TESTOUTPUT.resolve("testLocalStoresFS")),
+                new ZipStore(TESTOUTPUT.resolve("testLocalStoresZIP.zip"))
+        );
+    }
+
+    @ParameterizedTest
+    @MethodSource("localStores")
+    public void testLocalStores(Store store) throws IOException, ZarrException {
+        boolean useParallel = true;
+        Group group = writeTestGroupV3(store, useParallel);
+        assertIsTestGroupV3(group, useParallel);
+    }
+
+    int[] testData(){
+        int[] testData = new int[1024 * 1024];
+        Arrays.setAll(testData, p -> p);
+        return testData;
+    }
+
+    Group writeTestGroupV3(Store store, boolean useParallel) throws ZarrException, IOException {
+        StoreHandle storeHandle = store.resolve();
+
+        Group group = Group.create(storeHandle);
+        Array array = group.createArray("array", b -> b
+                .withShape(1024, 1024)
+                .withDataType(DataType.UINT32)
+                .withChunkShape(5, 5)
+        );
+        array.write(ucar.ma2.Array.factory(ucar.ma2.DataType.UINT, new int[]{1024, 1024}, testData()), useParallel);
+        group.createGroup("subgroup");
+        group.setAttributes(new Attributes(b -> b.set("some", "value")));
+        return group;
+    }
+
+    void assertIsTestGroupV3(Group group, boolean useParallel) throws ZarrException {
+        Stream<dev.zarr.zarrjava.core.Node> nodes = group.list();
+        Assertions.assertEquals(2, nodes.count());
+        Array array = (Array) group.get("array");
+        Assertions.assertNotNull(array);
+        ucar.ma2.Array result = array.read(useParallel);
+        Assertions.assertArrayEquals(testData(), (int[]) result.get1DJavaArray(ucar.ma2.DataType.UINT));
+        Attributes attrs = group.metadata().attributes;
+        Assertions.assertNotNull(attrs);
+        Assertions.assertEquals("value", attrs.getString("some"));
+    }
 }
There are apparently cases where release: [created]
leads to the deploy not being triggered. Attempting
an expansion to [created, published].

diff --git c/pom.xml i/pom.xml
index f4c1091..9c9d45d 100644
--- c/pom.xml
+++ i/pom.xml
@@ -6,7 +6,7 @@

     <groupId>dev.zarr</groupId>
     <artifactId>zarr-java</artifactId>
-    <version>0.0.9</version>
+    <version>0.0.6</version>

     <name>zarr-java</name>

@@ -123,6 +123,17 @@
         </dependency>
     </dependencies>

+    <distributionManagement>
+        <snapshotRepository>
+            <id>ossrh</id>
+            <url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
+        </snapshotRepository>
+        <repository>
+            <id>ossrh</id>
+            <url>https://s01.oss.sonatype.org/service/local/staging/deploy/maven2/</url>
+        </repository>
+    </distributionManagement>
+
     <repositories>
         <repository>
             <id>unidata-all</id>
@@ -221,16 +232,6 @@
                     </execution>
                 </executions>
             </plugin>
-            <plugin>
-                <groupId>org.sonatype.central</groupId>
-                <artifactId>central-publishing-maven-plugin</artifactId>
-                <version>0.8.0</version>
-                <extensions>true</extensions>
-                <configuration>
-                    <publishingServerId>central</publishingServerId>
-                    <tokenAuth>true</tokenAuth>
-                </configuration>
-            </plugin>
         </plugins>
     </build>
 </project>
Use new sonatype plugin for upload
1. leading slashes in paths
2. no sizes in entry headers
@brokkoli71 brokkoli71 marked this pull request as ready for review December 18, 2025 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ZipStore

4 participants