Zip Store #37

Conversation
I wonder how writing would be implemented on the ZipStore? Maybe it could have an internal buffer; reading could use a different implementation that works on the zip file directly. Maybe it even makes sense to have two separate ZipStore implementations. In any case, it would be cool if the ZipStore(s) could wrap around any other store, so that zip files can be written to disk, S3, etc.
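A minimal sketch of the wrapping idea, assuming a hypothetical single-method `SimpleStore` interface standing in for zarr-java's `Store` (the real interface differs): entries are buffered in memory and the finished archive is handed to whatever store is wrapped, so it could equally land on disk or S3.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical minimal interface standing in for zarr-java's Store.
interface SimpleStore {
    void set(String key, byte[] bytes) throws IOException;
}

// A write-buffering zip store sketch: entries are collected in memory and
// the finished archive is written through the wrapped store on flush().
class BufferedZipStoreSketch {
    private final Map<String, byte[]> buffer = new TreeMap<>();
    private final SimpleStore underlying;
    private final String archiveKey;

    BufferedZipStoreSketch(SimpleStore underlying, String archiveKey) {
        this.underlying = underlying;
        this.archiveKey = archiveKey;
    }

    void set(String key, byte[] bytes) {
        buffer.put(key, bytes);
    }

    // Serialize the buffered entries as one zip archive and hand the bytes
    // to the wrapped store.
    void flush() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            for (Map.Entry<String, byte[]> e : buffer.entrySet()) {
                zipOut.putNextEntry(new ZipEntry(e.getKey()));
                zipOut.write(e.getValue());
                zipOut.closeEntry();
            }
        }
        underlying.set(archiveKey, bos.toByteArray());
    }
}
```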
I am thinking about the default behavior of writing to the buffered zip store. I think it might be confusing if calling …

I think the name …
```java
@Nullable
@Override
public ByteBuffer get(String[] keys, long start, long end) {
    ByteBuffer buffer = underlyingStore.read();
```
The ReadOnlyZipStore will read the entire ZIP file here, but only parses the required regions.
This could be changed so that it only reads the required bytes from the ZIP file.
One possibility would be to add store.getInputStream() and parse this stream instead. I think that would enable reading from large ZIP files that don't fit into memory. Do you think this should be taken care of?
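The suggested `store.getInputStream()` is hypothetical; a sketch of how such a stream could be scanned with `java.util.zip.ZipInputStream`, so that only one entry is buffered at a time and the whole archive never has to fit into memory:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipStreamReadSketch {
    // Scan the zip stream entry by entry and return the bytes of `key`,
    // or null if no such entry exists. Only the matching entry is ever
    // materialized in memory.
    static byte[] readEntry(InputStream in, String key) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                if (entry.getName().equals(key)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zipIn.read(chunk)) > 0) {
                        out.write(chunk, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

Note that a plain `ZipInputStream` scan is sequential; random access to individual entries would instead need the central directory plus range reads, which is a larger change to the store interface.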
Yeah. I think the primary purpose of the ReadOnlyZipStore is to read large ZIP files directly from the storage.
```java
private final Store.ListableStore bufferStore;
private String archiveComment;

private void writeBuffer() throws IOException {
```
I think this is fine for now, but it creates copies of the entire zip archive in memory, so it could be expensive. Maybe we could add a streaming-write option to the store interface in the future.
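A sketch of that streaming-write idea, assuming the store could hand out an `OutputStream` (not part of the current interface): `java.util.zip.ZipOutputStream` writes each entry directly into the target, so no second in-memory copy of the whole archive is built.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StreamingZipWriteSketch {
    // Stream each buffered entry straight into the target stream; at any
    // moment only one entry's bytes are held in addition to the zip
    // output buffer, instead of a full copy of the archive.
    static void writeAll(Map<String, byte[]> entries, OutputStream target) throws IOException {
        try (ZipOutputStream zipOut = new ZipOutputStream(target)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zipOut.putNextEntry(new ZipEntry(e.getKey()));
                zipOut.write(e.getValue());
                zipOut.closeEntry();
            }
        }
    }
}
```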
```diff
diff --git c/src/main/java/dev/zarr/zarrjava/store/ZipStore.java i/src/main/java/dev/zarr/zarrjava/store/ZipStore.java
new file mode 100644
index 0000000..054917f
--- /dev/null
+++ i/src/main/java/dev/zarr/zarrjava/store/ZipStore.java
@@ -0,0 +1,72 @@
+package dev.zarr.zarrjava.store;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.stream.Stream;
+
+public class ZipStore implements Store, Store.ListableStore {
+  @Nonnull
+  private final Path path;
+
+  public ZipStore(@Nonnull Path path) {
+    this.path = path;
+  }
+
+  public ZipStore(@Nonnull String path) {
+    this.path = Paths.get(path);
+  }
+
+  @Override
+  public Stream<String> list(String[] keys) {
+    return Stream.empty();
+  }
+
+  @Override
+  public boolean exists(String[] keys) {
+    return false;
+  }
+
+  @Nullable
+  @Override
+  public ByteBuffer get(String[] keys) {
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public ByteBuffer get(String[] keys, long start) {
+    return null;
+  }
+
+  @Nullable
+  @Override
+  public ByteBuffer get(String[] keys, long start, long end) {
+    return null;
+  }
+
+  @Override
+  public void set(String[] keys, ByteBuffer bytes) {
+  }
+
+  @Override
+  public void delete(String[] keys) {
+  }
+
+  @Nonnull
+  @Override
+  public StoreHandle resolve(String... keys) {
+    return new StoreHandle(this, keys);
+  }
+
+  @Override
+  public String toString() {
+    return this.path.toUri().toString().replaceAll("\\/$", "");
+  }
+}
diff --git c/src/test/java/dev/zarr/zarrjava/Utils.java i/src/test/java/dev/zarr/zarrjava/Utils.java
new file mode 100644
index 0000000..0026200
--- /dev/null
+++ i/src/test/java/dev/zarr/zarrjava/Utils.java
@@ -0,0 +1,40 @@
+package dev.zarr.zarrjava;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipOutputStream;
+
+public class Utils {
+
+  static void zipFile(File fileToZip, String fileName, ZipOutputStream zipOut) throws IOException {
+    if (fileToZip.isHidden()) {
+      return;
+    }
+    if (fileToZip.isDirectory()) {
+      if (fileName.endsWith("/")) {
+        zipOut.putNextEntry(new ZipEntry(fileName));
+        zipOut.closeEntry();
+      } else {
+        zipOut.putNextEntry(new ZipEntry(fileName + "/"));
+        zipOut.closeEntry();
+      }
+      File[] children = fileToZip.listFiles();
+      for (File childFile : children) {
+        zipFile(childFile, fileName + "/" + childFile.getName(), zipOut);
+      }
+      return;
+    }
+    FileInputStream fis = new FileInputStream(fileToZip);
+    ZipEntry zipEntry = new ZipEntry(fileName);
+    zipOut.putNextEntry(zipEntry);
+    byte[] bytes = new byte[1024];
+    int length;
+    while ((length = fis.read(bytes)) >= 0) {
+      zipOut.write(bytes, 0, length);
+    }
+    fis.close();
+  }
+}
diff --git c/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java i/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java
index 4a369c9..c7d2ab4 100644
--- c/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java
+++ i/src/test/java/dev/zarr/zarrjava/ZarrStoreTest.java
@@ -7,16 +7,22 @@
 import dev.zarr.zarrjava.v3.*;
 import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
 import org.junit.jupiter.params.provider.CsvSource;
 import software.amazon.awssdk.auth.credentials.AnonymousCredentialsProvider;
 import software.amazon.awssdk.regions.Region;
 import software.amazon.awssdk.services.s3.S3Client;
+import java.io.File;
+import java.io.FileOutputStream;
 import java.io.IOException;
 import java.nio.file.Files;
 import java.util.Arrays;
 import java.util.stream.Stream;
+import java.nio.file.Path;
+import java.util.zip.ZipOutputStream;
+import static dev.zarr.zarrjava.Utils.zipFile;
 import static dev.zarr.zarrjava.v3.Node.makeObjectMapper;
 
 public class ZarrStoreTest extends ZarrTest {
@@ -132,4 +138,72 @@
     Assertions.assertEquals("test group", attrs.getString("description"));
   }
+
+  @Test
+  public void testZipStore() throws ZarrException, IOException {
+    Path sourceDir = TESTOUTPUT.resolve("testZipStore");
+    Path targetDir = TESTOUTPUT.resolve("testZipStore.zip");
+    FilesystemStore fsStore = new FilesystemStore(sourceDir);
+    writeTestGroupV3(fsStore, true);
+
+    FileOutputStream fos = new FileOutputStream(targetDir.toFile());
+    ZipOutputStream zipOut = new ZipOutputStream(fos);
+
+    File fileToZip = new File(sourceDir.toUri());
+    zipFile(fileToZip, fileToZip.getName(), zipOut);
+    zipOut.close();
+    fos.close();
+
+    ZipStore zipStore = new ZipStore(targetDir);
+    assertIsTestGroupV3(Group.open(zipStore.resolve()), true);
+  }
+
+  static Stream<Store> localStores() {
+    return Stream.of(
+//      new ConcurrentMemoryStore(),
+      new FilesystemStore(TESTOUTPUT.resolve("testLocalStoresFS")),
+      new ZipStore(TESTOUTPUT.resolve("testLocalStoresZIP.zip"))
+    );
+  }
+
+  @ParameterizedTest
+  @MethodSource("localStores")
+  public void testLocalStores(Store store) throws IOException, ZarrException {
+    boolean useParallel = true;
+    Group group = writeTestGroupV3(store, useParallel);
+    assertIsTestGroupV3(group, useParallel);
+  }
+
+  int[] testData() {
+    int[] testData = new int[1024 * 1024];
+    Arrays.setAll(testData, p -> p);
+    return testData;
+  }
+
+  Group writeTestGroupV3(Store store, boolean useParallel) throws ZarrException, IOException {
+    StoreHandle storeHandle = store.resolve();
+
+    Group group = Group.create(storeHandle);
+    Array array = group.createArray("array", b -> b
+        .withShape(1024, 1024)
+        .withDataType(DataType.UINT32)
+        .withChunkShape(5, 5)
+    );
+    array.write(ucar.ma2.Array.factory(ucar.ma2.DataType.UINT, new int[]{1024, 1024}, testData()), useParallel);
+    group.createGroup("subgroup");
+    group.setAttributes(new Attributes(b -> b.set("some", "value")));
+    return group;
+  }
+
+  void assertIsTestGroupV3(Group group, boolean useParallel) throws ZarrException {
+    Stream<dev.zarr.zarrjava.core.Node> nodes = group.list();
+    Assertions.assertEquals(2, nodes.count());
+    Array array = (Array) group.get("array");
+    Assertions.assertNotNull(array);
+    ucar.ma2.Array result = array.read(useParallel);
+    Assertions.assertArrayEquals(testData(), (int[]) result.get1DJavaArray(ucar.ma2.DataType.UINT));
+    Attributes attrs = group.metadata().attributes;
+    Assertions.assertNotNull(attrs);
+    Assertions.assertEquals("value", attrs.getString("some"));
+  }
 }
```
There are apparently cases where `release: [created]` leads to the deploy not being triggered. Attempting an expansion to `[created, published]`.

```diff
diff --git c/pom.xml i/pom.xml
index f4c1091..9c9d45d 100644
--- c/pom.xml
+++ i/pom.xml
@@ -6,7 +6,7 @@
   <groupId>dev.zarr</groupId>
   <artifactId>zarr-java</artifactId>
-  <version>0.0.9</version>
+  <version>0.0.6</version>
 
   <name>zarr-java</name>
@@ -123,6 +123,17 @@
     </dependency>
   </dependencies>
 
+  <distributionManagement>
+    <snapshotRepository>
+      <id>ossrh</id>
+      <url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
+    </snapshotRepository>
+    <repository>
+      <id>ossrh</id>
+      <url>https://s01.oss.sonatype.org/service/local/staging/deploy/maven2/</url>
+    </repository>
+  </distributionManagement>
+
   <repositories>
     <repository>
       <id>unidata-all</id>
@@ -221,16 +232,6 @@
         </execution>
       </executions>
     </plugin>
-    <plugin>
-      <groupId>org.sonatype.central</groupId>
-      <artifactId>central-publishing-maven-plugin</artifactId>
-      <version>0.8.0</version>
-      <extensions>true</extensions>
-      <configuration>
-        <publishingServerId>central</publishingServerId>
-        <tokenAuth>true</tokenAuth>
-      </configuration>
-    </plugin>
   </plugins>
 </build>
 </project>
```
Use new sonatype plugin for upload instead of own implementation
1. leading slashes in paths
2. no sizes in entry headers
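The two fixes could look roughly like this with `java.util.zip` (the helper names are made up for illustration): entry names are made relative by stripping leading slashes, and for STORED entries the size, compressed size, and CRC are set up front, which lets `java.util.zip` put the sizes into the local entry header instead of deferring them to a data descriptor.

```java
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;

class ZipEntrySketch {
    // Zip entry names are relative paths: strip any leading slashes so
    // readers do not see absolute paths.
    static String normalizeKey(String key) {
        return key.replaceAll("^/+", "");
    }

    // For STORED (uncompressed) entries, size, compressed size, and CRC
    // must be known before the entry is written; they then appear in the
    // local entry header.
    static ZipEntry storedEntryFor(String key, byte[] bytes) {
        ZipEntry entry = new ZipEntry(normalizeKey(key));
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(bytes.length);
        entry.setCompressedSize(bytes.length);
        CRC32 crc = new CRC32();
        crc.update(bytes);
        entry.setCrc(crc.getValue());
        return entry;
    }
}
```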
Force-pushed from 40de1be to 086d3f8
Further Changes

- store methods `getSize` and `getInputStream`

TODO:
Requirements derived from https://ngff.openmicroscopy.org/rfc/9/index.html
EDIT: org.apache.commons.compress. For now, zip64 will only be used when needed.
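With org.apache.commons.compress, the as-needed behavior can be requested explicitly via `Zip64Mode.AsNeeded` on `ZipArchiveOutputStream`; a sketch (the actual ZipStore wiring differs), where zip64 extra fields are only emitted once an entry or the archive exceeds the classic 4 GiB / 65535-entry limits:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.commons.compress.archivers.zip.Zip64Mode;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;

class Zip64Sketch {
    static byte[] writeArchive(String key, byte[] bytes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipArchiveOutputStream zipOut = new ZipArchiveOutputStream(bos)) {
            // Only emit zip64 structures when an entry or the archive
            // actually exceeds the classic zip limits.
            zipOut.setUseZip64(Zip64Mode.AsNeeded);
            ZipArchiveEntry entry = new ZipArchiveEntry(key);
            zipOut.putArchiveEntry(entry);
            zipOut.write(bytes);
            zipOut.closeArchiveEntry();
        }
        return bos.toByteArray();
    }
}
```

Archives written this way stay readable by plain `java.util.zip` as long as the limits are not exceeded.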