This repository was archived by the owner on Nov 19, 2024. It is now read-only.

File.Open given to the handler for Tar.Extract can't be used after the handler has returned #371

@ncw

Description

What would you like to have changed?

Currently, when you call Extract on a .zip file, the handler is given File objects whose Open methods can be called at some later time, after the handler has returned.

This is because the reader passed to the zip code is upgraded to an io.ReaderAt, so each file can be seeked to and opened later.
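To illustrate why late opening works for zip, here is a minimal sketch using only the standard library's archive/zip (not archiver's own Extract). The loop body stands in for the handler and just saves each *zip.File; the helper names buildZip and collectThenRead are made up for this example. Because zip.NewReader is given an io.ReaderAt plus the archive size, each File.Open reads its own byte range independently and can be called after the loop has finished.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildZip creates an in-memory zip archive containing the given files.
func buildZip(files map[string]string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range files {
		w, err := zw.Create(name)
		if err != nil {
			panic(err)
		}
		if _, err := io.WriteString(w, body); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// collectThenRead mimics an Extract-style walk: the first loop plays the
// role of the handler and only stores each *zip.File. The files are opened
// afterwards, which works because every File.Open reads from the shared
// io.ReaderAt at that file's own offset.
func collectThenRead(data []byte) (map[string]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var saved []*zip.File
	for _, f := range zr.File {
		saved = append(saved, f) // "handler" returns without reading
	}
	out := make(map[string]string)
	for _, f := range saved {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = string(b)
	}
	return out, nil
}

func main() {
	got, err := collectThenRead(buildZip(map[string]string{"a.txt": "alpha", "b.txt": "beta"}))
	if err != nil {
		panic(err)
	}
	fmt.Println(got["a.txt"], got["b.txt"]) // alpha beta
}
```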

However, if you do the same with Tar, the File.Open methods are only valid until the handler returns. If you call them after that, they return 0 bytes.

One solution would be to run Extract on the archive again, filtered to just the file name required. That would be an O(n^2) solution though :-( In rclone's case it would mean reading the entire tar file from remote storage once for every file extracted, which is very inefficient. Rclone could use local caching to help with this somewhat.

Plain .tar files should be quite seekable in theory, but I don't think that is something we can get from the standard library. For a compressed .tar.gz, for example, one would need to save the state of the decompressor at the start of each file in order to seek, which starts to sound very complicated. You can create seekable gzip files (rclone does this in its compress backend), but that is complicated and non-standard.
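The reason plain .tar is seekable in theory is that its layout is simple arithmetic: each entry is a 512-byte header block followed by the file data, padded up to a multiple of 512 bytes. Below is a minimal sketch of computing the next header offset; nextHeaderOffset is a made-up name, and it assumes a plain ustar entry with no PAX or GNU extension headers (those add extra blocks) and no sparse map. In a .tar.gz these offsets only exist in the decompressed stream, which is why seeking there needs saved decompressor state.

```go
package main

import "fmt"

const blockSize = 512

// nextHeaderOffset returns where the next header block starts, given the
// offset of a header block and the size of the file data that follows it.
// Assumes a plain ustar entry: no PAX/GNU extension headers, no sparse map.
func nextHeaderOffset(headerOffset, size int64) int64 {
	padded := (size + blockSize - 1) / blockSize * blockSize // round up to a whole block
	return headerOffset + blockSize + padded
}

func main() {
	fmt.Println(nextHeaderOffset(0, 5))    // 1024
	fmt.Println(nextHeaderOffset(1024, 0)) // 1536
}
```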

For rclone's purposes, I think compressed tar files mean you have to read the entire file just to see the directory entries; I don't think there is any way around that. With an uncompressed .tar, on the other hand, you should (in theory at least) be able to skip the actual file data when scanning. I looked through the source of archive/tar, and if the underlying reader implements io.Seeker, it uses Seek to skip data where necessary, so that is good.

I think recording the seek position of the underlying reader at the start of each file in an uncompressed .tar could be interesting, but that would need a fork of archive/tar.

Any ideas?
