add size property to enable _cp_file #742
|
@martindurant in order to get the size here, needed by |
|
How about #745 as an alternative without the async attribute? In that case, the size is only available after the first read, but requires no extra call. |
Wouldn't the workflow of checking size and then transferring data not work? https://github.com/fsspec/filesystem_spec/blob/386a084ffb7f8194265056e19f53ffd252a89e20/fsspec/generic.py#L283 |
|
You are right about that ... However, the rsync idea you mention is already implemented in https://github.com/fsspec/filesystem_spec/blob/386a084ffb7f8194265056e19f53ffd252a89e20/fsspec/generic.py#L36 and includes getting all the info for all the files ahead of time. For s3, calling find once will be much faster than calling info on each file, even if done asynchronously. |
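To illustrate the point about one find versus many info calls, here is a minimal sketch using a toy in-memory store. ToyStore, sizes_per_file and sizes_bulk are hypothetical names invented for this illustration; they are not fsspec or s3fs code, and the request counter only simulates round trips.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a remote object store."""

    def __init__(self, sizes):
        self._sizes = dict(sizes)
        self.requests = 0  # number of simulated round trips

    def info(self, path):
        # One simulated round trip per file.
        self.requests += 1
        return {"name": path, "size": self._sizes[path]}

    def find(self, prefix):
        # One simulated round trip returns metadata for every key under the prefix.
        self.requests += 1
        return {
            p: {"name": p, "size": s}
            for p, s in self._sizes.items()
            if p.startswith(prefix)
        }


def sizes_per_file(store, paths):
    # Ask for each file's metadata individually.
    return {p: store.info(p)["size"] for p in paths}


def sizes_bulk(store, prefix):
    # Ask once for the whole prefix and read the sizes from the listing.
    return {p: i["size"] for p, i in store.find(prefix).items()}


files = {f"bucket/data/{i}.bin": 100 + i for i in range(5)}

per_file_store = ToyStore(files)
per_file = sizes_per_file(per_file_store, list(files))

bulk_store = ToyStore(files)
bulk = sizes_bulk(bulk_store, "bucket/data/")
```

Both approaches yield the same sizes, but the per-file version issues one simulated request per file while the bulk version issues one in total. A real listing may be paginated, so the actual request count depends on the backend.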
That makes sense. Looking at the |
|
OK, I did that. |
|
closing in favor of #745 |
fsspec's generic filesystem _cp_file functionality checks the size of the file before reading it, causing the current s3 async implementation to fail. This PR calls _info to retrieve and cache the file size when needed. Once this bug in fsspec is fixed (fsspec/filesystem_spec#1281), it should be possible to use the generic filesystem cp functionality with s3fs.
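The "retrieve and cache the file size when needed" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: LazySizeFile, get_size and fake_info are hypothetical names, and a coroutine method is used here instead of a property so that the lookup can be awaited.

```python
import asyncio


class LazySizeFile:
    """Hypothetical file wrapper that looks up its size only on first use."""

    def __init__(self, path, fetch_info):
        self.path = path
        self._fetch_info = fetch_info  # assumed async callable: path -> dict with "size"
        self._size = None  # filled in by the first get_size() call

    async def get_size(self):
        # Fetch the metadata once, then serve the cached value afterwards.
        if self._size is None:
            info = await self._fetch_info(self.path)
            self._size = info["size"]
        return self._size


calls = []


async def fake_info(path):
    # Stand-in for a remote metadata request; records each invocation.
    calls.append(path)
    return {"name": path, "size": 42}


async def demo():
    f = LazySizeFile("bucket/key", fake_info)
    first = await f.get_size()
    second = await f.get_size()
    return first, second


first_size, second_size = asyncio.run(demo())
```

The second call is answered from the cache, so the metadata lookup runs only once per file object.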