ArrowFSWrapper with S3 does not like leading slash #2015

@TLCFEM


Issue

ArrowFSWrapper defines its root marker as /.

As a result, the _parent method inherited from the parent class always returns a string with a leading slash.

filesystem_spec/fsspec/spec.py

Lines 1266 to 1272 in 2b5ed0f

def _parent(cls, path):
    path = cls._strip_protocol(path)
    if "/" in path:
        parent = path.rsplit("/", 1)[0].lstrip(cls.root_marker)
        return cls.root_marker + parent
    else:
        return cls.root_marker
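
To make the effect of the root marker concrete, here is a minimal standalone sketch of the same logic. It is not the fsspec code itself: the function name `parent_of` and the bucket name are made up for illustration, and the sketch assumes the path has already had its protocol stripped (the real method calls `cls._strip_protocol` first).

```python
def parent_of(path: str, root_marker: str) -> str:
    """Hypothetical stand-in for the quoted _parent logic.

    Assumes `path` is already protocol-free; the real method strips it first.
    """
    if "/" in path:
        # Drop the last path component, strip any leading root-marker
        # characters, then put the root marker back on the front.
        parent = path.rsplit("/", 1)[0].lstrip(root_marker)
        return root_marker + parent
    return root_marker


# With root_marker "/", the result always starts with a slash.
with_slash = parent_of("test-bucket/path/file.txt", "/")

# With root_marker "", the bucket-relative form is preserved.
without_slash = parent_of("test-bucket/path/file.txt", "")

print(repr(with_slash))     # '/test-bucket/path'
print(repr(without_slash))  # 'test-bucket/path'
```

The first form matches the shape of the path that pyarrow rejects in the traceback in this report.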

However, when the wrapper is used with S3, for example ArrowFSWrapper(pyarrow.fs.S3FileSystem()), pyarrow rejects paths that start with a slash.

../.venv/lib/python3.12/site-packages/fsspec/spec.py:1106: in put
    self.put_file(lpath, rpath, callback=child, **kwargs)
../.venv/lib/python3.12/site-packages/fsspec/spec.py:1031: in put_file
    self.mkdirs(self._parent(os.fspath(rpath)), exist_ok=True)
../.venv/lib/python3.12/site-packages/fsspec/spec.py:1773: in mkdirs
    return self.makedirs(path, exist_ok=exist_ok)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../.venv/lib/python3.12/site-packages/fsspec/implementations/arrow.py:23: in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
../.venv/lib/python3.12/site-packages/fsspec/implementations/arrow.py:195: in makedirs
    self.fs.create_dir(path, recursive=True)
pyarrow/_fs.pyx:638: in pyarrow._fs.FileSystem.create_dir
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowInvalid: Path cannot start with a separator ('/test-bucket-dcf315a3aff54e8db4eb2bf8e9e85f5b/path')

pyarrow/error.pxi:92: ArrowInvalid

If the root marker is overridden with an empty string, as follows, pyarrow accepts the path.

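import pyarrow.fs
from fsspec.implementations.arrow import ArrowFSWrapper

class Wrapper(ArrowFSWrapper):
    root_marker = ""  # _parent() no longer prepends a slash

Wrapper(pyarrow.fs.S3FileSystem())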

Potential Fix

Since root_marker is a class attribute, a separate wrapper for S3 is probably necessary.

class ArrowS3FSWrapper(ArrowFSWrapper):
    root_marker = ""
