cat_ranges doesn't pass kwargs to cat_file which generates block size mismatches which result in no cache usage #2016

@bnlawrence

I'm trying to use fsspec with non-default block sizes for a local disk cache in front of remote s3/https resources. When cat_ranges is not used, everything works as expected, but when using cat_ranges the cache is bypassed: cat_ranges always goes out to the network.

If I've understood the problem correctly, AbstractFileSystem.cat_ranges accepts **kwargs but does not forward them when calling cat_file:

fsspec/spec.py:

 out.append(self.cat_file(p, s, e))   # <-- kwargs silently dropped

The consequence is that CachingFileSystem._open, which uses block_size to look up blocks in its on-disk cache, never receives block_size and falls back to the default value. In my case the default does not match the block size used to populate the cache, so every cat_ranges call fetches from the underlying (remote) filesystem instead of reading locally.
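
To make the mismatch concrete, here is a toy model of a block cache keyed by block size and block index. It is only an illustration of the mechanism described above, not fsspec's CachingFileSystem: the class ToyBlockCache, its DEFAULT_BLOCK_SIZE and the remote_reads counter are all hypothetical names, and the block sizes are tiny for readability.

```python
class ToyBlockCache:
    """Toy stand-in for a block cache. NOT fsspec's CachingFileSystem."""

    DEFAULT_BLOCK_SIZE = 4  # hypothetical default, used when block_size is dropped

    def __init__(self, remote: bytes):
        self.remote = remote      # pretend this lives on S3
        self.blocks = {}          # (block_size, block_index) -> cached bytes
        self.remote_reads = 0     # number of simulated network fetches

    def read(self, start, end, block_size=None):
        block_size = block_size or self.DEFAULT_BLOCK_SIZE
        first = start // block_size
        last = (end - 1) // block_size
        out = bytearray()
        for index in range(first, last + 1):
            key = (block_size, index)
            if key not in self.blocks:
                # Cache miss: fetch the whole block from the "remote".
                self.remote_reads += 1
                lo = index * block_size
                self.blocks[key] = self.remote[lo:lo + block_size]
            out += self.blocks[key]
        offset = start - first * block_size
        return bytes(out[offset:offset + (end - start)])


cache = ToyBlockCache(b"0123456789abcdef")

first_read = cache.read(0, 8, block_size=8)   # populates the cache
populated = cache.remote_reads

second_read = cache.read(0, 8, block_size=8)  # same block size: served locally
after_matching = cache.remote_reads

third_read = cache.read(0, 8)                 # block_size dropped: default is used
after_dropped = cache.remote_reads

print(populated, after_matching, after_dropped)
```

The returned bytes are identical in all three reads; only the number of remote fetches changes. That is why the problem shows up as lost cache reuse rather than as wrong data.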

Solution

Locally patching AbstractFileSystem.cat_ranges to call self.cat_file(p, s, e, **kwargs) restores the expected cache reuse in our real integration case. I'm not brave enough to open a pull request against your full testing environment, but I hope this is trivial for you to check.
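
A minimal sketch of that patch, written as a standalone mixin so it runs without fsspec installed. The names ForwardKwargsCatRanges and _RecordingFS are hypothetical, and the cat_ranges signature and argument checks are assumed to mirror the ones in fsspec/spec.py. The only intended behavioural change is the **kwargs on the cat_file call.

```python
class ForwardKwargsCatRanges:
    """Hypothetical mixin: a cat_ranges that forwards **kwargs to cat_file."""

    def cat_ranges(self, paths, starts, ends, max_gap=None,
                   on_error="return", **kwargs):
        if max_gap is not None:
            raise NotImplementedError("max_gap is not supported in this sketch")
        if not isinstance(paths, list):
            raise TypeError("paths must be a list")
        # Broadcast scalar starts/ends across all paths.
        if not isinstance(starts, list):
            starts = [starts] * len(paths)
        if not isinstance(ends, list):
            ends = [ends] * len(paths)
        if len(starts) != len(paths) or len(ends) != len(paths):
            raise ValueError("paths, starts and ends must have the same length")
        out = []
        for path, start, end in zip(paths, starts, ends):
            try:
                # The one-line change: pass kwargs through instead of dropping them.
                out.append(self.cat_file(path, start, end, **kwargs))
            except Exception as exc:
                if on_error != "return":
                    raise
                out.append(exc)
        return out


class _RecordingFS(ForwardKwargsCatRanges):
    """Stub filesystem (assumption for the demo) that records cat_file calls."""

    def __init__(self):
        self.calls = []

    def cat_file(self, path, start=None, end=None, **kwargs):
        self.calls.append((path, start, end, kwargs))
        return b"data"


fs = _RecordingFS()
result = fs.cat_ranges(["dummy"], [0], [10], block_size=2 * 1024 * 1024)
print(fs.calls)
```

With the kwargs forwarded, the stub's cat_file sees block_size. Whether the same override composes cleanly with a real caching filesystem class is untested here.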

Environment

  • Python: 3.12.13 (conda-forge, macOS arm64)
  • fsspec: 2026.3.0
  • s3fs: 2026.3.0

MWE

I've struggled to demonstrate this in a useful MWE, but in the spirit of trying to provide something, I got my LLM to generate the following. It at least demonstrates what is already obvious from the line of code above: the kwargs are dropped.

from __future__ import annotations
from fsspec.spec import AbstractFileSystem

class _SpyFS(AbstractFileSystem):
    """Minimal AbstractFileSystem that records kwargs received by cat_file."""

    protocol = "spy"

    cachable = False  # keep this throwaway instance out of fsspec's instance cache

    def __init__(self):
        # Run the base initialiser so the usual filesystem attributes exist.
        super().__init__()
        self._kwargs_received: dict | None = None

    def cat_file(self, path, start=None, end=None, **kwargs):
        self._kwargs_received = kwargs
        return b"data"

def main() -> None:
    fs = _SpyFS()

    # Call cat_ranges with an explicit block_size kwarg.
    block_size = 2 * 1024 * 1024
    fs.cat_ranges(["dummy"], [0], [10], block_size=block_size)

    received = fs._kwargs_received
    print(f"kwargs passed to cat_file: {received}")

if __name__ == "__main__":
    main()
