I'm trying to use fsspec with non-default block sizes for a local disk cache and access to remote s3/https resources. When cat_ranges is not used, everything works as expected, but when using cat_ranges the cache is bypassed, because cat_ranges always goes out to the network.
If I've understood the problem correctly, AbstractFileSystem.cat_ranges accepts **kwargs but does not forward them when calling cat_file:
In fsspec/spec.py:

```python
out.append(self.cat_file(p, s, e))  # <-- kwargs silently dropped
```
The consequence is that CachingFileSystem._open, which uses block_size to look up blocks in its on-disk cache, never receives block_size and falls back to the default value. In my case that default does not match the block size used to populate the cache, so every cat_ranges call fetches from the underlying (remote) filesystem instead of reading locally.
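To make the mechanism concrete, here is a toy model of a block-indexed cache. It is not fsspec's CachingFileSystem: ToyBlockCache, its DEFAULT_BLOCK_SIZE, and the cat_ranges helper with a forward_kwargs flag are hypothetical names for illustration only. Blocks are keyed by the block size and the block index offset // block_size, so a read only hits the cache when it uses the same block size that filled it.

```python
from __future__ import annotations


class ToyBlockCache:
    """Hypothetical block cache (NOT fsspec's implementation).

    Blocks are stored under (path, block_size, block_index), so a lookup
    with a different block_size than the one used to fill the cache misses.
    """

    DEFAULT_BLOCK_SIZE = 4096  # hypothetical default, differs from the caller's

    def __init__(self, remote: dict[str, bytes]):
        self.remote = remote  # stands in for the remote filesystem
        self.blocks: dict[tuple[str, int, int], bytes] = {}
        self.remote_fetches = 0

    def cat_file(self, path, start, end, block_size=None):
        bs = block_size or self.DEFAULT_BLOCK_SIZE
        out = bytearray()
        for idx in range(start // bs, (end - 1) // bs + 1):
            key = (path, bs, idx)
            if key not in self.blocks:  # cache miss: go to the "network"
                self.remote_fetches += 1
                self.blocks[key] = self.remote[path][idx * bs:(idx + 1) * bs]
            out += self.blocks[key]
        first = (start // bs) * bs  # offset of the first block read
        return bytes(out[start - first:end - first])


def cat_ranges(fs, paths, starts, ends, forward_kwargs: bool, **kwargs):
    """Hypothetical helper: forward_kwargs=False mimics the reported bug."""
    out = []
    for p, s, e in zip(paths, starts, ends):
        if forward_kwargs:
            out.append(fs.cat_file(p, s, e, **kwargs))
        else:
            out.append(fs.cat_file(p, s, e))  # kwargs dropped, as in the issue
    return out


data = bytes(range(256)) * 64                 # 16 KiB of "remote" data
fs = ToyBlockCache({"f": data})
fs.cat_file("f", 0, 4096, block_size=1024)    # warm the cache with 1 KiB blocks
warm = fs.remote_fetches

fwd = cat_ranges(fs, ["f"], [100], [200], forward_kwargs=True, block_size=1024)
after_forwarding = fs.remote_fetches          # served from the warmed blocks

drop = cat_ranges(fs, ["f"], [100], [200], forward_kwargs=False, block_size=1024)
after_dropping = fs.remote_fetches            # default block size misses

print(warm, after_forwarding, after_dropping)  # prints: 4 4 5
```

Both reads return the same bytes; the only difference is whether the cached blocks are found, which mirrors the symptom described above.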
Solution
Locally patching AbstractFileSystem.cat_ranges to call self.cat_file(p, s, e, **kwargs) restores the expected cache reuse in our real integration case. I'm not brave enough to submit a pull request through your full testing setup, but I hope the fix is trivial for you to check.
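A minimal sketch of such a local patch follows. It assumes the upstream cat_ranges body is a simple per-path loop over cat_file with max_gap and on_error handling; compare it against fsspec/spec.py in your installed version before relying on it. The function name _cat_ranges_forwarding_kwargs is hypothetical, and the only intended change from upstream is the **kwargs on the cat_file call.

```python
from fsspec.spec import AbstractFileSystem


def _cat_ranges_forwarding_kwargs(
    self, paths, starts, ends, max_gap=None, on_error="return", **kwargs
):
    """Hypothetical replacement for AbstractFileSystem.cat_ranges.

    Assumption: mirrors the upstream loop, but forwards **kwargs
    (e.g. block_size) to cat_file instead of dropping them.
    """
    if max_gap is not None:
        raise NotImplementedError
    if not isinstance(paths, list):
        raise TypeError
    if not isinstance(starts, list):
        starts = [starts] * len(paths)
    if not isinstance(ends, list):
        ends = [ends] * len(paths)
    if len(starts) != len(paths) or len(ends) != len(paths):
        raise ValueError
    out = []
    for p, s, e in zip(paths, starts, ends):
        try:
            out.append(self.cat_file(p, s, e, **kwargs))  # kwargs now forwarded
        except Exception as exc:
            if on_error == "return":
                out.append(exc)
            else:
                raise
    return out


# Local monkeypatch: affects filesystem classes that inherit cat_ranges from
# AbstractFileSystem; classes that override cat_ranges are unaffected.
AbstractFileSystem.cat_ranges = _cat_ranges_forwarding_kwargs
```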
Environment
- Python: 3.12.13 (conda-forge, macOS arm64)
- fsspec: 2026.3.0
- s3fs: 2026.3.0
MWE
I've struggled to demonstrate this in a useful MWE, but in the spirit of providing something, I had an LLM generate the following, which at least demonstrates what is obvious from the line of code above: the kwargs are dropped.
```python
from __future__ import annotations

from fsspec.spec import AbstractFileSystem


class _SpyFS(AbstractFileSystem):
    """Minimal AbstractFileSystem that records kwargs received by cat_file."""

    protocol = "spy"

    def __init__(self):
        # Deliberately skip AbstractFileSystem.__init__; the spy needs no other state.
        self._kwargs_received: dict | None = None

    def cat_file(self, path, start=None, end=None, **kwargs):
        self._kwargs_received = kwargs
        return b"data"


def main() -> None:
    fs = _SpyFS()
    # Call cat_ranges with an explicit block_size kwarg.
    block_size = 2 * 1024 * 1024
    fs.cat_ranges(["dummy"], [0], [10], block_size=block_size)
    received = fs._kwargs_received
    # With the bug present this prints {}; with kwargs forwarded it would
    # print {'block_size': 2097152}.
    print(f"kwargs passed to cat_file: {received}")


if __name__ == "__main__":
    main()
```