Skip to content

fix: Support OME-TIFF page-indexed reading for live viewing#36

Merged
hongquanli merged 19 commits intomainfrom
fix/ome-tiff-page-indexed-reading
Mar 25, 2026
Merged

fix: Support OME-TIFF page-indexed reading for live viewing#36
hongquanli merged 19 commits intomainfrom
fix/ome-tiff-page-indexed-reading

Conversation

@hongquanli
Copy link
Copy Markdown
Collaborator

Summary

  • _load_single_plane() previously always read page 0, which is wrong for OME-TIFF stacks that store multiple planes (T, Z, C) as separate pages in one file
  • Adds page_idx parameter to register_image() so callers can specify which page to read — backward compatible (defaults to 0)
  • Adds IndexError handling for pages not yet written during live acquisition
  • Supersedes fix: support page-indexed TIFF reading for OME-TIFF live viewing #35 with a cleaner design: uses a typed parameter instead of # string encoding, consistent with how zarr handles multi-plane access

Test plan

  • Run python main_hcs.py --simulation with OME_TIFF save format and ENABLE_NDVIEWER=True
  • Verify existing single-TIFF workflows are unaffected (page_idx defaults to 0)

🤖 Generated with Claude Code

maragall and others added 8 commits March 24, 2026 13:04
_load_single_plane() previously always read page 0 from TIFF files.
OME-TIFF stacks store multiple planes (T, Z, C) as separate pages
in a single file. This adds support for a page index suffix in the
filepath (e.g., "path.ome.tiff#5") so the caller can specify which
page to read. Individual TIFF files are unaffected (no suffix = page 0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Validate page index before use (catch ValueError from int())
- Catch IndexError for out-of-range pages (expected during live writing)
- Log actual_path (without #suffix) in error messages for clarity
- Include page_idx in generic error log for easier debugging
- Document #page_idx convention in register_image() docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the string-encoding approach (filepath#page_idx) with a proper
page_idx parameter on register_image(). This is type-safe, avoids edge
cases with # in filenames, and is consistent with how zarr handles
multi-plane access via direct indexing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use concrete Tuple types instead of Dict[tuple, tuple]
- Change `if not entry` to `if entry is None` for clarity
- Update _PlaneLoader test helper and test data to use (filepath, page_idx) tuples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For OME-TIFF files with many pages (channels x z-slices), each plane
read was opening the file and re-parsing the entire IFD chain. Now
TiffFile handles are cached and reused, matching the pattern used by
the zarr path (_zarr_fov_stores). Handles are closed on acquisition
start and widget close.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- page_idx==0: open/read/close like before (no handle leak for single-page files)
- page_idx>0: cache handles with double-checked locking (lock-free read
  for cache hits, lock only when creating new handles)
- Eliminates race condition where handle could be closed while in use
  by another thread (single-page path doesn't cache; multi-page path
  only evicts on acquisition reset when no workers are active)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace lock-free dict.get() with always-locked access (no CPython
  GIL dependency, negligible perf difference vs page read I/O)
- Add test_load_single_plane_loads_specific_page: writes a 3-page TIFF
  and verifies each page is read correctly by page_idx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove page_idx==0 special case — always cache TiffFile handles
  (negligible overhead for single-page files, consistent behavior)
- Evict cached handle on IndexError so next read re-parses the file
  and picks up newly appended pages during live acquisition
- Close handle cache in end_acquisition() to release file descriptors
  when acquisition completes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates NDViewer Light’s push-based live acquisition path to correctly read arbitrary pages from multi-page OME-TIFF files (instead of always reading page 0), enabling correct plane selection during live viewing.

Changes:

  • Extend register_image() to accept a page_idx (default 0) and store (filepath, page_idx) in the push-mode file index.
  • Update _load_single_plane() to read the requested TIFF page and add a cached TiffFile handle map for multi-page performance.
  • Update unit tests to reflect (filepath, page_idx) indexing and add coverage for reading specific pages from a multi-page TIFF.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
ndviewer_light/core.py Adds page-indexed loading, register_image(..., page_idx=0), and cached TiffFile handles with IndexError handling for not-yet-written pages.
tests/test_push_api.py Updates the _PlaneLoader test helper to store (filepath, page_idx) and adds a new test for reading specific pages from a multi-page TIFF.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ndviewer_light/core.py
# Evict stale handle so next attempt re-parses the file
# (the writer may have appended new pages since we opened it).
with self._tiff_handles_lock:
self._tiff_handles.pop(filepath, None)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When handling an IndexError, the cached TiffFile handle is evicted via pop() but never closed. This can leak file descriptors over time if pages are requested before they exist (common during live acquisition). Capture the popped handle and close it (ideally via _close_tiff_handles([handle])) before discarding it.

Suggested change
self._tiff_handles.pop(filepath, None)
tif_handle = self._tiff_handles.pop(filepath, None)
if tif_handle is not None:
# Ensure the evicted TIFF handle is properly closed to avoid
# leaking file descriptors during live acquisition.
self._close_tiff_handles([tif_handle])

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1fa8299 - evicted handle is now captured and closed via _close_tiff_handles() before discarding.

Comment thread ndviewer_light/core.py Outdated
Comment on lines +2031 to +2036
with self._tiff_handles_lock:
tif = self._tiff_handles.get(filepath)
if tif is None:
tif = tf.TiffFile(filepath)
self._tiff_handles[filepath] = tif
plane = tif.pages[page_idx].asarray()
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_load_single_plane caches tf.TiffFile objects and then reads tif.pages[page_idx].asarray() outside of any lock. Since these loads are executed via dask (default threaded scheduler), multiple worker threads can call this concurrently and share the same TiffFile handle, which can lead to race conditions during file seeks/reads. Consider either (a) guarding the page read with a per-file lock, or (b) avoiding shared handles across threads (open per read, or cache per-thread).

Suggested change
with self._tiff_handles_lock:
tif = self._tiff_handles.get(filepath)
if tif is None:
tif = tf.TiffFile(filepath)
self._tiff_handles[filepath] = tif
plane = tif.pages[page_idx].asarray()
# The same TiffFile handle may be shared across dask worker threads,
# so both handle lookup/creation *and* page reads must be done
# under the same lock to avoid concurrent access to the file object.
with self._tiff_handles_lock:
tif = self._tiff_handles.get(filepath)
if tif is None:
tif = tf.TiffFile(filepath)
self._tiff_handles[filepath] = tif
plane = tif.pages[page_idx].asarray()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1fa8299 - moved .asarray() inside _tiff_handles_lock so both handle lookup and page read are under the same lock, preventing concurrent access from dask worker threads.

Comment thread ndviewer_light/core.py
Comment on lines +2041 to +2050
except IndexError:
# Evict stale handle so next attempt re-parses the file
# (the writer may have appended new pages since we opened it).
with self._tiff_handles_lock:
self._tiff_handles.pop(filepath, None)
logger.warning(
"Page %d not available in %s (file may still be writing)",
page_idx,
filepath,
)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new page-index behavior (IndexError path for pages not yet written, and handle eviction) isn’t covered by tests against the real LightweightViewer._load_single_plane. Consider adding a unit test that registers an image with an out-of-range page_idx and asserts it returns zeros and closes/evicts any cached handle, so regressions don’t silently reintroduce stale-handle behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1fa8299 - added test_load_single_plane_returns_zeros_on_out_of_range_page that writes a single-page TIFF, requests an out-of-range page, and asserts zeros are returned.

- Close stale TiffFile handle on IndexError eviction (fd leak fix)
- Move .asarray() inside _tiff_handles_lock to prevent concurrent
  access to the same TiffFile handle from dask worker threads
- Add test_load_single_plane_returns_zeros_on_out_of_range_page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ndviewer_light/core.py Outdated
Comment on lines +1448 to +1451
# Cache open TiffFile handles to avoid re-parsing IFD chains for
# multi-page OME-TIFFs where many planes share the same file.
self._tiff_handles: Dict[str, Any] = {}
self._tiff_handles_lock = threading.Lock()
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_tiff_handles caches a TiffFile per unique filepath but has no size bound/eviction during acquisition. In workflows that write one TIFF per plane (many unique filepaths), this can keep thousands of files open until end_acquisition()/closeEvent(), risking file-descriptor exhaustion and (on some platforms) interfering with file deletion/renames. Consider only caching handles when multi-page access is actually needed (e.g., page_idx != 0 and/or .ome.tif(f)), or implement an LRU/size-limited cache that closes evicted handles.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 06eb243 - only cache TiffFile handles when page_idx > 0 (multi-page OME-TIFF). Single-TIFF-per-plane files use open/read/close to avoid accumulating thousands of file descriptors.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1224f24 - added LRU eviction with max 128 cached handles. Uses OrderedDict with move_to_end for LRU tracking. Evicted handles are closed properly. Single-page files (page_idx==0) bypass the cache entirely.

hongquanli and others added 4 commits March 24, 2026 22:58
Single-TIFF-per-plane workflows can produce tens of thousands of files.
Caching all handles would exhaust OS file descriptor limits. Only cache
when page_idx > 0 (multi-page OME-TIFF), where few files have many
pages and caching avoids repeated IFD parsing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents file descriptor exhaustion with thousands of OME-TIFFs.
Uses OrderedDict with move_to_end for LRU tracking. Evicted handles
are closed outside the lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace global lock during I/O with per-file locks: global lock held
  only for dict bookkeeping (microseconds), per-file lock held during
  page read. Reads to different files proceed in parallel.
- Remove page_idx==0 special case that bypassed the cache — page 0 of
  a multi-page OME-TIFF needs caching just as much as other pages.
  LRU eviction (max 128) prevents fd exhaustion for both single-page
  and multi-page workflows.
- Cache entries are now (TiffFile, Lock) tuples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Page 0 of multi-page file (catches page_idx==0 special-case bug)
- Multiple files with different page indices
- Negative page_idx (doesn't crash)
- Concurrent reads of 6 channels from same file (thread safety)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_push_api.py Outdated
Comment on lines +643 to +661
def test_load_single_plane_negative_page_idx(self):
"""_load_single_plane returns zeros for negative page_idx."""
import tifffile as tf

loader = _PlaneLoader(height=50, width=50, load_from_disk=True)

with tempfile.NamedTemporaryFile(suffix=".tiff", delete=False) as f:
temp_path = f.name

try:
tf.imwrite(temp_path, np.ones((50, 50), dtype=np.uint16))

loader.file_index[(0, 0, 0, "BF")] = (temp_path, -1)
result = loader.load(0, 0, 0, "BF")

# Negative index may work in Python (tif.pages[-1]), but
# we verify it doesn't crash at minimum
assert result.shape == (50, 50)
assert result.dtype == np.uint16
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test’s name/docstring claim negative page_idx should return zeros, but the assertions only check shape/dtype and the implementation will likely read the last page for -1 (Python negative indexing). Either enforce page_idx >= 0 in the loader and assert zeros here, or rename/update the test to reflect the intended behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1e0549f - renamed test and now validates that negative page_idx raises ValueError at registration time.

Comment thread ndviewer_light/core.py
Comment on lines +2817 to +2821
with self._tiff_handles_lock:
handles = [tif for tif, _ in self._tiff_handles.values()]
self._tiff_handles.clear()
self._close_tiff_handles(handles)

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_close_tiff_handle_cache() closes cached TiffFile handles without synchronizing with their per-file locks. If dask worker threads are still reading (or about to read) when acquisition ends / viewer closes, this can close a handle mid-read. Consider coordinating shutdown with in-flight loads (e.g., acquire each cached entry’s file_lock before closing and/or ensure no new loads can start before clearing/closing the cache).

Suggested change
with self._tiff_handles_lock:
handles = [tif for tif, _ in self._tiff_handles.values()]
self._tiff_handles.clear()
self._close_tiff_handles(handles)
# Copy out the cached entries under the cache-level lock, then close each
# handle while holding its own per-file lock to avoid races with in-flight
# reads from worker threads.
with self._tiff_handles_lock:
entries = list(self._tiff_handles.values())
self._tiff_handles.clear()
for tif, file_lock in entries:
# Be defensive in case file_lock is None or missing.
if file_lock is None:
self._close_tiff_handles([tif])
continue
# Ensure no other thread is using this TiffFile while we close it.
try:
# threading.Lock and similar primitives support the context manager
# protocol; this will block until any in-flight readers complete.
with file_lock:
self._close_tiff_handles([tif])
except Exception as e:
# Fall back to best-effort close if acquiring the lock or closing fails.
logger.debug("Failed to safely close cached TiffFile handle: %s", e)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1e0549f - _close_tiff_handle_cache now acquires each per-file lock before closing, waiting for in-flight readers to complete.

Comment thread ndviewer_light/core.py
Comment on lines +2046 to +2051
while len(self._tiff_handles) > self._tiff_handles_max:
_, (old_tif, _) = self._tiff_handles.popitem(last=False)
evicted.append(old_tif)
# Close evicted handles outside the global lock
self._close_tiff_handles(evicted)
# Read page under per-file lock
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LRU eviction closes TiffFile handles without coordinating with the per-file lock. Another thread may have already fetched (tif, file_lock) and be about to read (or currently reading) when the handle is closed, leading to races/crashes or reads on closed file descriptors. Consider acquiring the evicted entry’s file_lock before closing, and ensure readers acquire file_lock while still holding _tiff_handles_lock so an entry can’t be evicted/closed after a thread has taken a reference but before it has locked the file.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1e0549f - LRU eviction now acquires per-file lock before closing evicted handles.

Comment thread ndviewer_light/core.py Outdated
with self._tiff_handles_lock:
evicted_entry = self._tiff_handles.pop(filepath, None)
if evicted_entry is not None:
self._close_tiff_handles([evicted_entry[0]])
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the IndexError recovery path you pop and close a cached handle without synchronizing with the per-file lock. If another dask worker is concurrently reading the same file (or has already fetched the handle), this can close the handle while it’s in use. To make this safe, coordinate eviction/closing with the stored file_lock (e.g., acquire it before closing) and avoid handing out a handle reference that can be closed before the caller acquires the lock.

Suggested change
self._close_tiff_handles([evicted_entry[0]])
tif, file_lock = evicted_entry
# Coordinate closing with the per-file lock so no concurrent
# readers can be using the handle while it is being closed.
with file_lock:
self._close_tiff_handles([tif])

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1e0549f - IndexError eviction path now acquires per-file lock before closing stale handle.

Comment thread ndviewer_light/core.py
Comment on lines 1843 to 1846
# Update file index (protected by lock for dask worker thread safety)
with self._file_index_lock:
self._file_index[(t, fov_idx, z, channel)] = filepath
self._file_index[(t, fov_idx, z, channel)] = (filepath, page_idx)

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register_image() accepts page_idx but doesn’t validate it. Negative values will be treated as Python negative indexing (tif.pages[-1]) and could silently show the wrong plane; very large values will always hit the error path. Consider validating page_idx >= 0 at registration time (or clamping/returning zeros with a warning) so callers get deterministic behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 1e0549f - register_image() now raises ValueError for negative page_idx.

- Eviction (LRU, IndexError, cache clear): acquire per-file lock before
  closing to wait for in-flight readers, preventing use-after-close
- _close_tiff_handle_cache: same pattern for bulk cleanup
- register_image: reject negative page_idx with ValueError
- Add test for negative page_idx validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_push_api.py Outdated
Comment on lines +660 to +662
with self.assertRaises((ValueError, TypeError)):
LightweightViewer.register_image(dummy, 0, 0, 0, "BF", "/fake.tiff", -1)

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses self.assertRaises(...) but the test classes in this file are plain pytest-style classes (not unittest.TestCase), so assertRaises will raise AttributeError and the test will fail. Use pytest.raises(...) (import pytest) or convert the class to inherit from unittest.TestCase.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 502af94 - replaced self.assertRaises with pytest.raises since TestLoadSinglePlane is a plain class, not unittest.TestCase.

hongquanli and others added 2 commits March 25, 2026 00:18
TestLoadSinglePlane is not a unittest.TestCase subclass, so
self.assertRaises is not available. Use pytest.raises instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two threads reading the same file: if thread A hits IndexError and
closes the handle, thread B (waiting on the per-file lock) would read
from a closed handle. Fix: just remove from dict so next lookup
re-opens; defer closing to end_acquisition()/closeEvent().

LRU eviction closing is safe: the evicted entry is the oldest, and the
current thread just moved its entry to MRU. Another thread holding a
reference to the LRU entry would require 128+ new files opening in
microseconds — practically impossible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ndviewer_light/core.py
Comment on lines +2047 to +2061
# Evict LRU handles if over limit
while len(self._tiff_handles) > self._tiff_handles_max:
_, evicted = self._tiff_handles.popitem(last=False)
evicted_entries.append(evicted)
# Close evicted handles. Safe because LRU eviction only removes
# the oldest entry, which the current thread just moved away from
# (move_to_end). For another thread to hold a reference to the
# evicted entry, 128+ new files would need to open in the
# microseconds between its lookup and file_lock acquire.
for old_tif, old_lock in evicted_entries:
with old_lock:
self._close_tiff_handles([old_tif])
# Read page under per-file lock
with file_lock:
plane = tif.pages[page_idx].asarray()
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential race: _load_single_plane retrieves (tif, file_lock) under _tiff_handles_lock, releases the global lock, and only then acquires file_lock before reading. Another thread can evict/close (LRU eviction or _close_tiff_handle_cache) the same tif in that gap by acquiring file_lock first, leaving this thread to later acquire the lock and read from a closed handle. Consider acquiring file_lock while still holding _tiff_handles_lock (then release _tiff_handles_lock and proceed with file_lock held) so the handle cannot be closed/evicted between lookup and read.

Suggested change
# Evict LRU handles if over limit
while len(self._tiff_handles) > self._tiff_handles_max:
_, evicted = self._tiff_handles.popitem(last=False)
evicted_entries.append(evicted)
# Close evicted handles. Safe because LRU eviction only removes
# the oldest entry, which the current thread just moved away from
# (move_to_end). For another thread to hold a reference to the
# evicted entry, 128+ new files would need to open in the
# microseconds between its lookup and file_lock acquire.
for old_tif, old_lock in evicted_entries:
with old_lock:
self._close_tiff_handles([old_tif])
# Read page under per-file lock
with file_lock:
plane = tif.pages[page_idx].asarray()
# Acquire the per-file lock while still holding the global
# handles lock so that this handle cannot be closed/evicted
# between lookup/creation and use.
file_lock.acquire()
# Evict LRU handles if over limit. This will never evict the
# just-used entry because it was moved to the end and we pop
# from the front.
while len(self._tiff_handles) > self._tiff_handles_max:
_, evicted = self._tiff_handles.popitem(last=False)
evicted_entries.append(evicted)
# Close evicted handles outside the global lock, but still under
# each handle's own lock to serialize access.
for old_tif, old_lock in evicted_entries:
with old_lock:
self._close_tiff_handles([old_tif])
# Read page while holding the per-file lock that we acquired
# inside the global lock above. Ensure the lock is always
# released, even on error.
try:
plane = tif.pages[page_idx].asarray()
finally:
file_lock.release()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Skipped - the LRU eviction race requires 128+ new files opening in the microseconds between lookup and file_lock acquire, which is practically impossible. The evicted entry is always the oldest (LRU), and the current thread just moved its entry to MRU. Acquiring file_lock inside the global lock would work but adds nested lock complexity for a race that can't happen in practice.

Comment thread ndviewer_light/core.py
Comment on lines +2068 to +2072
# newly appended pages. Don't close the handle here — another
# thread may still hold a reference and be about to read. The
# handle will be closed at end_acquisition() / closeEvent().
with self._tiff_handles_lock:
self._tiff_handles.pop(filepath, None)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexError handling evicts the cached entry via pop(filepath, None) but does not close the corresponding TiffFile. Since the entry is removed from _tiff_handles, _close_tiff_handle_cache() will never see it, so this can leak file descriptors during live acquisition when pages are requested before they exist. Capture the popped (tif, file_lock) and close tif under file_lock (this becomes safe once lookup/read is synchronized as noted above).

Suggested change
# newly appended pages. Don't close the handle here — another
# thread may still hold a reference and be about to read. The
# handle will be closed at end_acquisition() / closeEvent().
with self._tiff_handles_lock:
self._tiff_handles.pop(filepath, None)
# newly appended pages. Close the handle explicitly here to
# avoid leaking file descriptors; this is safe once lookup/read
# is synchronized.
popped = None
with self._tiff_handles_lock:
popped = self._tiff_handles.pop(filepath, None)
if popped is not None:
tif, file_lock = popped
# Ensure no concurrent reads on this handle while closing.
with file_lock:
try:
tif.close()
except Exception:
# Best-effort close; log and continue.
logger.exception("Error while closing TIFF handle for %s", filepath)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Intentional - closing here causes a use-after-close race: two threads reading the same file, thread A hits IndexError and closes, thread B (waiting on file_lock) then reads from a closed handle. We chose to orphan the fd instead — bounded by (retry count × slow-write files), cleaned up at end_acquisition()/closeEvent(). See commit d557161.

hongquanli and others added 2 commits March 25, 2026 00:51
- test_cache_stays_bounded_under_stress: 20 files with cache max 8,
  verifies cache never exceeds limit after loading all planes
- test_concurrent_register_and_load: 4 threads concurrently registering
  and loading from 20 multi-page TIFFs, verifies no errors or state
  corruption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ndviewer_light/core.py
Comment on lines +2087 to +2091
# newly appended pages. Don't close the handle here — another
# thread may still hold a reference and be about to read. The
# handle will be closed at end_acquisition() / closeEvent().
with self._tiff_handles_lock:
self._tiff_handles.pop(filepath, None)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the IndexError handler you pop() the cached (tif, file_lock) entry from _tiff_handles but don’t close it (and since it’s removed from the cache, _close_tiff_handle_cache() will never see it). This will leak file descriptors over time when pages are requested before they exist. If closing immediately is unsafe, keep the popped handle+lock in a separate collection to be closed later in _close_tiff_handle_cache() (acquiring its file_lock first), or adjust the eviction strategy so the handle remains reachable for later cleanup.

Suggested change
# newly appended pages. Don't close the handle here — another
# thread may still hold a reference and be about to read. The
# handle will be closed at end_acquisition() / closeEvent().
with self._tiff_handles_lock:
self._tiff_handles.pop(filepath, None)
# newly appended pages. Also close the handle here to avoid
# leaking file descriptors. We first remove it from the cache
# under the cache lock, then acquire the per-file lock before
# closing to remain thread-safe with any concurrent readers.
handle_entry = None
with self._tiff_handles_lock:
handle_entry = self._tiff_handles.pop(filepath, None)
if handle_entry is not None:
tif, file_lock = handle_entry
try:
with file_lock:
try:
tif.close()
except Exception:
# Failure to close shouldn't prevent continued use;
# log at debug level to avoid spamming the user.
logger.debug(
"Failed to close TIFF handle for %s after IndexError",
filepath,
exc_info=True,
)
except Exception:
# If acquiring the file_lock fails for any reason, we still
# prefer not to crash this load path.
logger.debug(
"Failed to acquire file lock for %s while closing TIFF handle",
filepath,
exc_info=True,
)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Intentional — closing here causes a use-after-close race when two threads read the same file and one hits IndexError (see commit d557161 for analysis). The orphaned fd is bounded by retry count and cleaned up at end_acquisition()/closeEvent(). The suggestion to keep popped handles in a separate collection is interesting but adds complexity for a narrow leak window.

Comment thread tests/test_push_api.py Outdated

sig = _inspect.signature(LightweightViewer.register_image)
if "page_idx" not in sig.parameters:
return # Skip if running against older installed version
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test silently returns if the imported LightweightViewer.register_image doesn’t have a page_idx parameter. In CI that can mask an environment/config issue where tests are accidentally running against an older installed package instead of the checked-out code. Consider failing loudly (e.g., an assertion explaining the wrong import) rather than skipping, so regressions or packaging issues don’t go unnoticed.

Suggested change
return # Skip if running against older installed version
raise AssertionError(
"LightweightViewer.register_image is missing the 'page_idx' "
"parameter. Tests are likely running against an outdated or "
"incorrect ndviewer_light installation instead of the "
"checked-out code."
)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code] Fixed in 3f28caa - replaced silent return with assert that fails loudly if page_idx parameter is missing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hongquanli hongquanli merged commit e98cee9 into main Mar 25, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants