This file is read automatically by Claude Code at the start of every session. It provides full context on the architecture, design decisions, and current state of the project so any Claude instance can contribute immediately.
A web-based spatial transcriptomics viewer supporting multiple platforms (Xenium, MERSCOPE, CosMx) with connectivity layers produced by the lab's NICHESv2 R pipeline. Built because Xenium Explorer does not support cell-cell ligand-receptor mechanism (LRM) visualization, and extended to be platform-agnostic.
Core insight: Rather than rasterizing 488 LRM outputs as PNG images, we store
connectivity data as a single edges.parquet file and render it as WebGL vector lines.
This allows instant toggling of 488 LRMs, coloring by any metadata, and zoom-independent
rendering. The edges.parquet format is platform-agnostic — it works with any spatial
dataset as long as cell barcodes match.
Browser
- OpenSeadragon: pan/zoom over the OME-TIFF tile pyramid (morphology image)
- deck.gl (WebGL): all data layers rendered as vectors, coordinate-synced to OSD
- React + Zustand: UI state management

FastAPI backend
- `/tiles`: OME-TIFF → DZI tile pyramid (pyvips/tifffile), tile serving
- `/spatial`: platform-agnostic; transcripts, cell boundaries, cell metadata, gene expression, color-values, dataset list, per-dataset image list
- `/edges`: edge list, LRM catalogue, per-edge color values, edge detail
- `/layers`: generic parquet layer serving (extensible)

Key architectural constraint: OpenSeadragon handles all pan/zoom events. deck.gl
sits in an absolutely-positioned canvas on top, with its viewport synced to OSD via a
custom syncDeckFromOSD function on every OSD viewport-change event. All data is
returned in image pixel coordinates (native_coord / pixel_size).
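
The pixel-space conversion named above can be sketched in a few lines. This is only an illustration of the `native_coord / pixel_size` rule; the function names `um_to_pixels` and `edge_to_pixels` are hypothetical and do not come from the codebase.

```python
def um_to_pixels(x_um, y_um, pixel_size):
    """Convert native micrometre coordinates to image pixel coordinates."""
    if pixel_size <= 0:
        raise ValueError("pixel_size must be positive (micrometres per pixel)")
    return x_um / pixel_size, y_um / pixel_size


def edge_to_pixels(edge, pixel_size):
    """Return a copy of an edge row with x1, y1, x2, y2 in pixel space.

    `edge` is assumed to be a dict holding native-micrometre centroids,
    as in edges.parquet; other keys are passed through unchanged.
    """
    converted = dict(edge)
    for key in ("x1", "y1", "x2", "y2"):
        converted[key] = edge[key] / pixel_size
    return converted
```

For example, with a hypothetical `pixel_size` of 0.5 µm per pixel, a centroid at (10 µm, 20 µm) lands at pixel (20, 40).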
The backend uses an abstract reader pattern. All platform readers inherit from
SpatialDatasetReader (base_reader.py) and implement the same interface.
ReaderFactory auto-detects the platform from directory contents.
Detection order:
| Platform | Sentinel file |
|---|---|
| Xenium (10x Genomics) | experiment.xenium |
| MERSCOPE (Vizgen) | cell_by_gene.csv or cell_metadata.csv |
| CosMx (Nanostring) | *_tx_file.csv |
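
A minimal sketch of sentinel-file detection, checking the platforms in the table's order. The function name `detect_platform` and the returned strings are assumptions for illustration; the real `ReaderFactory` may be structured differently.

```python
from pathlib import Path


def detect_platform(dataset_dir):
    """Return a platform name from sentinel files, or None if unrecognised."""
    d = Path(dataset_dir)
    # Xenium is checked first, then MERSCOPE, then CosMx.
    if (d / "experiment.xenium").is_file():
        return "xenium"
    if (d / "cell_by_gene.csv").is_file() or (d / "cell_metadata.csv").is_file():
        return "merscope"
    # CosMx transcript files carry a run-specific prefix, hence the glob.
    if any(d.glob("*_tx_file.csv")):
        return "cosmx"
    return None
```

Because the checks run in a fixed order, a directory that happens to contain sentinels for more than one platform resolves to the earliest match.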
Coordinate contract: Every reader converts native coordinates to image pixel space before returning data. The frontend always receives pixel coordinates.
Implementation status:
- Xenium: fully implemented
- MERSCOPE: cells, transcripts, genes, color-values (metadata + gene-set) implemented; cell boundaries stub (MERSCOPE uses HDF5 boundary format, not yet parsed)
- CosMx: cells, transcripts, genes, metadata color-values implemented; gene-set color-values stub (requires transcript aggregation per cell)
    backend/
      app/
        main.py                 FastAPI entry point, CORS, router registration
        routers/
          tiles.py              DZI descriptor + tile serving; auto-builds pyramid on first request
          spatial.py            Platform-agnostic router: all /spatial/... endpoints
          xenium.py             DEPRECATED; kept for reference, not registered in main.py
          edges.py              edge query, LRM catalogue, edge color values, edge detail
          layers.py             generic parquet layer router
        readers/
          base_reader.py        Abstract base class: SpatialDatasetReader interface
          reader_factory.py     ReaderFactory: auto-detect platform, instantiate reader
          xenium_reader.py      Xenium implementation (inherits SpatialDatasetReader)
          merscope_reader.py    MERSCOPE implementation (inherits SpatialDatasetReader)
          cosmx_reader.py       CosMx implementation (inherits SpatialDatasetReader)
          edge_reader.py        reads edges.parquet; query_grouped(), lrm_catalogue(), edge_color_values(), edge_detail()
          layer_reader.py       generic parquet reader
        tiling/
          pyramid.py            OME-TIFF → DZI; pyvips streaming primary, tifffile+Pillow fallback
      requirements.txt          pinned deps; cffi<2.0 required for pyvips 2.2.3 compatibility
      Dockerfile
    frontend/
      src/
        store.js                Zustand store: ALL shared state lives here
        components/
          App.jsx               Root component; wraps everything in a React ErrorBoundary
          Viewer.jsx            Split-screen wrapper (Viewer) + per-panel logic (ViewerPanel)
          LayerPanel.jsx        Right-side panel: toggles, opacity, color-by, legends, dataset/image picker, transcript species filter
          CellInfoPanel.jsx     Floating panel on cell click; shows color-by value highlight
          EdgeInfoPanel.jsx     Floating panel on edge/autocrine click
          AnnotationToolbar.jsx Region drawing + measurement tools; ⊞ Split / □ Single toggle; ⇔ Match zoom
        hooks/
          useTranscripts.js     Viewport-bounded transcript fetch (bbox always sent; skip at low zoom)
          useCellBoundaries.js  Viewport-bounded cell boundary fetch (skip when fracW >= 0.5)
          useCellColors.js      POST color-values; maps cell_id → RGBA; supports clamp
          useEdgeColors.js      lrm_set: client-side from visible_score_sum; metadata: POST edge-color-values
          useEdges.js           Viewport-bounded edge fetch; POSTs to /query-grouped
        utils/
          colormap.js           Palette definitions (viridis/plasma/magma/inferno) + valueToColor()
          geneColor.js          Deterministic gene → color mapping
      vite.config.js            Dev server proxies /api → localhost:8000
      nginx.conf                Production: proxies /api/ → backend:8000/
      Dockerfile                Multi-stage: node build → nginx serve
    docker-compose.yml          Repo root; mounts DATA_PATH (or sample_data/) as /data:ro
    docker/docker-compose.yml   Legacy path (kept for compatibility)
    sample_data/                GITIGNORED; default data mount for local dev/demo
    r/
      export_NICHESObject_for_viewer.R   draft R function for NICHESv2 → edges.parquet export
    docs/
      data_format.md            edges.parquet column spec for NICHESv2 R export
      setup.md                  Docker deployment guide
      public_datasets.md        Links to public Xenium datasets used for development

One row per (directed edge) × (LRM). This is the long/sparse format from NICHESv2. Platform-agnostic — works with any spatial dataset as long as cell barcodes match.
| Column | Type | Notes |
|---|---|---|
| `edge` | string | `"SendingCell\|ReceivingCell"`, the directed edge ID |
| `sending_cell` | string | Cell barcode matching the platform's `cell_id` |
| `receiving_cell` | string | Cell barcode |
| `is_autocrine` | bool | True when sending == receiving |
| `lrm` | string | `"ligand\|receptor"` mechanism ID |
| `lrm_id` | int | Integer index (1–N) |
| `ligand` | string | |
| `receptor` | string | |
| `score` | float | Raw NICHESv2 score |
| `score_norm` | float | Score normalized within edge (sums to 1) |
| `x1`, `y1` | float | Sending cell centroid, native µm coords |
| `x2`, `y2` | float | Receiving cell centroid |
| `sending_type` | string | Optional cell type label |
| `receiving_type` | string | Optional cell type label |

Important: Coordinates in edges.parquet are in native µm. The backend divides by
pixel_size (from the reader) when serving to the frontend.
The sample_data/make_edges.py script generates synthetic demo data in this format.
Real data comes from export_for_TissuePlex() in the NICHESv2 R package.
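
To make the long format concrete, here is a small sketch that builds rows in this shape from raw scores and derives `edge`, `is_autocrine`, `lrm`, and `score_norm`. It is not `make_edges.py` or the R exporter; the helper name `build_edge_rows` is hypothetical, and coordinates, `lrm_id`, and the cell type columns are omitted for brevity.

```python
from collections import defaultdict


def build_edge_rows(scores):
    """Build long-format rows, one per (directed edge) x (LRM).

    `scores` is an iterable of
    (sending_cell, receiving_cell, ligand, receptor, score) tuples.
    """
    rows = []
    totals = defaultdict(float)  # per-edge score totals for normalisation
    for sending, receiving, ligand, receptor, score in scores:
        edge = f"{sending}|{receiving}"
        totals[edge] += score
        rows.append({
            "edge": edge,
            "sending_cell": sending,
            "receiving_cell": receiving,
            "is_autocrine": sending == receiving,
            "lrm": f"{ligand}|{receptor}",
            "ligand": ligand,
            "receptor": receptor,
            "score": score,
        })
    # score_norm sums to 1 within each edge (0.0 if the edge total is 0).
    for row in rows:
        total = totals[row["edge"]]
        row["score_norm"] = row["score"] / total if total > 0 else 0.0
    return rows
```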
User-defined metadata (e.g. from external R analysis) can be loaded without modifying
the dataset output by placing files in a cell-metadata/ subdirectory of the dataset.
Currently implemented in XeniumReader; the pattern should be ported to other readers.
    dataset_dir/
      experiment.xenium         (or equivalent platform sentinel)
      cells.parquet
      cell-metadata/            ← create this directory
        my_metadata.csv         ← one or more files here
        clusters.csv
        pseudotime.parquet

Supported formats: .csv, .csv.gz, .parquet. Multiple files are allowed and
are outer-joined on the barcode key.
Barcode column resolution (in order of precedence):
- A column explicitly named `cell_id`
- `Unnamed: 0`, pandas' name for R's unnamed rowname column from `write.csv(row.names=TRUE)`
- The first column if it contains unique strings (generic fallback)
- Parquet files: `cell_id` column required
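
The precedence rules can be sketched as a pure function over column names. This is an illustration under assumptions, not the code in `XeniumReader`: the name `resolve_barcode_column` and its arguments are hypothetical, and it takes plain Python sequences rather than a DataFrame.

```python
def resolve_barcode_column(columns, first_column_values, is_parquet=False):
    """Pick the barcode column name, or None if no rule matches.

    `columns` is the ordered list of column names and
    `first_column_values` holds the values of the first column.
    """
    if "cell_id" in columns:
        return "cell_id"
    if is_parquet:
        # Parquet files must carry an explicit cell_id column.
        return None
    if "Unnamed: 0" in columns:
        # pandas' name for the rowname column written by R's write.csv.
        return "Unnamed: 0"
    if columns and first_column_values:
        all_strings = all(isinstance(v, str) for v in first_column_values)
        if all_strings and len(set(first_column_values)) == len(first_column_values):
            return columns[0]  # generic fallback: unique strings
    return None
```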
Standard R export that works out of the box:

    write.csv(my_metadata_df, file.path(dataset_dir, "cell-metadata", "metadata.csv"))
    # row.names=TRUE is R's default; barcodes go in the first unnamed column

How it surfaces in the UI: supplemental columns are merged into the cells table via
XeniumReader._cells_full(). They appear automatically in the "Cell metadata" color-by
dropdown. Continuous columns get a gradient colormap; string or low-cardinality integer
columns get discrete colors. The cell-click info panel also shows the supplemental fields.
XeniumReader._cells_full() is cached per reader instance. Because reader instances are
themselves cached at router level, this cache persists across requests.
_load_supplemental_metadata() is also cached, so the CSV is only parsed once regardless
of how many color-by requests arrive.
All shared state lives in a single Zustand store. Key sections:
- Dataset / image: `dataset` (null on init, auto-set from `/spatial/datasets`), `activeImage` (which OME-TIFF to show; auto-set from `/spatial/{dataset}/images`)
- Layer visibility: `layers` object; each layer has `visible` + `opacity`; `cellSegments` also has `outlineOpacity` (independent from fill opacity)
- Cell color: `cellColorEnabled`, `colorBy` (`mode`: off/gene_set/metadata, `field`), `cellColorPalette`, `cellColorClamp` (squish/oob cutoffs)
- Transcript gene filter: `selectedGenes`; `null` = no filter (show all); `Set<string>` = allowlist (show only those genes). Dataset-scoped; resets on dataset change. See Gene Filter section below.
- Edge density: `edgeDensity`, the fraction of available viewport edges to render (0.01–1.0, default 0.1). Applies to both the tissue graph layer and the directed edges layer. Slider is top-level in LayerPanel, between the two sections.
- Edge style: `edgeWidth`, `showArrowheads`, `arrowStyle` (full/half-harpoon), `arrowheadScale`, `edgeDirectional`, `edgeOffset` (perpendicular separation), `showAutocrine`
- Edge color: `edgeColorBy` (`mode`: default/lrm_set/metadata), `edgeColorPalette`, `edgeColorClamp`
- LRM filter: `hiddenLrms` (Set of "ligand|receptor" strings), `lrmCatalogue`
- Selection: `selectedCell`, `selectedEdge`
- Annotations: `regions`, `measurements`, `activeRegion`, `annotationMode`
- Split-screen: `panelCount` (1 or 2), `viewports` (array of two viewport objects, one per panel, each `{xmin,ymin,xmax,ymax}` in image pixels), `pendingZoomMatch` (`null` or `{ fromPanel }`; consumed by the target panel to match zoom while keeping its own center). `requestZoomMatch(fromPanel)` / `clearZoomMatch()` are the corresponding actions.

dataset starts as null. LayerPanel.jsx::DatasetPicker fetches
/spatial/datasets on mount and calls setDataset(list[0]) if the current dataset
is null or no longer in the list. Similarly, activeImage is auto-set from
/spatial/{dataset}/images (OME-TIFFs in the dataset folder, morphology-first).
Viewer.jsx renders a "Loading datasets…" placeholder while dataset === null so
no hooks fire against a null dataset. All data hooks guard against non-ok HTTP
responses — each returns an empty array on 404/500 so a missing file never causes
a render crash.
The gene filter uses an allowlist model, not a denylist:
- `selectedGenes = null`: no filter; all transcripts are shown
- `selectedGenes = Set{...}`: only transcripts whose `feature_name` is in the set are rendered

The selection is built from allGenes (fetched once per dataset from
/spatial/{dataset}/genes), so it is stable across pan/zoom. The UI in
LayerPanel.jsx::TranscriptSpeciesSection:
- Collapsed / no filter: shows `all N genes` with a `select ▼` button
- Collapsed / filter active: shows `M / N genes selected`, a compact list of selected genes (each with a ✕ remove button), a `clear` button, and an `edit ▼` button
- Expanded picker: full gene list (searchable) with checkboxes, `all` (→ null) and `none` (→ empty Set) buttons

toggleSelectedGene(gene): if selectedGenes is null, starts a new Set with just
that gene. If it's a Set, toggles membership. Opening the picker while null shows all
genes as checked; unchecking one starts an allowlist.
useCellColors gene_set mode: if selectedGenes === null, uses all allGenes;
otherwise uses [...selectedGenes].
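
The allowlist semantics are easy to get subtly wrong, so here is a small Python rendering of the logic described above. The real implementation lives in the Zustand store in JavaScript; these function names are hypothetical, `None` stands in for `null`, and the picker's "uncheck while null" behavior is not modeled.

```python
def toggle_selected_gene(selected, gene):
    """Return the new selection after toggling `gene`.

    None means "no filter"; toggling from None starts a new
    allowlist containing just that gene.
    """
    if selected is None:
        return {gene}
    updated = set(selected)  # copy so the caller's set is not mutated
    if gene in updated:
        updated.discard(gene)
    else:
        updated.add(gene)
    return updated


def is_transcript_visible(selected, feature_name):
    """None shows everything; a set shows only its members."""
    return selected is None or feature_name in selected


def genes_for_gene_set(selected, all_genes):
    """Genes used by gene_set coloring: all genes when no filter is active."""
    return list(all_genes) if selected is None else sorted(selected)
```

Note that an empty set is a valid state (the `none` button) and hides every transcript, which is why `None` and the empty set must not be conflated.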
Layers rendered in order (bottom to top):
1. `cell-segments-fill`: SolidPolygonLayer, cell fill colors
2. `cell-segments-outline`: PathLayer, cell boundaries
3. `transcripts`: ScatterplotLayer, transcript dots
4. `tissue-graph`: LineLayer, ALL unique undirected cell pairs (structural background, LRM-agnostic)
5. `edges-directed`: LineLayer, directed edges (LRM-filtered, colored)
6. `edges-arrowheads`: SolidPolygonLayer, filled arrowhead triangles (full or harpoon style)
7. `edges-autocrine`: ScatterplotLayer (stroked only), autocrine rings
8. Annotation layers (region fills, outlines, measurement lines)

Tissue graph vs Edge data: Tissue graph = binary structural layer (which cells are connected at all, regardless of LRM). Edge data = quantitative/categorical overlay on top. Analogous to cell segment outlines (structure) vs cell fill color (expression).
Three aggregate fields in the query_grouped response (two LRM counts and one score sum):
- `lrm_count`: total LRM rows for this edge (used by the tissue graph, which shows all structural pairs regardless of LRM filter)
- `visible_lrm_count`: LRMs not in `hiddenLrms` (used by directed edges; an edge is hidden when this is 0)
- `visible_score_sum`: SUM(score) for non-excluded LRMs (used for client-side lrm_set color mapping)

Directional rendering: A→B and B→A are offset perpendicular to the edge axis so they
appear as two distinct parallel lines. Offset amount is tunable (edgeOffset, default 4px).
Both are offset to their own LEFT, so harpoon arrowheads on the outer side naturally form
the chemistry ⇌ notation.
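
The "offset to its own left" rule can be sketched as below. This is an illustration, not the code in Viewer.jsx: the function name `offset_left` is hypothetical, the offset is assumed to be in the same units as the coordinates, and "left" is computed for y-down image coordinates, where the left-hand normal of direction (dx, dy) is (dy, -dx).

```python
import math


def offset_left(x1, y1, x2, y2, offset):
    """Shift a directed segment to its own left by `offset` units.

    Assumes y-down image coordinates. Degenerate segments
    (for example autocrine edges) are returned unchanged.
    """
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return (x1, y1, x2, y2)
    # Unit normal pointing to the left of the travel direction.
    nx, ny = dy / length, -dx / length
    return (x1 + nx * offset, y1 + ny * offset,
            x2 + nx * offset, y2 + ny * offset)
```

Because B→A travels in the opposite direction, its left is A→B's right, so the two lines separate by twice the offset without any special-casing of the pair.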
Picking: OSD consumes pointer events. After each click, deck.pickObject() is called
manually at the click coordinates. Normal click: cell fill checked first, then edge layers.
Shift+click: edge layers checked first (useful when edges and cells overlap). The
tissue-graph layer is also pickable (selecting it opens the EdgeInfoPanel).
Results set selectedCell or selectedEdge in the store.
Viewer.jsx exports two components:
- `ViewerPanel({ panelIndex })` contains all viewer logic: its own OSD instance, deck.gl canvas, data hook calls, click handlers, annotation overlay, and toolbar. Reads `viewports[panelIndex]` from the store for its own viewport-bounded fetches.
- `Viewer` (default export) is a thin wrapper; it renders `<ViewerPanel panelIndex={0} />` always, plus `<ViewerPanel panelIndex={1} />` when `panelCount >= 2`.

What is per-panel (local state / per-instance):
- OSD viewer instance (`viewerRef`)
- deck.gl ref (`deckRef`)
- deck.gl view state (`deckViewState`)
- Per-panel viewport in store (`viewports[panelIndex]`)
- `osdOpenCount`: local counter incremented on each OSD `open` event; used as dep for the morphology opacity effect to ensure it fires regardless of whether `imageSize.w` changed (fixes the bug where morphology stayed visible after dataset switches with same-dimension images, and in panel 2 on first open)

What is shared (global store):
- All layer toggles, opacities, color-by settings, LRM filter, edge density, etc.
- `selectedCell`, `selectedEdge` (global; EdgeInfoPanel only renders in panel 0)
- `imageSize` (both panels open the same DZI; panel 0 sets it, panel 1 may also set the same values redundantly, which is harmless)

Guarded to panel 0 only (to avoid double-writes):
- Platform info fetch (`/spatial/{dataset}/info`)
- `setCellColorRange`, `setEdgeColorRange`, `setEdgeColorClamp` updates
- EdgeInfoPanel rendering

⇔ Match zoom flow:
requestZoomMatch(fromPanel) → both panels' effects fire → source panel early-returns
(fromPanel === panelIndex) → target panel reads viewports[fromPanel] via
useStore.getState(), computes OSD-normalised width/height, gets its own current center
via viewport.getCenter(true), constructs new bounds at same size centered on its own
center, calls viewport.fitBounds(newBounds, false) (animated), then clearZoomMatch().
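
The geometry of the match-zoom step is just "same size, own center". A minimal sketch follows, working in image pixels for simplicity; the real flow converts to OSD-normalised coordinates before calling `fitBounds`, and the function name `match_zoom_bounds` is hypothetical.

```python
def match_zoom_bounds(source_viewport, own_center):
    """Bounds with the source panel's width/height, centered on `own_center`.

    `source_viewport` is a dict with xmin, ymin, xmax, ymax;
    `own_center` is an (x, y) tuple for the target panel.
    """
    width = source_viewport["xmax"] - source_viewport["xmin"]
    height = source_viewport["ymax"] - source_viewport["ymin"]
    cx, cy = own_center
    return {
        "xmin": cx - width / 2,
        "ymin": cy - height / 2,
        "xmax": cx + width / 2,
        "ymax": cy + height / 2,
    }
```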
All data hooks (transcripts, cell boundaries, edges) are debounced and skip fetches that would be wasted at the current zoom level:
| Hook | Skip condition | Bbox filter | Limit |
|---|---|---|---|
| `useTranscripts` | `fracW >= 0.7` | Always sent when viewport available | 50K (random sample) |
| `useCellBoundaries` | `fracW >= 0.5` | Always sent | 20K cells |
| `useEdges` | no viewport | Always sent | 10K–50K edges (grouped) |

fracW = (xmax - xmin) / imageSize.w — fraction of image width visible.
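
The skip rules reduce to a threshold check on `fracW`. The sketch below restates them in Python for clarity; the hooks themselves are JavaScript, and the names `SKIP_THRESHOLDS`, `visible_fraction`, and `should_fetch` are hypothetical.

```python
# Layers skip fetching once this fraction of the image width is visible.
SKIP_THRESHOLDS = {"transcripts": 0.7, "cell_boundaries": 0.5}


def visible_fraction(viewport, image_width):
    """fracW: fraction of the image width covered by the viewport."""
    return (viewport["xmax"] - viewport["xmin"]) / image_width


def should_fetch(layer, viewport, image_width):
    """Return True if a fetch is worthwhile for `layer` at this zoom."""
    if viewport is None:
        return False  # every hook needs a viewport for its bbox
    threshold = SKIP_THRESHOLDS.get(layer)
    if threshold is None:
        return True  # e.g. edges: only the viewport is required
    return visible_fraction(viewport, image_width) < threshold
```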
Transcript sampling: the backend uses df.sample(n=limit) (random, not head)
so the 50K returned transcripts are spatially uniform across the viewport rather than
biased toward whatever region appears first in the parquet row order.
Edge aggregation: useEdges POSTs to /edges/{dataset}/query-grouped which returns
one row per directed edge (GROUP BY edge, ORDER BY RANDOM()). For a 168M-row parquet
(~300K edges × 559 LRMs) this is ~500× fewer rows than the raw query. The excluded_lrms
list is sent in the request body so visible_lrm_count and visible_score_sum are
pre-computed server-side.
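
The shape of the grouped query, including the parameter ordering it forces, can be sketched with the standard-library `sqlite3` module standing in for DuckDB (both bind positional `?` parameters in SQL text order). The table layout, the bbox filter on the sending centroid only, and the deterministic `ORDER BY edge` (the real query uses `ORDER BY RANDOM()`) are assumptions for illustration.

```python
import sqlite3


def query_grouped(conn, bbox, excluded_lrms):
    """Return (edge, lrm_count, visible_lrm_count, visible_score_sum) rows."""
    xmin, ymin, xmax, ymax = bbox
    placeholders = ", ".join("?" for _ in excluded_lrms)
    # With nothing excluded, every LRM row counts as visible.
    visible = f"lrm NOT IN ({placeholders})" if excluded_lrms else "1"
    sql = f"""
        SELECT edge,
               COUNT(*) AS lrm_count,
               SUM(CASE WHEN {visible} THEN 1 ELSE 0 END) AS visible_lrm_count,
               SUM(CASE WHEN {visible} THEN score ELSE 0 END) AS visible_score_sum
        FROM edges
        WHERE x1 BETWEEN ? AND ? AND y1 BETWEEN ? AND ?
        GROUP BY edge
        ORDER BY edge
    """
    excl_params = list(excluded_lrms)
    where_params = [xmin, xmax, ymin, ymax]
    # The two CASE clauses precede WHERE in the SQL text, so the exclusion
    # parameters (once per CASE) must come before the bbox parameters.
    all_params = excl_params + excl_params + where_params
    return conn.execute(sql, all_params).fetchall()
```

Swapping the order of `excl_params` and `where_params` would not raise an error here; it would silently compare LRM names against coordinates, which is why the ordering is worth a comment in the real code.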
valueToColor(value, vmin, vmax, palette) in colormap.js maps a scalar to RGBA.
interpolateStops clamps t to [0,1], so passing a tighter [lo, hi] window achieves
oob::squish behavior — values outside the window get the palette endpoints.
The cellColorClamp / edgeColorClamp store values are passed into the color hooks
and applied as lo = clamp.low ?? dataMin, hi = clamp.high ?? dataMax.
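
A minimal sketch of how clamping inside the interpolation yields squish behavior. It is a Python restatement, not the contents of `colormap.js`; the two-stop `GRAY` palette is a hypothetical stand-in for the real palettes, and the function names mirror the JavaScript ones only loosely.

```python
GRAY = [(0, 0, 0), (255, 255, 255)]  # hypothetical two-stop palette


def interpolate_stops(stops, t):
    """Linearly interpolate RGB stops; t is clamped to [0, 1] first."""
    t = min(1.0, max(0.0, t))
    position = t * (len(stops) - 1)
    i = min(int(position), len(stops) - 2)
    frac = position - i
    lo_rgb, hi_rgb = stops[i], stops[i + 1]
    rgb = tuple(round(a + (b - a) * frac) for a, b in zip(lo_rgb, hi_rgb))
    return rgb + (255,)  # fully opaque RGBA


def value_to_color(value, vmin, vmax, stops):
    """Map a scalar to RGBA; values outside [vmin, vmax] get the endpoints."""
    t = 0.0 if vmax <= vmin else (value - vmin) / (vmax - vmin)
    return interpolate_stops(stops, t)


def clamp_window(clamp, data_min, data_max):
    """lo = clamp.low ?? dataMin, hi = clamp.high ?? dataMax."""
    lo = clamp["low"] if clamp.get("low") is not None else data_min
    hi = clamp["high"] if clamp.get("high") is not None else data_max
    return lo, hi
```

Passing the window from `clamp_window` as `vmin`/`vmax` is all that is needed: an outlier at 150 with a window of [0, 100] simply takes the top palette color.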
Edge lrm_set coloring is fully client-side: useEdgeColors computes colors
synchronously from visible_score_sum in the already-fetched edges array. No server
call is made for lrm_set mode. The p95 of visible_score_sum across the current
viewport is auto-set as edgeColorClamp.high so the color range adapts to the data
rather than being dominated by outlier edges.
Categorical data uses QUAL_PALETTE (20 visually distinct colors) from colormap.js.
Beyond 20 categories, geneColor() provides deterministic hash-based colors.
The backend uses pyvips when available (fast streaming, handles very large OME-TIFFs without loading the full image into RAM) with a tifffile+Pillow fallback for environments without libvips. Key details:
- Pyramids are built on first DZI request (auto-triggered by `tiles.py::dzi_descriptor`) and cached in `CACHE_DIR` (Docker volume `dzi_cache`, or alongside data in dev). The build is idempotent.
- pyvips path: detects availability with `pyvips.version(0)` (catches `OSError` when the C library is missing; `except ImportError` is not sufficient). Builds a lazy MIP pipeline across all Z-planes using `ifthenelse` chains; no full image in RAM. Calls `img.dzsave(...)` to stream tiles.
- tifffile fallback: reads one OME level at a time, skips levels too large for available RAM (guard: `MAX_TIFFFILE_DIM = 16384`), computes normalisation stats from the smallest available level.
- OME-TIFFs from Xenium use JPEG2000 compression, which requires the `imagecodecs` pip package.

Local dev (no Docker):

    # Backend
    cd backend
    pip install -r requirements.txt
    DATA_ROOT=../sample_data uvicorn app.main:app --reload

    # Frontend (separate terminal)
    cd frontend
    npm install
    npm run dev   # → http://localhost:5173, proxies /api → :8000

Note: dev server runs on port 5173 (not 3000) to avoid conflicting with Docker,
which binds port 3000. This is configured in .claude/launch.json.

Docker, demo data (sample_data/):

    docker compose up --build   # first time or after code changes
    docker compose up           # subsequent runs
    docker compose down

Docker, external data directory:

    DATA_PATH="/absolute/path/to/datasets" docker compose up --build

DATA_PATH must be an absolute host path with no colons. Drop any supported platform
output folder under DATA_PATH; TissuePlex auto-detects the platform on first access.

- `pyvips==2.2.3` is incompatible with `cffi>=2.0`; pinned as `cffi<2.0` in requirements.txt
- `imagecodecs` is required for JPEG2000 OME-TIFFs (Xenium standard format)
- `pyvips` raises `OSError` (not `ImportError`) when the libvips C library is missing; catch `Exception` broadly or test with `pyvips.version(0)`
- The `/{edge_id:path}` FastAPI route converter is required to handle `|` in edge IDs
- `is_autocrine` from pandas parquet is `numpy.bool_`; must cast to `bool()` before JSON serialization
- OSD and deck.gl use different coordinate systems; the `syncDeckFromOSD` function in Viewer.jsx is the critical bridge, so do not break it
- Docker volume specs use `:` as separator; host paths containing `:` (e.g. network mount paths on macOS) will cause `invalid volume specification` errors
- All data hooks must guard against non-ok HTTP responses (return `[]` on error); storing a `{"detail": "..."}` error object as the edges/transcripts/cells array causes deck.gl to throw "not iterable" errors in minified code
- `query_grouped` response rows must be sanitized (NaN/inf → None) before JSON serialization; `visible_score_sum` can be NaN when the score column contains NaN values
- Reader instances are cached at router level (`_reader_cache` dicts in `edges.py` and `spatial.py`) so instance-level caches (`_cells_full_cache`, `_schema_cache`, `_lrm_catalogue_cache`) survive across requests. The LRM catalogue scan (1s on 168M rows) is cached per `EdgeReader` instance in `_lrm_catalogue_cache`.
- DuckDB binds `?` parameters in SQL text order, not logical clause order. In `query_grouped`, the SELECT CASE WHEN clauses appear before the WHERE clause, so `excl_params` must come before `where_params` in `all_params`.
- Real parquet files can have completely null LRM rows (lrm=null, ligand=null, receptor=null). The catalogue query filters these with `WHERE lrm IS NOT NULL`; the endpoint strips null entries from `excluded_lrms` with `[x for x in lst if x is not None]`; the Pydantic model uses `List[Optional[str]]` to accept them without 422 errors.
- Morphology layer always-visible bug (fixed): The morphology opacity effect used `imageSize.w` as a dep to detect OSD open, but this fails when the new image has the same dimensions as the previous one (dep doesn't change → effect doesn't re-run → layer stays at OSD default full opacity). Fixed with `osdOpenCount`, a local `useState` counter incremented on every OSD `open` event, used as the effect dep instead. Also fixes panel 2 in split mode, where `imageSize.w` was already set by panel 0 before panel 2's OSD opened.
- `Math.min/max` spread on large arrays (fixed in `useEdgeColors.js`): spreading 100K+ element arrays causes `RangeError: Maximum call stack size exceeded`. Use a `for` loop to find min/max instead of `Math.min(...arr)`.

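
The NaN/inf sanitization mentioned in the list above can be sketched with the standard library alone. The helper names `sanitize_value` and `sanitize_rows` are hypothetical, and the sketch only handles Python floats; `numpy.bool_` and other NumPy scalars would still need their own casts.

```python
import json
import math


def sanitize_value(value):
    """Replace NaN and +/-inf floats with None; pass everything else through."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


def sanitize_rows(rows):
    """Sanitize every value in a list of dict rows before JSON encoding."""
    return [{key: sanitize_value(v) for key, v in row.items()} for row in rows]


rows = [{"edge": "a|b", "visible_score_sum": float("nan")}]
# allow_nan=False makes json.dumps raise on NaN instead of emitting
# the non-standard token NaN, so it doubles as a check.
payload = json.dumps(sanitize_rows(rows), allow_nan=False)
```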
- R export function: `export_for_TissuePlex()` is implemented in the NICHESv2 R package (separate repo). The draft in `r/export_NICHESObject_for_viewer.R` is superseded. See `docs/data_format.md` for the column spec.
- Cell expression bar chart: the click panel currently shows cell metadata but not a sorted gene expression readout. The `/spatial/{dataset}/expression/{cell_id}` endpoint exists but the UI component is not built.
- MERSCOPE cell boundaries: MERSCOPE stores boundaries as HDF5 polygon data; `MerscopeReader.cell_boundaries()` is a stub returning `[]`.
- CosMx gene-set coloring: requires per-cell expression aggregation from the transcript file; `CosMxReader._color_values_gene_set()` is a stub returning empty.
- Performance at scale: edge rendering is now fast (query-grouped returns ~300K edges as 300K rows instead of 168M rows; colors computed client-side). Remaining bottlenecks: LOD for arrowheads at low zoom, transcript rendering at very high density.
- Authentication: no auth. Fine for local/lab use, needs work for any public deployment.