`@register_section`: pluggable AnnData sections by katosh · Pull Request #7 · settylab/anndata

katosh · 2026-03-30T19:15:57Z

`@register_section`: pluggable AnnData sections

This PR lets external packages add new sections to AnnData — with storage, validation, subsetting, IO, and repr — using a single decorator. No subclassing needed.

Quick example

import anndata as ad
import numpy as np
import pandas as pd
from anndata.extensions import register_section

@register_section("obst", alignment="obs")
class ObstSection:
    """Observation trees (like obsm, but for tree data)."""
    pass

That's it. Now every AnnData object has an obst section:

>>> adata = ad.AnnData(
...     X=np.random.rand(4, 3),
...     obs=pd.DataFrame({"cell_type": ["T", "T", "B", "B"]}, index=["c1", "c2", "c3", "c4"]),
...     var=pd.DataFrame(index=["CD8A", "LCK", "MS4A1"]),
...     obst={"lineage": np.random.rand(4, 2)},    # init kwarg works
... )

>>> adata.obst["lineage"].shape
(4, 2)

>>> repr(adata)
AnnData object with n_obs × n_vars = 4 × 3
    obs: 'cell_type'
    obst: 'lineage'

>>> t_cells = adata[adata.obs["cell_type"] == "T"]   # subsetting works
>>> t_cells.obst["lineage"].shape
(2, 2)

>>> adata.write("test.h5ad")                          # IO works
>>> adata2 = ad.read_h5ad("test.h5ad")
>>> adata2.obst["lineage"].shape
(4, 2)

Why

Packages that extend AnnData (TreeData, SpatialData) currently have to subclass AnnData or reimplement its internals. TreeData reimplements the entire write traversal with its own hardcoded section list. The same list is hardcoded in at least four places across anndata (write_h5ad, write_anndata, read_anndata, _gen_repr). With @register_section, all four look sections up in a shared registry, so extension sections are discovered without patching any of them.

Alignment

The alignment parameter declares which AnnData axes each dimension of the stored data is aligned to. This controls both validation (shape must match) and subsetting (which dims get sliced when you do adata[obs_idx, var_idx]).

| alignment | subsetting behavior | built-in analogue | use case |
|---|---|---|---|
| `"obs"` | dim 0 follows obs | obsm | per-cell embeddings |
| `"var"` | dim 0 follows var | varm | per-gene annotations |
| `("obs", "var")` | dim 0 = obs, dim 1 = var | layers | alternative matrices |
| `("obs", "obs")` | both dims follow obs | obsp | cell-cell distances |
| `("var", "var")` | both dims follow var | varp | gene-gene correlations |
| `()` | no subsetting | (none) | images, configs |
| `("obs", "obs", "var")` | 3D tensor: dims 0 and 1 = obs, dim 2 = var | (none) | cell-cell communication per gene |
| `("obs", "var", "var")` | 3D tensor: dim 0 = obs, dims 1 and 2 = var | (none) | cell-specific gene regulation |

3D tensors for cell-cell communication (CellChat, LIANA, CellPhoneDB):

@register_section("cellcomm", alignment=("obs", "obs", "var"))
class CellCommSection:
    """Ligand-receptor scores: sender_cell × receiver_cell × gene."""
    pass

>>> adata.cellcomm["lr_scores"] = np.random.rand(4, 4, 3)  # (n_obs, n_obs, n_vars)
>>> adata.cellcomm["lr_scores"].shape
(4, 4, 3)

>>> t_cells = adata[adata.obs["cell_type"] == "T"]
>>> t_cells.cellcomm["lr_scores"].shape      # both cell dims subset
(2, 2, 3)

>>> sub = adata[:, ["CD8A", "LCK"]]
>>> sub.cellcomm["lr_scores"].shape           # gene dim subsets
(4, 4, 2)

Cell-specific gene regulatory networks (SCENIC, CellOracle, Dictys):

@register_section("genereg", alignment=("obs", "var", "var"))
class GeneRegSection:
    """Per-cell GRN: cell × source_gene × target_gene."""
    pass

>>> adata.genereg["scenic"] = np.random.rand(4, 3, 3)  # (n_obs, n_vars, n_vars)

>>> t_cells = adata[adata.obs["cell_type"] == "T"]
>>> t_cells.genereg["scenic"].shape          # cell dim subsets
(2, 3, 3)

>>> sub = adata[:, ["CD8A", "LCK"]]
>>> sub.genereg["scenic"].shape               # both gene dims subset
(4, 2, 2)
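The subsetting rule for alignment tuples can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the PR's code: the helper name `subset_by_alignment` is hypothetical, and it assumes that dimensions beyond the alignment tuple are kept whole. It uses `np.ix_`, which the commit notes for this PR mention as the mechanism for N-D fancy indexing.

```python
import numpy as np


def subset_by_alignment(value, alignment, obs_idx, var_idx):
    """Slice each aligned dimension of `value` by the index of its axis.

    `alignment` is a tuple of "obs"/"var"; dimension d of `value` is indexed
    with `obs_idx` or `var_idx` depending on alignment[d].
    """
    if not alignment:  # () means unaligned: nothing to slice
        return value
    if value.ndim < len(alignment):
        raise ValueError("value has fewer dimensions than its alignment")
    per_axis = {"obs": np.asarray(obs_idx), "var": np.asarray(var_idx)}
    index_arrays = [per_axis[axis] for axis in alignment]
    # Assumption: trailing dimensions not named in the alignment stay intact.
    index_arrays += [np.arange(n) for n in value.shape[len(alignment):]]
    # np.ix_ builds an open mesh so every dimension is indexed independently.
    return value[np.ix_(*index_arrays)]


n_obs, n_vars = 4, 3
lr_scores = np.random.rand(n_obs, n_obs, n_vars)
is_t_cell = np.array([True, True, False, False])

t_cells = subset_by_alignment(
    lr_scores, ("obs", "obs", "var"), is_t_cell, np.arange(n_vars)
)
print(t_cells.shape)  # (2, 2, 3)

two_genes = subset_by_alignment(
    lr_scores, ("obs", "obs", "var"), np.arange(n_obs), [0, 1]
)
print(two_genes.shape)  # (4, 4, 2)
```

Plain chained fancy indexing such as `value[idx, idx]` would pair the indices elementwise instead of taking the outer product. That is why an open-mesh index is the natural fit when the same axis appears twice in an alignment.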

Custom behavior

All methods are optional. Omit any you don't need.

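import networkx as nx

# subset_tree, digraph_to_json, and json_to_digraph stand for user-supplied helpers (not shown).
# FormattedOutput is assumed to be importable from the HTML repr API this PR builds on.
@register_section("obst", alignment="obs")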
class ObstSection:
    value_type = nx.DiGraph                      # type enforcement
    section_after = "obsm"                       # position in repr
    section_tooltip = "Observation trees"        # hover text

    @staticmethod
    def validate(key, value):                    # custom validation
        if not nx.is_tree(value):
            raise ValueError(f"{key} must be a tree")

    @staticmethod
    def subset(value, idx):                      # custom subsetting
        return subset_tree(value, idx)

    @staticmethod
    def serialize(value):                        # custom write
        return digraph_to_json(value)

    @staticmethod
    def deserialize(data):                       # custom read
        return json_to_digraph(data)

    @staticmethod
    def repr_entry(key, value, context):         # custom HTML repr
        return FormattedOutput(type_name=f"Tree ({value.number_of_nodes()} nodes)")

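Validation in action. The messages below are for a section whose value_type is np.ndarray and whose validate requires 2D input, rather than the tree section defined above: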

>>> adata.obst["bad"] = [[1, 2], [3, 4]]
TypeError: Values in 'obst' must be ndarray, got list

>>> adata.obst["bad"] = np.ones(3)              # custom validate
ValueError: bad must be 2D, got 1D

>>> adata.obst["bad"] = np.ones((10, 2))        # alignment check
ValueError: Value for obst['bad'] has shape[0]=10, expected 4 (n_obs)
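The three checks shown above can be sketched as one function. This is an illustrative stand-in, not the PR's `SectionMapping` code: the name `check_entry`, its parameters, and the order of the checks (type, then custom `validate`, then alignment, matching the examples above) are assumptions. The error messages are copied from the examples.

```python
import numpy as np

_SIZE_LABEL = {"obs": "n_obs", "var": "n_vars"}


def check_entry(section, key, value, *, alignment, axis_sizes,
                value_type=np.ndarray, validate=None):
    """Raise TypeError/ValueError if `value` may not be stored under `key`."""
    # 1. Type enforcement.
    if not isinstance(value, value_type):
        raise TypeError(
            f"Values in {section!r} must be {value_type.__name__}, "
            f"got {type(value).__name__}"
        )
    # 2. Optional user-supplied validation hook.
    if validate is not None:
        validate(key, value)
    # 3. Alignment: each aligned dimension must match the axis length.
    shape = np.shape(value)
    if len(shape) < len(alignment):
        raise ValueError(f"{key} needs at least {len(alignment)} dimensions")
    for dim, axis in enumerate(alignment):
        expected = axis_sizes[axis]
        if shape[dim] != expected:
            raise ValueError(
                f"Value for {section}[{key!r}] has shape[{dim}]={shape[dim]}, "
                f"expected {expected} ({_SIZE_LABEL[axis]})"
            )


def require_2d(key, value):
    """Example custom hook: only accept 2D arrays."""
    if value.ndim != 2:
        raise ValueError(f"{key} must be 2D, got {value.ndim}D")


messages = []
candidates = ([[1, 2], [3, 4]], np.ones(3), np.ones((10, 2)), np.ones((4, 2)))
for candidate in candidates:
    try:
        check_entry("obst", "bad", candidate, alignment=("obs",),
                    axis_sizes={"obs": 4, "var": 3}, validate=require_2d)
        messages.append("ok")
    except (TypeError, ValueError) as err:
        messages.append(f"{type(err).__name__}: {err}")

print("\n".join(messages))
```

Running the type check first means the custom hook can rely on attributes such as `ndim` being present.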

xarray DataArray example

Custom types that anndata can't natively serialize work end-to-end via serialize/deserialize:

import xarray as xr

@register_section("xr_layers", alignment=("obs", "var"))
class XarrayLayers:
    value_type = xr.DataArray

    @staticmethod
    def serialize(value):
        return value.values          # xarray → numpy for h5ad

    @staticmethod
    def deserialize(data):
        return xr.DataArray(data)    # numpy → xarray on read; dim names and coords are not restored here

>>> adata.xr_layers["scaled"] = xr.DataArray(np.random.rand(4, 3), dims=["obs", "var"])
>>> adata.write("test.h5ad")
>>> adata2 = ad.read_h5ad("test.h5ad")
>>> isinstance(adata2.xr_layers["scaled"], xr.DataArray)
True

What you get for free

| Feature | What works automatically |
|---|---|
| `adata.obst["x"] = array` | Property accessor + validation |
| `adata[:10].obst` | Subsetting via declared alignment |
| `adata.copy()` | Deep copy of registered sections |
| `adata.write("f.h5ad")` | IO via serialize (or standard write_elem) |
| `ad.read_h5ad("f.h5ad")` | IO via deserialize (or standard read_elem) |
| `AnnData(obst={...})` | Init kwargs |
| `repr(adata)` | Shows when non-empty |
| View copy-on-write | Writing to a view triggers copy |

Scaling 3D tensors: factored storage + accessor

A dense (n_obs × n_obs × n_vars) tensor is infeasible for large datasets (1M cells × 1M cells × 30K genes ≈ 3 × 10^16 entries). The practical pattern is to store compact rank-R factors and reconstruct on demand:

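# Both decorators are assumed to be exported from anndata.extensions
from anndata.extensions import register_anndata_namespace, register_section

# Register factor storage (tiny: n_obs × rank and n_vars × rank)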
@register_section("comm_obs", alignment="obs")
class CommObs:
    pass

@register_section("comm_var", alignment="var")
class CommVar:
    pass

# Register accessor for tensor reconstruction
@register_anndata_namespace("comm")
class CellCommAccessor:
    def __init__(self, adata: ad.AnnData):
        self._adata = adata

    def tensor(self, key="default"):
        """Reconstruct (obs × obs × var) tensor from factors."""
        U = self._adata.comm_obs[key]   # (n_obs, rank)
        V = self._adata.comm_var[key]   # (n_vars, rank)
        return np.einsum("ir,jr,kr->ijk", U, U, V)

    def query(self, sender, receiver, gene, key="default"):
        """O(rank) point query without materializing tensor."""
        U = self._adata.comm_obs[key]
        V = self._adata.comm_var[key]
        i = self._adata.obs_names.get_loc(sender)
        j = self._adata.obs_names.get_loc(receiver)
        k = self._adata.var_names.get_loc(gene)
        return float(U[i] @ (U[j] * V[k]))

>>> adata.comm_obs["lr"] = np.random.rand(100, 10)   # factors: 12 KB
>>> adata.comm_var["lr"] = np.random.rand(50, 10)

>>> adata.comm.tensor("lr").shape                      # dense tensor: 4 MB
(100, 100, 50)

>>> adata.comm.query("cell_0", "cell_1", "CD8A", "lr") # O(rank), no tensor
0.7386

>>> t_cells = adata[adata.obs["cell_type"] == "T"]
>>> t_cells.comm.tensor("lr").shape                     # factors were subsetted
(50, 50, 50)

>>> adata.write("test.h5ad")                            # only factors written
>>> adata2 = ad.read_h5ad("test.h5ad")
>>> adata2.comm.tensor("lr").shape                      # reconstructs from factors
(100, 100, 50)

This combines @register_section (factor storage with automatic subsetting and IO) with @register_anndata_namespace (tensor API and point queries). For 1M cells with rank 20, the obs factors take ~160 MB as float64, while the dense tensor over 30K genes would take ~240 PB (3 × 10^16 entries × 8 bytes), a reduction of roughly 1.5 × 10^9×.

For moderately-sized datasets, sparse.COO from the PyData sparse package also works directly in registered sections (subsetting handles N-D sparse arrays).
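The factored pattern relies on two properties: a point query computed from the factors equals the corresponding dense entry, and subsetting the factors before reconstruction gives the same result as subsetting the dense tensor. The NumPy-only sketch below checks both on the small example sizes used above (100 cells, 50 genes, rank 10). The function names mirror the accessor methods, but this is a standalone illustration with no AnnData object involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars, rank = 100, 50, 10
U = rng.random((n_obs, rank))   # obs-aligned factor, float64
V = rng.random((n_vars, rank))  # var-aligned factor, float64


def tensor(U, V):
    """Dense (obs x obs x var) tensor: T[i, j, k] = sum_r U[i,r] U[j,r] V[k,r]."""
    return np.einsum("ir,jr,kr->ijk", U, U, V)


def query(U, V, i, j, k):
    """Single entry T[i, j, k] in O(rank) time, without building the tensor."""
    return float(U[i] @ (U[j] * V[k]))


dense = tensor(U, V)
print(dense.shape)                        # (100, 100, 50)
print(U.nbytes + V.nbytes, dense.nbytes)  # 12000 4000000

# A point query from the factors matches the dense entry.
point_matches = bool(np.isclose(query(U, V, 3, 7, 11), dense[3, 7, 11]))

# Subsetting the obs factor first is equivalent to subsetting the dense tensor.
keep = np.arange(0, n_obs, 2)
dense_subset = dense[np.ix_(keep, keep, np.arange(n_vars))]
subset_matches = bool(np.allclose(tensor(U[keep], V), dense_subset))

print(point_matches, subset_matches)      # True True
```

The second check is why an obs-aligned section is sufficient here: slicing rows of `U` is all a view needs to do, and the accessor reconstructs the smaller tensor from the sliced factors.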

`iter_sections`: centralized section iteration

All built-in sections are registered in _registered_sections alongside extension sections. The iter_sections() utility provides filtered iteration, replacing the hardcoded section lists that were previously duplicated across write_h5ad, write_anndata, read_anndata, _gen_repr, and _mutated_copy.

from anndata._core.section_registry import iter_sections

# All sections with metadata
for spec, value in iter_sections(adata):
    print(f"{spec.name}: kind={spec.kind}, alignment={spec.alignment}")

X: kind=X, alignment=('obs', 'var')
obs: kind=dataframe, alignment=('obs',)
var: kind=dataframe, alignment=('var',)
uns: kind=unstructured, alignment=()
obsm: kind=mapping, alignment=('obs',)
varm: kind=mapping, alignment=('var',)
layers: kind=mapping, alignment=('obs', 'var')
obsp: kind=mapping, alignment=('obs', 'obs')
varp: kind=mapping, alignment=('var', 'var')
raw: kind=raw, alignment=()
obst: kind=mapping, alignment=('obs',)          # ← registered extension

Filter by kind:

# Only dict-like sections (built-in + registered)
for spec, mapping in iter_sections(adata, kinds={"mapping"}):
    print(f"{spec.name}: {list(mapping.keys())}")

# Everything except X, raw, and uns
for spec, value in iter_sections(adata, exclude_kinds={"X", "raw", "unstructured"}):
    ...

# Non-empty sections (for repr)
for spec, value in iter_sections(adata, only_nonempty=True):
    ...

This is how anndata's own IO now works internally:

# write_h5ad (simplified)
for spec, value in iter_sections(adata, exclude_kinds={"X", "raw"}):
    if spec.kind == "dataframe":
        write_elem(f, spec.io_key, value, ...)        # DataFrame directly
    else:
        write_elem(f, spec.io_key, dict(value), ...)   # mapping → dict

Section kinds: "X", "dataframe" (obs/var), "mapping" (obsm/layers/etc. + extensions), "unstructured" (uns), "raw".
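To make the filtering semantics concrete, here is a small pure-Python stand-in for this kind of registry iteration. It is a sketch only: `iter_sections_sketch`, the `REGISTRY` list, and the dict that plays the role of an AnnData object are all hypothetical. The real `iter_sections` lives in `anndata._core.section_registry` and may differ in signature and behavior. The kinds and alignments are taken from the listing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionSpec:
    name: str
    kind: str
    alignment: tuple


# Hypothetical registry mirroring the listing above, plus one extension.
REGISTRY = [
    SectionSpec("X", "X", ("obs", "var")),
    SectionSpec("obs", "dataframe", ("obs",)),
    SectionSpec("var", "dataframe", ("var",)),
    SectionSpec("uns", "unstructured", ()),
    SectionSpec("obsm", "mapping", ("obs",)),
    SectionSpec("varm", "mapping", ("var",)),
    SectionSpec("layers", "mapping", ("obs", "var")),
    SectionSpec("obsp", "mapping", ("obs", "obs")),
    SectionSpec("varp", "mapping", ("var", "var")),
    SectionSpec("raw", "raw", ()),
    SectionSpec("obst", "mapping", ("obs",)),  # registered extension
]


def iter_sections_sketch(container, kinds=None, exclude_kinds=None,
                         only_nonempty=False):
    """Yield (spec, value) pairs from `container`, filtered by section kind."""
    for spec in REGISTRY:
        if kinds is not None and spec.kind not in kinds:
            continue
        if exclude_kinds is not None and spec.kind in exclude_kinds:
            continue
        value = container.get(spec.name)
        # Assumption: "empty" means missing or zero-length.
        if only_nonempty and (value is None or len(value) == 0):
            continue
        yield spec, value


# A plain dict standing in for an AnnData object.
adata_like = {
    "X": [[0.0] * 3 for _ in range(4)],
    "obs": {"cell_type": ["T", "T", "B", "B"]},
    "var": {}, "uns": {},
    "obsm": {"X_pca": [[0.0, 0.0]] * 4},
    "varm": {}, "layers": {}, "obsp": {}, "varp": {},
    "raw": None,
    "obst": {"lineage": [[0.0, 0.0]] * 4},
}

nonempty_mappings = [
    spec.name
    for spec, _ in iter_sections_sketch(
        adata_like, kinds={"mapping"}, only_nonempty=True
    )
]
print(nonempty_mappings)  # ['obsm', 'obst']

aligned = [
    spec.name
    for spec, _ in iter_sections_sketch(
        adata_like, exclude_kinds={"X", "raw", "unstructured"}
    )
]
print(aligned)
```

The extension section `obst` appears in the results without any change to the iteration code, which is the point of driving IO and repr from one registry.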

Also in this PR

- @register_anndata_namespace: custom accessor APIs (e.g. adata.spatial.images)
- @register_formatter: custom HTML type/section formatters
- anndata.extensions module consolidating all extension APIs

Test coverage

73 tests covering all alignment patterns, custom validation, custom IO (JSON, xarray), 3D tensor subsetting, factored tensor with accessor, copy-on-write, and end-to-end workflows for TreeData-like, SpatialData-like, CellChat-like, SCENIC-like, and factored communication scenarios.

Future direction

The alignment tuple naturally extends to custom axes beyond obs/var. A future register_axis could let packages define new named dimensions with their own indices, enabling N-dimensional indexing like adata[obs_idx, var_idx, spatial_idx]. This is the conceptual step from DataFrame (2D) to xarray Dataset (N-D) — with @register_section as the foundation.

# Conflicts:
#	tests/test_repr_html.py
#	tests/visual_inspect_repr_html.py

Add register_aligned_section() to anndata.extensions that allows external packages to register new axis-aligned sections (like obsm, layers) on AnnData without subclassing.

A registered section gets:
- Property accessor (adata.obst)
- Axis-aligned storage with validation
- Automatic subsetting (adata[:10].obst works)
- IO integration (write/read to h5ad and zarr)
- Repr discovery (shows in repr output)
- Init kwargs (AnnData(obst={...}))

Changes:
- aligned_mapping.py: AlignedMappingProperty lazily inits backing store
- extensions.py: SectionRegistration dataclass + register_aligned_section()
- anndata.py: _registered_sections ClassVar, **extra_sections in init, registered sections in _gen_repr
- methods.py: write_anndata/read_anndata iterate registered sections
- h5ad.py: write_h5ad iterates registered sections

- AlignedMappingProperty.construct sets _attrname_override so registered sections report their own name (e.g., "obst") instead of the default ("obsm")
- AlignedView propagates _attrname_override from parent mapping
- _mutated_copy includes registered sections in the copy loop
- _init_as_actual copies registered sections when init from AnnData
- _default_attrname replaces attrname in concrete bases (LayersBase, AxisArraysBase, PairwiseArraysBase) to support the override pattern
- Add comprehensive test suite (35 tests) covering storage, validation, subsetting, copy-on-write, IO roundtrip, repr, and TreeData-like workflow

…section

New @register_section decorator with:
- Alignment as tuple of "obs"/"var" axes: ("obs",), ("obs","var"), ("obs","obs"), ("var","var"), () for unaligned
- Custom value_type enforcement
- Custom validate/subset/serialize/deserialize methods
- Custom repr_entry for HTML repr
- Auto-registers SectionFormatter for HTML repr

New container classes in section_registry.py:
- SectionMapping: validates on assignment (type, alignment, custom)
- SectionMappingView: subsets on access, copy-on-write on mutation
- SectionProperty: descriptor creating ephemeral containers

45 tests covering all alignment combinations, custom validation, custom IO, subsetting, copy-on-write, init kwargs, TreeData-like and SpatialData-like scenarios.

alignment="obs" is now equivalent to alignment=("obs",). Updated docstring examples and tests to use the string form.

Demonstrates using register_section with custom types: xr.DataArray as layer values with serialize/deserialize for h5ad IO roundtrip. Shows that custom types work end-to-end: storage with type enforcement, alignment validation, subsetting, copy, IO, repr. 6 new tests (51 total).

Support >2D alignment tuples with proper subsetting. anndata's built-in _subset only handles ≤2D, so SectionMappingView implements N-D fancy indexing via np.ix_ for higher dimensions.

New biology-motivated test cases:
- cellcomm: alignment=("obs", "obs", "var") for ligand-receptor cell-cell communication tensors (CellChat, LIANA, CellPhoneDB)
- genereg: alignment=("obs", "var", "var") for cell-specific gene regulatory networks (SCENIC, CellOracle, Dictys)

67 tests total, all passing.

Add TestFactoredTensor: sections store compact rank-R factors (n_obs × rank) and (n_vars × rank), accessor reconstructs the full (obs × obs × var) tensor on demand via einsum. Includes point queries without materializing the tensor. Demonstrates combining register_section (for factor storage with axis-aligned subsetting and IO) with register_anndata_namespace (for the tensor reconstruction API and HTML repr). 73 tests total, all passing. Ruff formatting applied.

Register all built-in sections (X, obs, var, uns, obsm, varm, obsp, varp, layers, raw) in _registered_sections with SectionSpec metadata. Add iter_sections() utility for filtered iteration with options for kind filtering, empty-section skipping.

Replace hardcoded section lists in:
- _gen_repr: uses iter_sections(exclude_kinds={"X", "raw"})
- _mutated_copy: uses iter_sections(kinds={"dataframe", "mapping"})
- write_h5ad: uses iter_sections(exclude_kinds={"X", "raw"})
- write_anndata: same
- read_anndata: iterates _registered_sections.values()

The five aligned mapping sections (obsm, varm, obsp, varp, layers), both DataFrames (obs, var), uns, and all extension sections are now discovered from a single registry. Only X and raw retain special handling due to their unique structure.

Raw manages its own subsetting internally (X along obs, var/varm unchanged). The alignment tuple shouldn't imply it behaves like an obs-aligned mapping.

katosh added 12 commits December 12, 2025 16:21

adata.extensions module

6383dd1

Fix Ruff RUF022 on __all__

ee2d8af

unified accessor + section viz pattern

9377ad8

accessor section viz doc_url

8696776

Merge branch 'html_rep' into extensions_register

c6d0f5a

feat: re-export register_aligned_section from anndata.extensions

23302d9

feat: accept string for single-axis alignment in register_section

1baf049

settylab deleted a comment from coderabbitai bot Mar 30, 2026

katosh force-pushed the register_section branch from 38e77af to a40a331 Compare March 30, 2026 20:34

settylab deleted a comment from coderabbitai bot Mar 30, 2026

katosh force-pushed the register_section branch 2 times, most recently from d8d2de9 to 70fe498 Compare March 31, 2026 01:44

katosh force-pushed the register_section branch from 70fe498 to cd2b37d Compare March 31, 2026 01:45

fix: raw alignment should be empty, not ('obs',)

76594d3

katosh mentioned this pull request Mar 31, 2026

feat: AnnData.can_write based on AnnData._reduce + iter_outer + refactorings of other relevant functions scverse/anndata#2372

Open

5 tasks

katosh wants to merge 15 commits into html_rep from register_section
