@register_section: pluggable AnnData sections #7
Draft
# Conflicts: # tests/test_repr_html.py # tests/visual_inspect_repr_html.py
Add register_aligned_section() to anndata.extensions that allows
external packages to register new axis-aligned sections (like obsm,
layers) on AnnData without subclassing.
A registered section gets:
- Property accessor (adata.obst)
- Axis-aligned storage with validation
- Automatic subsetting (adata[:10].obst works)
- IO integration (write/read to h5ad and zarr)
- Repr discovery (shows in repr output)
- Init kwargs (AnnData(obst={...}))
Changes:
- aligned_mapping.py: AlignedMappingProperty lazily inits backing store
- extensions.py: SectionRegistration dataclass + register_aligned_section()
- anndata.py: _registered_sections ClassVar, **extra_sections in init,
registered sections in _gen_repr
- methods.py: write_anndata/read_anndata iterate registered sections
- h5ad.py: write_h5ad iterates registered sections
- AlignedMappingProperty.construct sets _attrname_override so
registered sections report their own name (e.g., "obst") instead
of the default ("obsm")
- AlignedView propagates _attrname_override from parent mapping
- _mutated_copy includes registered sections in the copy loop
- _init_as_actual copies registered sections when init from AnnData
- _default_attrname replaces attrname in concrete bases (LayersBase,
AxisArraysBase, PairwiseArraysBase) to support the override pattern
- Add comprehensive test suite (35 tests) covering storage,
validation, subsetting, copy-on-write, IO roundtrip, repr, and
TreeData-like workflow
…section
New @register_section decorator with:
- Alignment as tuple of "obs"/"var" axes: ("obs",), ("obs","var"),
("obs","obs"), ("var","var"), () for unaligned
- Custom value_type enforcement
- Custom validate/subset/serialize/deserialize methods
- Custom repr_entry for HTML repr
- Auto-registers SectionFormatter for HTML repr
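The alignment tuples above can be read as a per-dimension shape contract. The following is a minimal standalone sketch of such a check, not the code in this PR: `check_alignment` is a hypothetical helper, and it assumes that aligned axes come first and that any trailing dimensions are unconstrained.

```python
import numpy as np


def check_alignment(value, alignment, n_obs, n_vars):
    """Raise ValueError if `value` does not match the declared alignment.

    Hypothetical helper for illustration only. Assumes the aligned axes are
    the leading dimensions of `value`; trailing dimensions are left free.
    """
    # Accept the string shorthand: "obs" is treated as ("obs",).
    if isinstance(alignment, str):
        alignment = (alignment,)
    alignment = tuple(alignment)

    axis_len = {"obs": n_obs, "var": n_vars}
    shape = np.shape(value)

    if len(shape) < len(alignment):
        raise ValueError(
            f"value has {len(shape)} dims but alignment {alignment} needs "
            f"at least {len(alignment)}"
        )
    for dim, axis in enumerate(alignment):
        if shape[dim] != axis_len[axis]:
            raise ValueError(
                f"dim {dim} has length {shape[dim]}, expected "
                f"{axis_len[axis]} to match '{axis}'"
            )
    return alignment


# An obsm-like value: first dim follows obs, second dim is free.
check_alignment(np.zeros((5, 2)), "obs", n_obs=5, n_vars=3)
# An unaligned value: the empty tuple imposes no shape constraint.
check_alignment(np.zeros((7,)), (), n_obs=5, n_vars=3)
```

Under these assumptions, an empty alignment simply skips the loop, which is why unaligned sections accept any shape.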
New container classes in section_registry.py:
- SectionMapping: validates on assignment (type, alignment, custom)
- SectionMappingView: subsets on access, copy-on-write on mutation
- SectionProperty: descriptor creating ephemeral containers
45 tests covering all alignment combinations, custom validation,
custom IO, subsetting, copy-on-write, init kwargs, TreeData-like
and SpatialData-like scenarios.
alignment="obs" is now equivalent to alignment=("obs",).
Updated docstring examples and tests to use the string form.
Demonstrates using register_section with custom types: xr.DataArray as layer values with serialize/deserialize for h5ad IO roundtrip. Shows that custom types work end-to-end: storage with type enforcement, alignment validation, subsetting, copy, IO, repr. 6 new tests (51 total).
Support >2D alignment tuples with proper subsetting. anndata's
built-in _subset only handles ≤2D, so SectionMappingView implements
N-D fancy indexing via np.ix_ for higher dimensions.
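As a rough illustration of the np.ix_ approach described above, here is a standalone sketch. `subset_aligned` is a hypothetical name, not the method on SectionMappingView, and it assumes the indices are integer sequences and that unaligned trailing dimensions are kept whole.

```python
import numpy as np


def subset_aligned(value, alignment, obs_idx, var_idx):
    """Subset an N-D array along every dimension named in `alignment`.

    Hypothetical helper for illustration only. `obs_idx` and `var_idx` are
    assumed to be integer index sequences (slices are not handled here).
    """
    index_for = {"obs": np.asarray(obs_idx), "var": np.asarray(var_idx)}
    # One selector per aligned dimension, in declaration order.
    selectors = [index_for[axis] for axis in alignment]
    # Dimensions beyond the alignment tuple are kept in full.
    selectors += [np.arange(n) for n in value.shape[len(alignment):]]
    # np.ix_ builds an open mesh, so each selector applies to its own
    # dimension independently (outer indexing, not element-wise pairing).
    return value[np.ix_(*selectors)]


tensor = np.arange(4 * 4 * 3).reshape(4, 4, 3)  # (obs, obs, var)
sub = subset_aligned(tensor, ("obs", "obs", "var"), obs_idx=[0, 2], var_idx=[1])
print(sub.shape)
```

Without np.ix_, passing several index arrays at once would pair them element-wise, which is not the per-axis selection wanted for adata[obs_idx, var_idx].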
New biology-motivated test cases:
- cellcomm: alignment=("obs", "obs", "var") for ligand-receptor
cell-cell communication tensors (CellChat, LIANA, CellPhoneDB)
- genereg: alignment=("obs", "var", "var") for cell-specific gene
regulatory networks (SCENIC, CellOracle, Dictys)
67 tests total, all passing.
Add TestFactoredTensor: sections store compact rank-R factors (n_obs × rank) and (n_vars × rank), accessor reconstructs the full (obs × obs × var) tensor on demand via einsum. Includes point queries without materializing the tensor. Demonstrates combining register_section (for factor storage with axis-aligned subsetting and IO) with register_anndata_namespace (for the tensor reconstruction API and HTML repr). 73 tests total, all passing. Ruff formatting applied.
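The factored pattern can be sketched with plain NumPy. This is not the TestFactoredTensor code: it assumes a CP-style decomposition with three factor matrices, and the names `sender`, `receiver`, `gene`, `reconstruct`, and `point_query` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars, rank = 6, 4, 2

# Assumed layout: two obs-aligned factors and one var-aligned factor.
sender = rng.random((n_obs, rank))
receiver = rng.random((n_obs, rank))
gene = rng.random((n_vars, rank))


def reconstruct(sender, receiver, gene):
    """Materialize T[i, j, g] = sum_r sender[i, r] * receiver[j, r] * gene[g, r]."""
    return np.einsum("ir,jr,gr->ijg", sender, receiver, gene)


def point_query(sender, receiver, gene, i, j, g):
    """Evaluate a single tensor entry without building the full tensor."""
    return float(np.sum(sender[i] * receiver[j] * gene[g]))


full = reconstruct(sender, receiver, gene)  # shape (n_obs, n_obs, n_vars)

# Subsetting the factors along obs and then reconstructing matches
# subsetting the dense tensor along both obs dimensions.
obs_idx = [0, 3]
sub = reconstruct(sender[obs_idx], receiver[obs_idx], gene)
```

Because each factor is aligned to a single axis, ordinary axis-aligned subsetting of the factors is enough to keep the reconstructed tensor consistent with the subset.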
Register all built-in sections (X, obs, var, uns, obsm, varm, obsp,
varp, layers, raw) in _registered_sections with SectionSpec metadata.
Add iter_sections() utility for filtered iteration, with options for
kind filtering and empty-section skipping.
Replace hardcoded section lists in:
- _gen_repr: uses iter_sections(exclude_kinds={"X", "raw"})
- _mutated_copy: uses iter_sections(kinds={"dataframe", "mapping"})
- write_h5ad: uses iter_sections(exclude_kinds={"X", "raw"})
- write_anndata: same
- read_anndata: iterates _registered_sections.values()
The five aligned mapping sections (obsm, varm, obsp, varp, layers),
both DataFrames (obs, var), uns, and all extension sections are now
discovered from a single registry. Only X and raw retain special
handling due to their unique structure.
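A single-registry iterator of this kind might look roughly like the sketch below. It is a standalone illustration, not anndata's implementation: the `SectionSpec` fields, the `skip_empty` parameter name, and the use of a plain dict in place of an AnnData object are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class SectionSpec:
    # Hypothetical minimal metadata; the real SectionSpec may carry more.
    name: str
    kind: str  # "X", "dataframe", "mapping", "unstructured", or "raw"


# Built-in sections plus one extension section ("obst"), in one registry.
_REGISTRY = {
    spec.name: spec
    for spec in [
        SectionSpec("X", "X"),
        SectionSpec("obs", "dataframe"),
        SectionSpec("var", "dataframe"),
        SectionSpec("uns", "unstructured"),
        SectionSpec("obsm", "mapping"),
        SectionSpec("varm", "mapping"),
        SectionSpec("obsp", "mapping"),
        SectionSpec("varp", "mapping"),
        SectionSpec("layers", "mapping"),
        SectionSpec("raw", "raw"),
        SectionSpec("obst", "mapping"),
    ]
}


def iter_sections(
    data, *, kinds=None, exclude_kinds=None, skip_empty=False
) -> Iterator[tuple[SectionSpec, object]]:
    """Yield (spec, value) pairs from `data`, a dict standing in for AnnData."""
    for spec in _REGISTRY.values():
        if kinds is not None and spec.kind not in kinds:
            continue
        if exclude_kinds is not None and spec.kind in exclude_kinds:
            continue
        value = data.get(spec.name)
        if skip_empty and (value is None or len(value) == 0):
            continue
        yield spec, value


data = {"obsm": {"X_pca": [0.1, 0.2]}, "obst": {}}
non_empty = [spec.name for spec, _ in iter_sections(data, kinds={"mapping"}, skip_empty=True)]
```

With this shape, a writer or repr routine only needs one loop with the right filter, and an extension section is picked up without any change to the caller.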
Raw manages its own subsetting internally (X along obs, var/varm unchanged). The alignment tuple shouldn't imply it behaves like an obs-aligned mapping.
@register_section: pluggable AnnData sections
This PR lets external packages add new sections to AnnData, complete with storage, validation, subsetting, IO, and repr, using a single decorator. No subclassing needed.
Quick example
That's it. Now every AnnData object has an obst section:
Why
Packages that extend AnnData (TreeData, SpatialData) currently have to subclass AnnData or reimplement its internals. TreeData reimplements the entire write traversal with its own hardcoded section list. The same list is hardcoded in at least four places across anndata (write_h5ad, write_anndata, read_anndata, _gen_repr). @register_section makes all four discoverable.
Alignment
The alignment parameter declares which AnnData axes each dimension of the stored data is aligned to. This controls both validation (shape must match) and subsetting (which dims get sliced when you do adata[obs_idx, var_idx]).
Supported alignment values:
- "obs"
- "var"
- ("obs", "var")
- ("obs", "obs")
- ("var", "var")
- () (unaligned)
- ("obs", "obs", "var")
- ("obs", "var", "var")
3D tensors for cell-cell communication (CellChat, LIANA, CellPhoneDB):
Cell-specific gene regulatory networks (SCENIC, CellOracle, Dictys):
Custom behavior
All methods are optional. Omit any you don't need.
Validation in action:
xarray DataArray example
Custom types that anndata can't natively serialize work end-to-end via serialize/deserialize:
What you get for free
- adata.obst["x"] = array (storage with validation)
- adata[:10].obst (automatic subsetting)
- adata.copy() (copy)
- adata.write("f.h5ad") (write)
- ad.read_h5ad("f.h5ad") (read)
- AnnData(obst={...}) (init kwargs)
- repr(adata) (repr discovery)
Scaling 3D tensors: factored storage + accessor
A dense (n_obs × n_obs × n_vars) tensor is infeasible for large datasets (1M cells × 1M cells × 30K genes ≈ 3 × 10^16 entries). The practical pattern is to store compact rank-R factors and reconstruct on demand:
This combines @register_section (factor storage with automatic subsetting and IO) with @register_anndata_namespace (tensor API and point queries). For 1M cells with rank 20, the factors are ~160 MB while the dense tensor would be ~240 PB (at 8 bytes per entry), a roughly 1.5 × 10^9-fold compression.
For moderately sized datasets, sparse.COO from the PyData sparse package also works directly in registered sections (subsetting handles N-D sparse arrays).
iter_sections: centralized section iteration
All built-in sections are registered in _registered_sections alongside extension sections. The iter_sections() utility provides filtered iteration, replacing the hardcoded section lists that were previously duplicated across write_h5ad, write_anndata, read_anndata, _gen_repr, and _mutated_copy.
Filter by kind:
This is how anndata's own IO now works internally:
Section kinds: "X", "dataframe" (obs/var), "mapping" (obsm/layers/etc. + extensions), "unstructured" (uns), "raw".
Also in this PR
- @register_anndata_namespace: custom accessor APIs (adata.spatial.images)
- @register_formatter: custom HTML type/section formatters
- anndata.extensions module consolidating all extension APIs
Test coverage
73 tests covering all alignment patterns, custom validation, custom IO (JSON, xarray), 3D tensor subsetting, factored tensor with accessor, copy-on-write, and end-to-end workflows for TreeData-like, SpatialData-like, CellChat-like, SCENIC-like, and factored communication scenarios.
Future direction
The alignment tuple naturally extends to custom axes beyond obs/var. A future register_axis could let packages define new named dimensions with their own indices, enabling N-dimensional indexing like adata[obs_idx, var_idx, spatial_idx]. This is the conceptual step from DataFrame (2D) to xarray Dataset (N-D), with @register_section as the foundation.