Skip to content

Html rep reduce#6

Draft
katosh wants to merge 25 commits intohtml_repfrom
html_rep_reduce
Draft

Html rep reduce#6
katosh wants to merge 25 commits intohtml_repfrom
html_rep_reduce

Conversation

@katosh
Copy link
Copy Markdown
Collaborator

@katosh katosh commented Mar 23, 2026

Summary

This branch merges ig/fold_can_write (scverse#2372) into html_rep and rewrites the HTML repr's section traversal to use the new AnnData.reduce() API. The goal is to evaluate concretely how well reduce serves as a rendering backbone, and to surface where it helps vs. where it introduces friction.

This is an experiment branch, not intended for merge into main. It exists to inform the API discussion on scverse#2372.

What changed

  1. Shared IO utilities (_io/specs/registry.py): Extracted IORegistry.get_writeable_types() and IORegistry.has_spec() from the logic inside can_write. Both can_write and the repr's serializability checks now use these instead of duplicating registry lookups.

  2. Simplified serializability checks (_repr/utils.py, _repr/formatters.py): _check_serializable_single and _check_array_has_writer now delegate to has_spec() instead of catching exceptions from get_spec().

  3. _render_all_sections rewritten with reduce (_repr/html.py): The core experiment. The SECTION_ORDER loop is replaced by a single adata.reduce() call in DFS-pre order.

What to look for in the code

All pain points are marked with PAIN POINT comments in html.py. Key locations:

The accumulator (_render_via_reduce)

Without enter/exit events, the callback must thread a 4-tuple through every call:

(finished_sections, current_rows, current_section, current_n_items)

current_rows and current_section exist only to track which section is "open" — state that reduce already knows internally but doesn't expose.

Duplicated finalization

Search for _finalize_section. It appears in two places:

  • Inside the callback, triggered when the next section starts (implicitly closing the previous one)
  • After reduce returns, for the last section that never got a successor

Both calls are identical. An on_exit hook would eliminate this duplication entirely.

Section vs. leaf detection

Search for _is_section_visit. The callback must isinstance-check the accessor hierarchy to distinguish parent visits from leaf visits. This is fragile: LayerAcc is used for both X (k=None) and individual layer entries (k='counts'), so a naive isinstance(ref_acc, LayerAcc) treats each layer as a new section instead of an entry within the layers section. The initial implementation had exactly this bug. The fix required adding a special case to exclude LayerAcc entirely from section detection and relying on _is_x to handle the X case separately. reduce knows this distinction internally but doesn't communicate it.

Section metadata (len mismatch)

Search for isinstance(ref_acc, MetaAcc). For obs/var, len(elem) returns the number of rows, not columns. The callback needs a per-type branch to get the correct entry count. The original _render_dataframe_section called len(df.columns) directly.

Opaque uns/raw

Search for isinstance(elem, dict) and isinstance(elem, Raw). reduce passes these as opaque blobs with ref_acc=None and doesn't descend into them. The callback falls back to the dedicated _render_uns_section and _render_raw_section renderers — the same functions the old code used.

Custom and unknown sections

Search for custom_sections_after and _detect_unknown_sections. These are handled entirely outside reduce because reduce only visits the hardcoded attribute list. Extension packages (TreeData adding .obst/.vart) are invisible to it.

Robustness regression

The current per-section rendering is deliberately defensive: each section renders independently, and if one fails, the rest still appear. With reduce, a single problematic element crashes the entire traversal. For example, a non-string DataFrame column name (tuple key in obs) causes the accessor system to throw, which aborts reduce entirely. Instead of showing all other sections with one error indicator, the repr shows nothing but "reduce failed." The try/except around adata.reduce() catches this, but the damage is total rather than localized. See test 23 ("Serialization warnings") in the visual test for a live example.

What reduce simplified

The shared has_spec() / get_writeable_types() utilities are a genuine improvement: both can_write and the repr now use the same registry check instead of each rolling their own. This is independent of whether reduce is used for rendering.

What reduce complicated

The rendering itself became harder to follow. The original code had a clean per-section structure: open section, iterate entries, close section. With reduce, this three-phase logic is flattened into a single callback that must reconstruct section boundaries from accessor types, carry open-section state through the accumulator, and handle the last-section edge case separately.

Section ordering

reduce visits sections in a different order than the repr's SECTION_ORDER:

  • reduce: X, obs, var, obsm, varm, obsp, varp, layers, uns, raw
  • SECTION_ORDER: X, obs, var, uns, obsm, varm, layers, obsp, varp, raw

This branch uses reduce's order. If the original order matters, the output would need post-hoc reordering.

ilan-gold and others added 22 commits March 19, 2026 14:16
Co-authored-by: Philipp A. <flying-sheep@web.de>
Replace _render_all_sections with a reduce-based traversal to
evaluate how well the new reduce API serves as a rendering backbone.

Changes:
- Extract IORegistry.get_writeable_types() and has_spec() utilities
- Simplify can_write, _check_serializable_single, _check_array_has_writer
  to use the shared utilities
- Rewrite _render_all_sections using adata.reduce(DFS-pre)

Pain points documented inline:
- 4-tuple accumulator to track open section state (no enter/exit events)
- Duplicated _finalize_section calls (callback + post-reduce)
- isinstance checks to distinguish section vs leaf visits
- len(DataFrame) returns rows not columns, needs per-type handling
- reduce crashes on non-string column names (tuple keys in obs/var)
- uns/raw passed as opaque blobs, fall back to dedicated renderers
- Custom/unknown sections entirely outside reduce's scope
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 23, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 02cd3745-cd55-450c-ac6b-fa97e3dfd75b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch html_rep_reduce

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

katosh added 3 commits March 23, 2026 15:28
LayerAcc is used for both X (k=None) and individual layer entries
(k='counts'). The _is_section_visit check matched both, causing
each layer to render as its own section instead of as an entry
within the layers section.

This is another instance of the isinstance fragility documented
in the reduce experiment: the accessor type hierarchy doesn't
cleanly separate parent visits from leaf visits.
get_section_tooltip() already returns the correct tooltips for
obs/var. The hardcoded override was an artifact of porting from
_render_dataframe_section which didn't use get_section_tooltip.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants