chore: promote dev → master (docs reduction pass) #100

Merged: Jaureguy760 merged 1 commit into master from dev on Apr 19, 2026

@Jaureguy760
Collaborator

Summary

Promote `dev` to `master`. Single new commit on dev since the last promotion:

Scientific correctness fixes included:

- Kumasaka 2016 citation added
- unsupported BB power table removed
- unsourced benchmark and filter-rate tables replaced with qualitative language
- "Theorem" claim softened
- LRT profile-vs-joint ρ clarified
- BB vs NB reference mismatch corrected
- NaN-handling warning for manual BH added

No broken cross-references; Sphinx builds clean (only pre-existing autodoc warnings from docstring type annotations, unchanged by this PR).

Test plan


* docs: remove redundant statistical_methods_tutorial notebook

The 1,516-line notebook covered beta-binomial PMF, LRT, MLE, and
BH — all content already present in methods/statistical_models.rst,
methods/dispersion_estimation.rst, and methods/fdr_correction.rst
at appropriate depth. The notebook duplicated equations and prose
without adding worked examples on real data.

Per field norms (scanpy, MACS3, pysam), software docs should defer
statistical derivations to the primary literature rather than
re-teaching them alongside the API.

* docs(methods): shrink fdr_correction from 258 to ~60 lines

Keep: BH algorithm pointer, scipy API, PRDS assumption, reporting guidance,
output-column table, citations.

Cut: FWER/FDR definition primers, BH algorithm derivation, manual BH code
block, q-value estimator derivation, discrete-FDR alternative section,
threshold-selection table, "when to use stricter control" guidance.

Added: NaN-propagation warning (pitfall we've hit before).
Added: Storey q-value reference preserved as a citation pointer.

Rationale: scanpy, MACS3, and samtools docs cite BH in a sentence; they do
not re-teach it. WASP2's LRT produces continuous p-values so BH is standard —
users looking for a statistics primer will find one in any multiple-testing
textbook.
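
The NaN pitfall noted above can be illustrated with a small hand-rolled BH adjustment. This is a sketch, not WASP2's code: `bh_adjust` is a hypothetical helper, and dropping NaN p-values from the test count while returning NaN for them is an assumed policy rather than something this PR specifies.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values; NaN inputs stay NaN.

    Assumption: NaN entries are untested sites, so they are excluded
    from the number of tests m instead of being ranked.
    """
    # Keep (p, original index) pairs for the finite entries only.
    finite = [(p, i) for i, p in enumerate(pvalues) if not math.isnan(p)]
    m = len(finite)
    adjusted = [math.nan] * len(pvalues)

    finite.sort()  # ascending by p-value
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        p, idx = finite[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(bh_adjust([0.01, math.nan, 0.04, 0.03, 0.20]))
```

Sorting the raw list without the NaN filter is where a manual implementation tends to go wrong: NaN compares false against everything, so its rank, and with it every adjusted value, becomes unreliable.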

* docs(methods): shrink dispersion_estimation from 270 to ~110 lines

Keep: MLE description + single-model code, linear-model description +
code, model-choice table, convergence note, references.

Cut: Cramér-Rao bound argument (textbook statistics, not WASP2-specific),
MoM variance-based estimator (not used in WASP2), MoM vs MLE comparison
table, CV-by-sample-size table (unsourced heuristic), generic
"sample-size requirements" guidance.

Scientific fixes:
- Add Kumasaka 2016 (RASQUAL) as primary BB-dispersion reference
- Label Robinson 2010 + Yu 2013 as "analogous NB literature" (they
  are NB, not BB, so they do not directly support BB dispersion)
- Add explicit note that rho is held at its null-model MLE when
  computing the LRT (removes ambiguity about profile vs joint MLE)
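
The profile-versus-joint distinction in the last fix can be written out explicitly. The notation below is illustrative and rests on assumptions not stated in this PR: the null hypothesis is balanced expression (μ = 0.5), ℓ denotes the beta-binomial log-likelihood of the data under test (how it pools over sites is left unspecified), and Λ is compared against a χ² distribution with one degree of freedom.

```latex
\hat{\rho}_0 = \arg\max_{\rho} \, \ell(\mu = 0.5,\ \rho),
\qquad
\Lambda = 2\left[\, \max_{\mu} \ell(\mu,\ \hat{\rho}_0) \;-\; \ell(0.5,\ \hat{\rho}_0) \,\right]
```

A joint MLE would instead maximize the first term over (μ, ρ) together; holding ρ at its null-model estimate is the reading the added note commits to.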

* docs(methods): shrink + fix statistical_models from 307 to ~150 lines

Keep: model definition, LRT formulation, phased/unphased treatment,
output-columns table, pseudocount/min-count/aggregation notes.

Cut: "Why not binomial" motivation section (covered by 2-sentence intro),
variance-inflation numerical example, redundant implementation-section
restatement of dispersion code (lives in dispersion_estimation).

Scientific fixes (per reviewer pass):

- REMOVE the unsupported power table (lines 276-286 of old file).
  Beta-binomial power depends on rho; the old table varied only mu/N
  and stated no rho, making the values optimistic for typical genomic
  dispersion. Replaced with a note that power should be simulated at
  the dataset's own rho estimate.

- ADD Kumasaka 2016 citation (RASQUAL, Nat Genet). WASP2's BB + LRT +
  pooled-dispersion framework sits directly in RASQUAL's lineage;
  previous docs cited only Mayba 2014 (MBASED).

- ADD van de Geijn 2015 citation (properly placed with the mapping
  filter pointer).

- CLARIFY LRT: rho is held at its null-model MLE when evaluating
  L_1 (profile likelihood in mu). Removes ambiguity between profile
  and joint MLE noted by the methods reviewer.
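
The point that power should be simulated at the dataset's own rho can be sketched in a few lines. Everything here is an assumption for illustration: the (mu, rho) to (alpha, beta) mapping is the common mean/dispersion parameterization, the null is mu = 0.5, the alternative is maximized over a coarse grid, and 3.841 is the nominal chi-square(1) cutoff at 0.05. The function names are hypothetical and are not WASP2 APIs.

```python
import math
import random


def bb_logpmf(k, n, mu, rho):
    """Beta-binomial log-PMF with mean mu and dispersion rho (assumed form)."""
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_num - log_den


def lrt_stat(k, n, rho, grid=99):
    """2 * (max over mu of logL - logL at mu = 0.5), with rho held fixed."""
    null_ll = bb_logpmf(k, n, 0.5, rho)
    # Grid of mu values in (0, 1); with grid=99 it includes mu = 0.5 exactly.
    alt_ll = max(bb_logpmf(k, n, (i + 1) / (grid + 1), rho) for i in range(grid))
    return 2.0 * (alt_ll - null_ll)


def simulate_power(mu, n, rho, n_sims=500, seed=0, cutoff=3.841):
    """Fraction of simulated sites whose LRT statistic exceeds the cutoff."""
    rng = random.Random(seed)
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    hits = 0
    for _ in range(n_sims):
        p = rng.betavariate(a, b)                    # site-level allelic ratio
        k = sum(rng.random() < p for _ in range(n))  # reference-allele reads
        if lrt_stat(k, n, rho) > cutoff:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for rho in (0.01, 0.05, 0.2):
        print(rho, simulate_power(mu=0.65, n=100, rho=rho))
```

Running the loop with a rho estimated from the data in hand, rather than reading power from a table that fixes only mu and N, is the substitution the replacement note recommends.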

* docs(tutorials): shrink comparative_imbalance from 545 to ~175 lines

Keep: one worked cell-type example, CLI reference, output-columns
table, minimal volcano plot, good practices, common issues.

Cut: full duplicate tutorials for sex-differences and treatment-vs-
control (same command, different barcode map — one note suffices).
Cut: full heatmap code block (links to analysis guide instead).
Cut: duplicate Seurat barcode-export R snippet (now single source in
user_guide/single_cell.rst).
Cut: input-data format over-specification (moved to single_cell guide).

Three near-identical tutorial sections collapsed to one example plus
"other comparisons use the same command with a different barcode map."

* docs(tutorials): consolidate 3 bulk workflow tutorials into one

Before: quickstart_mapping.rst (258) + rna_seq.rst (203) +
atac_seq_workflow.ipynb (944, orphaned from toctree) = 1405 lines
covering overlapping bulk workflows with copy-pasted make-reads /
remap / filter-remapped blocks and troubleshooting sections.

After: bulk_workflow.rst (~175) covers the full RNA-seq and ATAC-seq
bulk pipeline end-to-end (WASP filter + count + analyze) with
data-type-specific callouts (GTF vs BED, phased vs unphased).

- Deleted: quickstart_mapping.rst, rna_seq.rst, atac_seq_workflow.ipynb
- Updated: index.rst toctree, choosing_workflow.rst decision points
- No broken cross-references remain (grep-verified).

Net: ~1230 lines removed, single canonical bulk walkthrough; scATAC
and scRNA tutorials untouched (genuinely distinct workflows).

* docs(methods): scientific correctness fixes to mapping_filter + counting_algorithm

mapping_filter.rst:

- Soften the "Theorem" box that claimed P(map|ref) = P(map|alt)
  after filtering. The equality is approximate, holds under
  deterministic-aligner assumptions, and lacks a published proof
  under that framing. Replaced with an "under the following
  assumption..." statement and pointer to van de Geijn 2015 §Methods.

- Replace the unsourced Rust-vs-Python benchmark table ("1M reads
  ~5min vs ~30s" etc.) with a qualitative description. The table
  had no hardware spec, no reproducible script, no dataset — a
  reviewer would flag it.

- Replace the unsourced "Typical Filter Rates by Data Type" table
  (RNA 5-15%, ATAC 2-8%, ChIP 3-10%, WGS 1-5%) with a qualitative
  developer-experience paragraph. The tabulated ranges were stated
  as authoritative without a source.

- Add the [vandeGeijn2015]_ reference definition (was cited but
  not defined; would trigger a Sphinx warning).

counting_algorithm.rst:

- Replace the unsourced counting-benchmark table (~45s vs 5s, etc.)
  with a qualitative paragraph. Same issue as the mapping-filter
  benchmark table.

* docs(tutorials): shrink scrna_seq from 333 to ~100 lines

Keep: command-line recipe for count → per-celltype imbalance →
optional compare, interpretation snippet, troubleshooting for
barcode-suffix mismatches and sparsity.

Cut: duplicate Seurat + Scanpy barcode-export code blocks (these
now live canonically in user_guide/single_cell.rst, referenced by
link).

Cut: duplicate Cell Ranger output-tree diagram and BAM CB-tag
description (repeated in scatac_workflow.rst + user_guide/single_cell).
Cut: overlong troubleshooting and next-steps sections.
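
The barcode-suffix troubleshooting item kept above comes down to a quick overlap check. The following is a minimal sketch under the assumption that the mismatch is a trailing `-<digits>` suffix (as in Cell Ranger barcodes such as `ACGT-1`) present on one side only; `strip_suffix` and `barcode_overlap` are hypothetical helpers, not part of WASP2.

```python
def strip_suffix(barcode):
    """Drop a trailing '-<digits>' suffix, e.g. 'ACGT-1' -> 'ACGT'."""
    core, sep, tail = barcode.rpartition("-")
    return core if sep and tail.isdigit() else barcode


def barcode_overlap(bam_barcodes, map_barcodes):
    """Return (exact matches, matches after stripping suffixes)."""
    bam = set(bam_barcodes)
    mapped = set(map_barcodes)
    exact = len(bam & mapped)
    stripped = len({strip_suffix(b) for b in bam}
                   & {strip_suffix(b) for b in mapped})
    return exact, stripped


if __name__ == "__main__":
    # Zero exact matches but full overlap after stripping points to a
    # suffix mismatch rather than genuinely disjoint barcode sets.
    print(barcode_overlap(["AAAC-1", "AAAG-1"], ["AAAC", "AAAG", "AAAT"]))
```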

* docs: shrink development.rst from 272 to ~120 lines

Keep: setup, code-standards summary, test/mypy commands, Rust layer
notes + maturin build recipe, project layout, release flow, branching
policy (master/dev promotion, feature-branch-off-dev rule).

Cut: generic black/flake8/pytest tutorials (these tools have their
own docs), step-by-step PR walkthrough, AI-assisted-development
section (link-to-nowhere — seqera_ai_integration doc not served),
obsolete "WASP2-exp" repo paths.

Added: Rust parity-test requirement, explicit PyPI-OIDC + Docker
publish flow, dev → master branching policy learned this session.

@Jaureguy760 merged commit 6ae10c7 into master on Apr 19, 2026
15 checks passed