chore: promote dev → master (docs reduction pass) by Jaureguy760 · Pull Request #100 · mcvickerlab/WASP2

Jaureguy760 · 2026-04-19T02:42:43Z

Summary

Promote `dev` to `master`. Single new commit on dev since the last promotion:

cea62f4 `docs: reduce text volume (~47%) + fix scientific correctness issues (#99)`: 46.7% reduction in `docs/source/` (~6,300 → ~3,360 lines), 9 commits consolidated, 8 scientific-correctness fixes applied.

Scientific correctness items fixed include: Kumasaka 2016 citation added, unsupported BB power table removed, unsourced benchmark/filter-rate tables replaced with qualitative language, "Theorem" claim softened, LRT profile-vs-joint ρ clarified, BB vs NB reference mismatch corrected, NaN-handling warning for manual BH added.
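The NaN pitfall behind that warning can be sketched as follows. This is an illustration only, not WASP2's code: `bh_adjust` is a hypothetical helper, and the assumption is that untestable sites carry a NaN p-value. A hand-rolled BH that sorts or takes a running minimum over NaN can silently corrupt every adjusted value, so the sketch masks NaN first and excludes those sites from the test count `m`.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper).

    NaN inputs stay NaN in the output and are not counted as tests.
    """
    # Keep (p, original_index) pairs for the finite p-values only.
    finite = [(p, i) for i, p in enumerate(pvalues) if not math.isnan(p)]
    m = len(finite)
    adjusted = [math.nan] * len(pvalues)

    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # carrying a running minimum so adjusted values are monotone in rank.
    running_min = 1.0
    for rank, (p, i) in zip(range(m, 0, -1), sorted(finite, reverse=True)):
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(bh_adjust([0.01, float("nan"), 0.04, 0.03, 0.5]))
```

With the input above, the NaN stays at index 1 and the other four values are adjusted with `m = 4`, giving 0.04, about 0.0533, about 0.0533, and 0.5.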

No broken cross-references; Sphinx builds clean (only pre-existing autodoc warnings from docstring type annotations, unchanged by this PR).

Test plan

PR #99 (`docs: reduce text volume (~47%) + fix scientific correctness issues`) → dev: CI all green (ruff, Rust tests, pytest 3.10/3.11/3.12)
Local Sphinx build passes; no new warnings introduced
Master CI on merge commit

* docs: remove redundant statistical_methods_tutorial notebook

  The 1,516-line notebook covered beta-binomial PMF, LRT, MLE, and BH, all of which is already present in methods/statistical_models.rst, methods/dispersion_estimation.rst, and methods/fdr_correction.rst at appropriate depth. The notebook duplicated equations and prose without adding worked examples on real data. Per field norms (scanpy, MACS3, pysam), software docs should defer statistical derivations to the primary literature rather than re-teaching them alongside the API.

* docs(methods): shrink fdr_correction from 258 to ~60 lines

  Keep: BH algorithm pointer, scipy API, PRDS assumption, reporting guidance, output-column table, citations.
  Cut: FWER/FDR definition primers, BH algorithm derivation, manual BH code block, q-value estimator derivation, discrete-FDR alternative section, threshold-selection table, "when to use stricter control" guidance.
  Added: NaN-propagation warning (pitfall we've hit before).
  Added: Storey q-value reference preserved as a citation pointer.

  Rationale: scanpy, MACS3, and samtools docs cite BH in a sentence; they do not re-teach it. WASP2's LRT produces continuous p-values, so BH is standard; users looking for a statistics primer will find one in any multiple-testing textbook.

* docs(methods): shrink dispersion_estimation from 270 to ~110 lines

  Keep: MLE description + single-model code, linear-model description + code, model-choice table, convergence note, references.
  Cut: Cramer-Rao-bound argument (textbook statistics, not WASP2-specific), MoM variance-based estimator (not used in WASP2), MoM vs MLE comparison table, CV-by-sample-size table (unsourced heuristic), generic "sample-size requirements" guidance.

  Scientific fixes:
  - Add Kumasaka 2016 (RASQUAL) as primary BB-dispersion reference
  - Label Robinson 2010 + Yu 2013 as "analogous NB literature" (they are NB, not BB, so they do not directly support BB dispersion)
  - Add explicit note that rho is held at its null-model MLE when computing the LRT (removes ambiguity about profile vs joint MLE)

* docs(methods): shrink + fix statistical_models from 307 to ~150 lines

  Keep: model definition, LRT formulation, phased/unphased treatment, output-columns table, pseudocount/min-count/aggregation notes.
  Cut: "Why not binomial" motivation section (covered by 2-sentence intro), variance-inflation numerical example, redundant implementation-section restatement of dispersion code (lives in dispersion_estimation).

  Scientific fixes (per reviewer pass):
  - REMOVE the unsupported power table (lines 276-286 of old file). Beta-binomial power depends on rho; the old table varied only mu/N and stated no rho, making the values optimistic for typical genomic dispersion. Replaced with a note that power should be simulated at the dataset's own rho estimate.
  - ADD Kumasaka 2016 citation (RASQUAL, Nat Genet). WASP2's BB + LRT + pooled-dispersion framework sits directly in RASQUAL's lineage; previous docs cited only Mayba 2014 (MBASED).
  - ADD van de Geijn 2015 citation (properly placed with the mapping filter pointer).
  - CLARIFY LRT: rho is held at its null-model MLE when evaluating L_1 (profile likelihood in mu). Removes ambiguity between profile and joint MLE noted by the methods reviewer.

* docs(tutorials): shrink comparative_imbalance from 545 to ~175 lines

  Keep: one worked cell-type example, CLI reference, output-columns table, minimal volcano plot, good practices, common issues.
  Cut: full duplicate tutorials for sex-differences and treatment-vs-control (same command, different barcode map; one note suffices).
  Cut: full heatmap code block (links to analysis guide instead).
  Cut: duplicate Seurat barcode-export R snippet (now single source in user_guide/single_cell.rst).
  Cut: input-data format over-specification (moved to single_cell guide).

  Three near-identical tutorial sections collapsed to one example plus "other comparisons use the same command with a different barcode map."

* docs(tutorials): consolidate 3 bulk workflow tutorials into one

  Before: quickstart_mapping.rst (258) + rna_seq.rst (203) + atac_seq_workflow.ipynb (944, orphaned from toctree) = 1405 lines covering overlapping bulk workflows with copy-pasted make-reads / remap / filter-remapped blocks and troubleshooting sections.

  After: bulk_workflow.rst (~175) covers the full RNA-seq and ATAC-seq bulk pipeline end-to-end (WASP filter + count + analyze) with data-type-specific callouts (GTF vs BED, phased vs unphased).

  - Deleted: quickstart_mapping.rst, rna_seq.rst, atac_seq_workflow.ipynb
  - Updated: index.rst toctree, choosing_workflow.rst decision points
  - No broken cross-references remain (grep-verified).

  Net: ~1230 lines removed, single canonical bulk walkthrough; scATAC and scRNA tutorials untouched (genuinely distinct workflows).

* docs(methods): scientific correctness fixes to mapping_filter + counting_algorithm

  mapping_filter.rst:
  - Soften the "Theorem" box that claimed P(map|ref) = P(map|alt) after filtering. The equality is approximate, holds under deterministic-aligner assumptions, and lacks a published proof under that framing. Replaced with an "under the following assumption..." statement and pointer to van de Geijn 2015 §Methods.
  - Replace the unsourced Rust-vs-Python benchmark table ("1M reads ~5min vs ~30s" etc.) with a qualitative description. The table had no hardware spec, no reproducible script, and no dataset; a reviewer would flag it.
  - Replace the unsourced "Typical Filter Rates by Data Type" table (RNA 5-15%, ATAC 2-8%, ChIP 3-10%, WGS 1-5%) with a qualitative developer-experience paragraph. The tabulated ranges were stated as authoritative without a source.
  - Add the [vandeGeijn2015]_ reference definition (was cited but not defined; would trigger a Sphinx warning).

  counting_algorithm.rst:
  - Replace the unsourced counting-benchmark table (~45s vs 5s, etc.) with a qualitative paragraph. Same issue as the mapping-filter benchmark table.

* docs(tutorials): shrink scrna_seq from 333 to ~100 lines

  Keep: command-line recipe for count → per-celltype imbalance → optional compare, interpretation snippet, troubleshooting for barcode-suffix mismatches and sparsity.
  Cut: duplicate Seurat + Scanpy barcode-export code blocks (these now live canonically in user_guide/single_cell.rst, referenced by link).
  Cut: duplicate Cell Ranger output-tree diagram and BAM CB-tag description (repeated in scatac_workflow.rst + user_guide/single_cell).
  Cut: overlong troubleshooting and next-steps sections.

* docs: shrink development.rst from 272 to ~120 lines

  Keep: setup, code-standards summary, test/mypy commands, Rust layer notes + maturin build recipe, project layout, release flow, branching policy (master/dev promotion, feature-branch-off-dev rule).
  Cut: generic black/flake8/pytest tutorials (these tools have their own docs), step-by-step PR walkthrough, AI-assisted-development section (link-to-nowhere: seqera_ai_integration doc not served), obsolete "WASP2-exp" repo paths.
  Added: Rust parity-test requirement, explicit PyPI-OIDC + Docker publish flow, dev → master branching policy learned this session.
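For readers of the LRT clarification in the statistical_models commit above, the "rho held at its null-model MLE" procedure can be sketched in a few lines. This is a rough illustration under stated assumptions, not WASP2's implementation: the (mu, rho) parametrization with alpha = mu(1-rho)/rho and beta = (1-mu)(1-rho)/rho, the null value mu = 0.5, the 1-df chi-square reference, the golden-section optimizer, and every function name here are assumptions made for the example.

```python
import math


def betabinom_logpmf(k, n, mu, rho):
    """Beta-binomial log-PMF in an assumed (mu, rho) parametrization."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_num - log_den


def loglik(counts, mu, rho):
    """Sum of log-PMFs over (ref_count, total_count) pairs."""
    return sum(betabinom_logpmf(k, n, mu, rho) for k, n in counts)


def maximize(f, lo, hi, iters=100):
    """Golden-section search; assumes f is unimodal on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if f(c) > f(d):
            hi = d  # maximum lies in [lo, d]
        else:
            lo = c  # maximum lies in [c, hi]
    return (lo + hi) / 2


def lrt_fixed_rho(counts, eps=1e-4):
    """LRT for mu != 0.5 with rho fixed at its null-model MLE."""
    # Step 1: estimate rho under the null (mu = 0.5).
    rho0 = maximize(lambda r: loglik(counts, 0.5, r), eps, 1 - eps)
    # Step 2: maximize over mu only, keeping rho at rho0.
    mu1 = maximize(lambda m: loglik(counts, m, rho0), eps, 1 - eps)
    stat = max(0.0, 2 * (loglik(counts, mu1, rho0) - loglik(counts, 0.5, rho0)))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    pval = math.erfc(math.sqrt(stat / 2))
    return mu1, rho0, stat, pval


if __name__ == "__main__":
    sites = [(16, 20), (17, 20), (15, 20), (18, 20), (16, 20), (17, 20)]
    print(lrt_fixed_rho(sites))
```

Because rho is not re-estimated under the alternative, the statistic differs from a joint-MLE LRT, which is the ambiguity the commit message says the docs now resolve.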

Jaureguy760 merged commit 6ae10c7 into master Apr 19, 2026
15 checks passed
