chore: promote dev → master (docs reduction pass) #100
Merged
Jaureguy760 merged 1 commit into master on Apr 19, 2026
* docs: remove redundant statistical_methods_tutorial notebook

  The 1,516-line notebook covered beta-binomial PMF, LRT, MLE, and BH, all content already present in methods/statistical_models.rst, methods/dispersion_estimation.rst, and methods/fdr_correction.rst at appropriate depth. The notebook duplicated equations and prose without adding worked examples on real data. Per field norms (scanpy, MACS3, pysam), software docs should defer statistical derivations to the primary literature rather than re-teaching them alongside the API.

* docs(methods): shrink fdr_correction from 258 to ~60 lines

  Keep: BH algorithm pointer, scipy API, PRDS assumption, reporting guidance, output-column table, citations.
  Cut: FWER/FDR definition primers, BH algorithm derivation, manual BH code block, q-value estimator derivation, discrete-FDR alternative section, threshold-selection table, "when to use stricter control" guidance.
  Added: NaN-propagation warning (pitfall we've hit before).
  Added: Storey q-value reference preserved as a citation pointer.

  Rationale: scanpy, MACS3, and samtools docs cite BH in a sentence; they do not re-teach it. WASP2's LRT produces continuous p-values so BH is standard; users looking for a statistics primer will find one in any multiple-testing textbook.

* docs(methods): shrink dispersion_estimation from 270 to ~110 lines

  Keep: MLE description + single-model code, linear-model description + code, model-choice table, convergence note, references.
  Cut: Cramer-Rao-bound argument (textbook statistics not WASP2-specific), MoM variance-based estimator (not used in WASP2), MoM vs MLE comparison table, CV-by-sample-size table (unsourced heuristic), generic "sample-size requirements" guidance.

  Scientific fixes:
  - Add Kumasaka 2016 (RASQUAL) as primary BB-dispersion reference
  - Label Robinson 2010 + Yu 2013 as "analogous NB literature" (they are NB, not BB, so they do not directly support BB dispersion)
  - Add explicit note that rho is held at its null-model MLE when computing the LRT (removes ambiguity about profile vs joint MLE)

* docs(methods): shrink + fix statistical_models from 307 to ~150 lines

  Keep: model definition, LRT formulation, phased/unphased treatment, output-columns table, pseudocount/min-count/aggregation notes.
  Cut: "Why not binomial" motivation section (covered by 2-sentence intro), variance-inflation numerical example, redundant implementation-section restatement of dispersion code (lives in dispersion_estimation).

  Scientific fixes (per reviewer pass):
  - REMOVE the unsupported power table (lines 276-286 of old file). Beta-binomial power depends on rho; the old table varied only mu/N and stated no rho, making the values optimistic for typical genomic dispersion. Replaced with a note that power should be simulated at the dataset's own rho estimate.
  - ADD Kumasaka 2016 citation (RASQUAL, Nat Genet). WASP2's BB + LRT + pooled-dispersion framework sits directly in RASQUAL's lineage; previous docs cited only Mayba 2014 (MBASED).
  - ADD van de Geijn 2015 citation (properly placed with the mapping filter pointer).
  - CLARIFY LRT: rho is held at its null-model MLE when evaluating L_1 (profile likelihood in mu). Removes ambiguity between profile and joint MLE noted by the methods reviewer.

* docs(tutorials): shrink comparative_imbalance from 545 to ~175 lines

  Keep: one worked cell-type example, CLI reference, output-columns table, minimal volcano plot, good practices, common issues.
  Cut: full duplicate tutorials for sex-differences and treatment-vs-control (same command, different barcode map; one note suffices).
  Cut: full heatmap code block (links to analysis guide instead).
  Cut: duplicate Seurat barcode-export R snippet (now single source in user_guide/single_cell.rst).
  Cut: input-data format over-specification (moved to single_cell guide).

  Three near-identical tutorial sections collapsed to one example plus "other comparisons use the same command with a different barcode map."

* docs(tutorials): consolidate 3 bulk workflow tutorials into one

  Before: quickstart_mapping.rst (258) + rna_seq.rst (203) + atac_seq_workflow.ipynb (944, orphaned from toctree) = 1405 lines covering overlapping bulk workflows with copy-pasted make-reads / remap / filter-remapped blocks and troubleshooting sections.
  After: bulk_workflow.rst (~175) covers the full RNA-seq and ATAC-seq bulk pipeline end-to-end (WASP filter + count + analyze) with data-type-specific callouts (GTF vs BED, phased vs unphased).

  - Deleted: quickstart_mapping.rst, rna_seq.rst, atac_seq_workflow.ipynb
  - Updated: index.rst toctree, choosing_workflow.rst decision points
  - No broken cross-references remain (grep-verified).

  Net: ~1230 lines removed, single canonical bulk walkthrough; scATAC and scRNA tutorials untouched (genuinely distinct workflows).

* docs(methods): scientific correctness fixes to mapping_filter + counting_algorithm

  mapping_filter.rst:
  - Soften the "Theorem" box that claimed P(map|ref) = P(map|alt) after filtering. The equality is approximate, holds under deterministic-aligner assumptions, and lacks a published proof under that framing. Replaced with an "under the following assumption..." statement and pointer to van de Geijn 2015 §Methods.
  - Replace the unsourced Rust-vs-Python benchmark table ("1M reads ~5min vs ~30s" etc.) with a qualitative description. The table had no hardware spec, no reproducible script, no dataset; a reviewer would flag it.
  - Replace the unsourced "Typical Filter Rates by Data Type" table (RNA 5-15%, ATAC 2-8%, ChIP 3-10%, WGS 1-5%) with a qualitative developer-experience paragraph. The tabulated ranges were stated as authoritative without a source.
  - Add the [vandeGeijn2015]_ reference definition (was cited but not defined; would trigger a Sphinx warning).

  counting_algorithm.rst:
  - Replace the unsourced counting-benchmark table (~45s vs 5s, etc.) with a qualitative paragraph. Same issue as the mapping-filter benchmark table.

* docs(tutorials): shrink scrna_seq from 333 to ~100 lines

  Keep: command-line recipe for count → per-celltype imbalance → optional compare, interpretation snippet, troubleshooting for barcode-suffix mismatches and sparsity.
  Cut: duplicate Seurat + Scanpy barcode-export code blocks (these now live canonically in user_guide/single_cell.rst, referenced by link).
  Cut: duplicate Cell Ranger output-tree diagram and BAM CB-tag description (repeated in scatac_workflow.rst + user_guide/single_cell).
  Cut: overlong troubleshooting and next-steps sections.

* docs: shrink development.rst from 272 to ~120 lines

  Keep: setup, code-standards summary, test/mypy commands, Rust layer notes + maturin build recipe, project layout, release flow, branching policy (master/dev promotion, feature-branch-off-dev rule).
  Cut: generic black/flake8/pytest tutorials (these tools have their own docs), step-by-step PR walkthrough, AI-assisted-development section (link-to-nowhere; seqera_ai_integration doc not served), obsolete "WASP2-exp" repo paths.
  Added: Rust parity-test requirement, explicit PyPI-OIDC + Docker publish flow, dev → master branching policy learned this session.
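For reviewers who want the LRT clarification in concrete form, the following is a minimal sketch of a profile-likelihood test in which rho is fitted once under the null (mu = 0.5) and then held fixed while mu is maximised. It is not WASP2's implementation: the function names, the grid-search optimiser, the `(ref_count, total_count)` input layout, and the mean/overdispersion parameterisation `alpha = mu(1-rho)/rho`, `beta = (1-mu)(1-rho)/rho` are all assumptions made for illustration.

```python
import math

def bb_logpmf(k, n, mu, rho):
    """Beta-binomial log-PMF, assumed (mu, rho) parameterisation with 0 < mu, rho < 1."""
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_num - log_den

def loglik(counts, mu, rho):
    """Sum of log-PMFs over (ref_count, total_count) pairs."""
    return sum(bb_logpmf(k, n, mu, rho) for k, n in counts)

def grid_argmax(f, steps=999):
    """Crude maximiser over (0, 1); a stand-in for a real 1-D optimiser."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(grid, key=f)

def fit_null_rho(counts):
    """Null-model MLE of rho: mu is fixed at 0.5, only rho is free."""
    return grid_argmax(lambda r: loglik(counts, 0.5, r))

def profile_lrt(region_counts, rho0):
    """LRT for mu != 0.5 with rho held at rho0 in both L_0 and L_1."""
    ll0 = loglik(region_counts, 0.5, rho0)
    mu1 = grid_argmax(lambda m: loglik(region_counts, m, rho0))
    ll1 = loglik(region_counts, mu1, rho0)
    stat = max(0.0, 2.0 * (ll1 - ll0))
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    pval = math.erfc(math.sqrt(stat / 2.0))
    return mu1, stat, pval

# Hypothetical counts, invented for the example only.
background = [(4, 20), (16, 20), (10, 20), (7, 20), (13, 20),
              (9, 20), (12, 20), (15, 20), (5, 20), (10, 20)]
rho0 = fit_null_rho(background)
mu_imb, stat_imb, p_imb = profile_lrt([(18, 20), (17, 20)], rho0)
mu_bal, stat_bal, p_bal = profile_lrt([(10, 20), (11, 20)], rho0)
print(rho0, mu_imb, p_imb, p_bal)
```

Because only mu is free under the alternative, the statistic is compared against a chi-square distribution with one degree of freedom. A joint MLE would re-fit rho under the alternative as well, which is the ambiguity the commit removes.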
Summary
Promote `dev` to `master`. Single new commit on dev since the last promotion:
Scientific correctness items fixed include: Kumasaka 2016 citation added, unsupported BB power table removed, unsourced benchmark/filter-rate tables replaced with qualitative language, "Theorem" claim softened, LRT profile-vs-joint ρ clarified, BB vs NB reference mismatch corrected, NaN-handling warning for manual BH added.
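The NaN-handling item can be illustrated with a small standard-library sketch of Benjamini-Hochberg adjustment. It is not the code block removed from fdr_correction, and the function name `bh_adjust` is hypothetical. It assumes that the pitfall is NaN p-values contaminating the sort and the running minimum, and that the desired behaviour is to leave NaN entries as NaN and exclude them from the test count m; the docs may recommend a different policy.

```python
import math

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values; NaN inputs stay NaN and do not count toward m."""
    # Keep only finite tests, remembering each original position.
    ranked = sorted((p, i) for i, p in enumerate(pvals) if not math.isnan(p))
    m = len(ranked)
    adjusted = [math.nan] * len(pvals)
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        p, i = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values; the NaN stands for a test that failed to converge.
pvals = [0.01, math.nan, 0.04, 0.03, 0.20]
adjusted = bh_adjust(pvals)
print(adjusted)
```

With these inputs m is 4 rather than 5, and the adjusted values are approximately 0.04, NaN, 0.0533, 0.0533, and 0.20. Sorting the raw list with the NaN left in would give an order that depends on where the NaN sits, which is why the mask is applied first.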
No broken cross-references; Sphinx builds clean (only pre-existing autodoc warnings from docstring type annotations, unchanged by this PR).
Test plan