diff --git a/docs/source/choosing_workflow.rst b/docs/source/choosing_workflow.rst index 8d82bd24..f55f2785 100644 --- a/docs/source/choosing_workflow.rst +++ b/docs/source/choosing_workflow.rst @@ -22,11 +22,11 @@ WASP2 supports four major data types. Use this guide to find your workflow. * - **scRNA-seq (10x)** - Cell Ranger BAM + VCF + barcodes - Per-cell or per-cell-type ASE - - :doc:`tutorials/scrna_seq` + - :doc:`tutorials/single_cell_workflow` * - **scATAC-seq (10x)** - Fragments/BAM + VCF + barcodes - Single-cell allelic imbalance in ATAC peaks - - :doc:`tutorials/scatac_workflow` + - :doc:`tutorials/single_cell_workflow` Decision Flowchart ------------------ @@ -39,13 +39,13 @@ Decision Flowchart **Step 2: Bulk or single-cell RNA-seq?** * Bulk RNA-seq → :doc:`tutorials/bulk_workflow` -* 10x Chromium scRNA-seq → :doc:`tutorials/scrna_seq` +* 10x Chromium scRNA-seq → :doc:`tutorials/single_cell_workflow` * Other single-cell protocol → see :doc:`user_guide/single_cell` **Step 3: Bulk or single-cell ATAC-seq?** * Bulk ATAC-seq → :doc:`tutorials/bulk_workflow` (use BED peak file as ``--region``) -* 10x scATAC-seq → :doc:`tutorials/scatac_workflow` +* 10x scATAC-seq → :doc:`tutorials/single_cell_workflow` Do I Need to Run the WASP Remapping Step? ------------------------------------------ diff --git a/docs/source/faq.rst b/docs/source/faq.rst index d824808d..9ff4048f 100644 --- a/docs/source/faq.rst +++ b/docs/source/faq.rst @@ -102,7 +102,7 @@ Any aligner that produces CB-tagged BAMs will work (STARsolo, Alevin-fry, etc.). Run WASP2 on the full BAM to get per-cell allele counts, then use the output with your cell type annotations in Python (AnnData/Scanpy) to aggregate by -cell type. See :doc:`tutorials/scrna_seq` for an example. +cell type. See :doc:`tutorials/single_cell_workflow` for an example. 
Output and Results ------------------ diff --git a/docs/source/index.rst b/docs/source/index.rst index ab8dde66..67b8c8a6 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -53,7 +53,6 @@ Documentation :caption: Getting Started installation - quickstart choosing_workflow faq @@ -70,10 +69,8 @@ Documentation :maxdepth: 2 :caption: Tutorials - tutorials/quickstart_counting tutorials/bulk_workflow - tutorials/scrna_seq - tutorials/scatac_workflow + tutorials/single_cell_workflow tutorials/comparative_imbalance .. toctree:: diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst deleted file mode 100644 index 29730275..00000000 --- a/docs/source/quickstart.rst +++ /dev/null @@ -1,66 +0,0 @@ -Quick Start -=========== - -This 5-minute tutorial demonstrates basic WASP2 usage. - -Prerequisites -------------- - -You will need: - -* A coordinate-sorted, indexed BAM file (``sample.bam`` + ``sample.bam.bai``) -* A phased VCF file with heterozygous variants (``variants.vcf.gz`` + ``.tbi``) - -These are typically produced by your alignment pipeline (BWA-MEM, STAR, etc.) -followed by variant calling and phasing (GATK, WhatsHap, ShapeIt). - -Count Alleles -------------- - -Count allele-specific reads from a BAM file: - -.. code-block:: bash - - wasp2-count count-variants \ - sample.bam \ - variants.vcf.gz \ - -s SAMPLE_ID \ - --out_file counts.tsv - -Output: ``counts.tsv`` with columns: - -* chr, pos, ref, alt -* ref_count, alt_count, other_count - -Analyze Allelic Imbalance --------------------------- - -Detect significant allelic imbalance: - -.. 
code-block:: bash - - wasp2-analyze find-imbalance \ - counts.tsv \ - --output results.tsv - -Output: ``results.tsv`` with columns: - -* region, ref_count, alt_count -* p-value, FDR-corrected p-value -* Statistical metrics - -Interpret Results ------------------ - -Significant imbalance (FDR < 0.05) indicates: - -* Preferential expression of one allele -* Potential cis-regulatory variation -* Technical artifacts (check coverage) - -Next Steps ----------- - -* :doc:`user_guide/counting` - Detailed counting options -* :doc:`user_guide/mapping` - WASP remapping workflow -* :doc:`user_guide/analysis` - Statistical models diff --git a/docs/source/tutorials/bulk_workflow.rst b/docs/source/tutorials/bulk_workflow.rst index 674f2726..692f3adb 100644 --- a/docs/source/tutorials/bulk_workflow.rst +++ b/docs/source/tutorials/bulk_workflow.rst @@ -170,6 +170,5 @@ See Also - :doc:`/user_guide/analysis` — analysis CLI reference - :doc:`/methods/mapping_filter` — canonical WASP filter contract - :doc:`/methods/statistical_models` — the LRT and beta-binomial model -- :doc:`/tutorials/scatac_workflow` — single-cell ATAC-seq -- :doc:`/tutorials/scrna_seq` — single-cell RNA-seq +- :doc:`/tutorials/single_cell_workflow` — single-cell RNA-seq / ATAC-seq - :doc:`/tutorials/comparative_imbalance` — comparing groups diff --git a/docs/source/tutorials/comparative_imbalance.rst b/docs/source/tutorials/comparative_imbalance.rst index 4324ee22..e03b0e68 100644 --- a/docs/source/tutorials/comparative_imbalance.rst +++ b/docs/source/tutorials/comparative_imbalance.rst @@ -184,5 +184,5 @@ See Also - :doc:`/user_guide/analysis` — analysis-CLI reference and parameters - :doc:`/user_guide/single_cell` — input data formats, barcode exports -- :doc:`/tutorials/scrna_seq` — basic single-cell workflow +- :doc:`/tutorials/single_cell_workflow` — single-cell RNA-seq / ATAC-seq workflow - :doc:`/methods/statistical_models` — the LRT underlying this test diff --git a/docs/source/tutorials/quickstart_counting.rst 
b/docs/source/tutorials/quickstart_counting.rst deleted file mode 100644 index 0a042551..00000000 --- a/docs/source/tutorials/quickstart_counting.rst +++ /dev/null @@ -1,151 +0,0 @@ -Quickstart: Count Alleles in 5 Minutes -====================================== - -This tutorial demonstrates the basic WASP2 allele counting workflow using a minimal test dataset. - -**What you'll learn:** - -- How to count allele-specific reads from a BAM file -- Basic WASP2 command-line usage -- Understanding the output format - -**Prerequisites:** - -- WASP2 installed (``pip install wasp2``) -- Basic familiarity with BAM and VCF file formats - -Setup ------ - -First, verify WASP2 is installed: - -.. code-block:: bash - - wasp2-count --version - -Test Data ---------- - -We'll use the minimal test data included in the WASP2 repository: - -- **BAM file**: Synthetic paired-end reads overlapping heterozygous variants -- **VCF file**: 6 variants with genotypes for two samples -- **GTF file**: Gene annotations for 3 genes - -The test data is located in ``pipelines/nf-modules/tests/data/``. - -**VCF contents:** - -.. code-block:: text - - #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 - chr1 100 rs1 A G 30 PASS DP=50 GT 0/1 0/0 - chr1 200 rs2 C T 30 PASS DP=45 GT 1/1 0/1 - chr1 300 rs3 G A 30 PASS DP=60 GT 0/0 1/1 - chr1 400 rs4 T C 30 PASS DP=55 GT 0/1 0/1 - chr2 100 rs5 A T 30 PASS DP=40 GT 0/1 0/0 - chr2 200 rs6 G C 30 PASS DP=35 GT ./. 0/1 - -The ``GT`` field shows genotypes: - -- ``0/1``: Heterozygous (has both reference and alternate alleles) -- ``0/0``: Homozygous reference -- ``1/1``: Homozygous alternate - -For allele-specific analysis, we focus on **heterozygous sites** (0/1). - -Step 1: Basic Allele Counting ------------------------------ - -The simplest way to count alleles is to provide a BAM file and VCF file: - -.. 
code-block:: bash - - wasp2-count count-variants \ - pipelines/nf-modules/tests/data/minimal.bam \ - pipelines/nf-modules/tests/data/sample.vcf.gz \ - --out_file counts_basic.tsv - -**Output:** - -.. code-block:: text - - chr pos ref alt ref_count alt_count other_count - chr1 100 A G 1 0 0 - chr1 400 T C 1 0 0 - chr2 100 A T 1 0 0 - -Output Columns -~~~~~~~~~~~~~~ - -.. list-table:: - :header-rows: 1 - :widths: 20 80 - - * - Column - - Description - * - ``chr`` - - Chromosome - * - ``pos`` - - Variant position (1-based) - * - ``ref`` - - Reference allele - * - ``alt`` - - Alternate allele - * - ``ref_count`` - - Reads supporting reference allele - * - ``alt_count`` - - Reads supporting alternate allele - * - ``other_count`` - - Reads with other alleles (errors, indels) - -Step 2: Filter by Sample ------------------------- - -When your VCF contains multiple samples, use ``--samples`` to filter for heterozygous sites in a specific sample: - -.. code-block:: bash - - wasp2-count count-variants \ - pipelines/nf-modules/tests/data/minimal.bam \ - pipelines/nf-modules/tests/data/sample.vcf.gz \ - --samples sample1 \ - --out_file counts_sample1.tsv - -This returns only the 3 sites where sample1 is heterozygous: - -- chr1:100 (rs1) -- chr1:400 (rs4) -- chr2:100 (rs5) - -Step 3: Annotate with Gene Regions ----------------------------------- - -Use ``--region`` to annotate variants with overlapping genomic features (genes, peaks, etc.): - -.. code-block:: bash - - wasp2-count count-variants \ - pipelines/nf-modules/tests/data/minimal.bam \ - pipelines/nf-modules/tests/data/sample.vcf.gz \ - --samples sample1 \ - --region pipelines/nf-modules/tests/data/sample.gtf \ - --out_file counts_annotated.tsv - -The output now includes gene annotations from the GTF file, allowing you to aggregate counts per gene for downstream analysis. - -Next Steps ----------- - -Now that you have allele counts, you can: - -1. **Analyze allelic imbalance** using ``wasp2-analyze find-imbalance`` -2. 
**Compare between conditions** using ``wasp2-analyze compare-imbalance`` -3. **Correct mapping bias** using ``wasp2-map`` (for WASP-filtered BAMs) - -See Also --------- - -* :doc:`/user_guide/counting` - Detailed counting options -* :doc:`/tutorials/scrna_seq` - Single-cell RNA-seq tutorial -* :doc:`/tutorials/comparative_imbalance` - Differential imbalance analysis diff --git a/docs/source/tutorials/scatac_workflow.rst b/docs/source/tutorials/scatac_workflow.rst deleted file mode 100644 index 15e8b331..00000000 --- a/docs/source/tutorials/scatac_workflow.rst +++ /dev/null @@ -1,156 +0,0 @@ -Single-Cell ATAC-seq Workflow -============================= - -This tutorial provides a workflow for detecting allelic imbalance in single-cell ATAC-seq data from 10x Genomics. - -.. note:: - - **Estimated Time**: ~30 minutes - -Overview --------- - -**Goal**: Identify genomic regions with allelic imbalance in chromatin accessibility at single-cell resolution. - -**Input Data**: - -* 10x Cell Ranger ATAC output (fragments/BAM + barcodes) -* Phased VCF with heterozygous variants -* Cell type annotations - -Tutorial Sections ------------------ - -1. Loading 10x scATAC Data -~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Cell Ranger ATAC outputs needed: - -.. code-block:: text - - cellranger_output/outs/ - ├── fragments.tsv.gz # Fragment overlap counting - ├── possorted_bam.bam # Allele-specific counting - ├── peaks.bed # Region restriction - └── filtered_peak_bc_matrix/ - └── barcodes.tsv.gz # Filtered barcodes - -2. Cell Barcode Handling -~~~~~~~~~~~~~~~~~~~~~~~~ - -10x barcode format: 16 nucleotides + ``-N`` suffix (e.g., ``AAACGAACAGTCAGTT-1``) - -.. code-block:: bash - - # Verify BAM and barcode file match - samtools view your.bam | head -1000 | grep -o 'CB:Z:[^\t]*' | head - head barcodes.tsv - -3. Counting Strategies -~~~~~~~~~~~~~~~~~~~~~~ - -.. 
list-table:: - :header-rows: 1 - :widths: 20 40 40 - - * - Aspect - - Per-Cell - - Pseudo-Bulk - * - Resolution - - Single-cell - - Cell population - * - Power - - Low (sparse) - - High (aggregated) - * - Use case - - Outlier cells - - Population imbalance - -**Recommendation**: Use pseudo-bulk for most scATAC experiments. - -.. code-block:: bash - - # Count alleles at heterozygous variants - wasp2-count count-variants-sc \ - possorted_bam.bam \ - variants.vcf.gz \ - barcodes_celltype.tsv \ - --region peaks.bed \ - --samples SAMPLE_ID \ - --out_file allele_counts.h5ad - -**Output**: ``allele_counts.h5ad`` - AnnData with layers: ``X``, ``ref``, ``alt``, ``other`` - -4. Statistical Considerations -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -WASP2 handles sparse data through: - -* **Dispersion model**: Accounts for overdispersion in allele counts -* **Minimum count filters**: ``--min 10`` ensures sufficient data -* **FDR correction**: Benjamini-Hochberg for multiple testing -* **Outlier removal**: ``-z 3`` filters CNV/mapping artifacts - -**Key parameters**: - -* ``--phased``: Use phased genotype information (requires ``0|1`` or ``1|0`` format in VCF) - -5. Visualization -~~~~~~~~~~~~~~~~ - -The notebook includes functions for: - -* Allelic ratio heatmaps -* Volcano plots -* Cell type comparison heatmaps - -6. Cell-Type-Specific Analysis -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. 
code-block:: bash - - # Step 1: Find imbalance within cell types - wasp2-analyze find-imbalance-sc \ - allele_counts.h5ad \ - barcodes_celltype.tsv \ - --sample SAMPLE_ID --phased --min 10 -z 3 - # Output: ai_results_.tsv per cell type - - # Step 2: Compare between cell types - wasp2-analyze compare-imbalance \ - allele_counts.h5ad \ - barcodes_celltype.tsv \ - --sample SAMPLE_ID --groups "CellTypeA,CellTypeB" --phased - # Output: ai_results__.tsv - -**Output columns**: region, ref_count, alt_count, p_value, fdr_pval, effect_size - -Troubleshooting ---------------- - -No Barcodes Matched -~~~~~~~~~~~~~~~~~~~ - -.. code-block:: bash - - # Add -1 suffix if missing - awk -F'\t' '{print $1"-1\t"$2}' barcodes_no_suffix.tsv > barcodes.tsv - -Memory Issues -~~~~~~~~~~~~~ - -Process chromosomes separately with ``--region peaks_chr1.bed``. - -Low Power -~~~~~~~~~ - -* Merge similar cell types -* Use pseudo-bulk aggregation -* Ensure phased genotypes - -See Also --------- - -* :doc:`/tutorials/scrna_seq` - 10X scRNA-seq tutorial -* :doc:`/tutorials/comparative_imbalance` - Comparative analysis -* :doc:`/user_guide/single_cell` - Data format reference diff --git a/docs/source/tutorials/scrna_seq.rst b/docs/source/tutorials/scrna_seq.rst deleted file mode 100644 index 35bdb071..00000000 --- a/docs/source/tutorials/scrna_seq.rst +++ /dev/null @@ -1,112 +0,0 @@ -10X scRNA-seq Tutorial -====================== - -End-to-end allele-specific expression (ASE) workflow for 10X Chromium -scRNA-seq. Assumes a Cell Ranger ``possorted_genome_bam.bam`` with cell -barcodes in the ``CB`` tag, a phased VCF for the donor, and a cell-type -annotation from Seurat or Scanpy. - -Inputs ------- - -- Cell Ranger BAM + index -- Phased VCF/BCF/PGEN for the sample -- A barcode-to-group TSV (see :doc:`/user_guide/single_cell` for Seurat / - Scanpy export code and exact format) -- GTF gene annotation - -Step 1 — Count alleles per cell per gene ------------------------------------------ - -.. 
code-block:: bash - - wasp2-count count-variants-sc \ - cellranger_output/outs/possorted_genome_bam.bam \ - phased_variants.vcf.gz \ - barcodes_celltype.tsv \ - --feature genes.gtf \ - --samples SAMPLE_ID \ - --out_file allele_counts.h5ad - -Output: an AnnData ``.h5ad`` with ``ref`` / ``alt`` / ``other`` layers, -genotype columns in ``.obs``, and cell-type assignments in ``.var``. See -:doc:`/user_guide/single_cell` for the full schema. - -Step 2 — Per-cell-type imbalance --------------------------------- - -.. code-block:: bash - - wasp2-analyze find-imbalance-sc \ - allele_counts.h5ad \ - barcodes_celltype.tsv \ - --sample SAMPLE_ID \ - --out_file imbalance_by_celltype.tsv - -Output columns: ``region``, ``cell_type``, aggregated ``ref_count`` / -``alt_count``, ``pval``, ``fdr_pval``, ``effect_size`` (log₂ ref/alt). - -Step 3 — Compare cell types (optional) --------------------------------------- - -.. code-block:: bash - - wasp2-analyze compare-imbalance \ - allele_counts.h5ad \ - barcodes_celltype.tsv \ - --groups "CD4_T_cell,CD8_T_cell" \ - --out_file differential_imbalance.tsv - -For more on comparative analysis (multiple groups, all-pairs, volcano -plots), see :doc:`comparative_imbalance`. - -Interpreting results --------------------- - -.. code-block:: python - - import pandas as pd - - results = pd.read_csv('imbalance_by_celltype.tsv', sep='\t') - sig = results[results['fdr_pval'] < 0.05] - - top = (sig.groupby('cell_type') - .apply(lambda x: x.nsmallest(10, 'fdr_pval')) - .reset_index(drop=True)) - - print(top[['region', 'cell_type', 'effect_size', 'fdr_pval']]) - -Troubleshooting ---------------- - -**Zero barcodes matched.** Confirm barcode format in the BAM vs. the TSV — -the ``CB:Z:...`` tag often has a ``-1`` suffix that your export must match: - -.. 
code-block:: bash - - samtools view your.bam | head -10000 | grep -o 'CB:Z:[^[:space:]]*' \ - | cut -d: -f3 | sort -u > bam_bc.txt - cut -f1 barcodes.tsv | sort -u > file_bc.txt - comm -12 bam_bc.txt file_bc.txt | wc -l # should be > 0 - -Fix a missing suffix with: - -.. code-block:: bash - - awk -F'\t' '{print $1"-1\t"$2}' barcodes_no_suffix.tsv > barcodes.tsv - -**Sparse counts.** Single-cell data is sparse. Consider pseudobulk -aggregation by cell type, lower ``--min`` / ``--min_count``, or focus on -highly expressed genes. - -**Memory.** For large cohorts, split the region file by chromosome and -process chunks, then concatenate results. - -See Also --------- - -- :doc:`/user_guide/single_cell` — barcode file format, Seurat/Scanpy export -- :doc:`/user_guide/analysis` — analysis CLI reference -- :doc:`/methods/statistical_models` — beta-binomial LRT -- :doc:`scatac_workflow` — sibling tutorial for scATAC-seq -- :doc:`comparative_imbalance` — comparing groups diff --git a/docs/source/tutorials/single_cell_workflow.rst b/docs/source/tutorials/single_cell_workflow.rst new file mode 100644 index 00000000..b11843a2 --- /dev/null +++ b/docs/source/tutorials/single_cell_workflow.rst @@ -0,0 +1,165 @@ +Single-Cell Workflow (scRNA-seq / scATAC-seq) +============================================== + +End-to-end allele-specific workflow for single-cell data: 10X Chromium +scRNA-seq and 10X scATAC-seq. The pipeline is the same in both cases; the +only difference is the ``--feature`` argument, which takes a GTF (genes) for +scRNA-seq or a BED (peaks) for scATAC-seq. 
+ +Inputs +------ + +- Cell Ranger BAM and its index, with cell barcodes in the ``CB:Z:...`` tag +- Phased VCF/BCF/PGEN for the donor +- Barcode-to-group TSV (cell type or other assignment — see + :doc:`/user_guide/single_cell` for Seurat/Scanpy export code and format) +- **scRNA-seq**: GTF gene annotation +- **scATAC-seq**: BED peak file (usually from Cell Ranger + ``filtered_peak_bc_matrix`` or a consensus peak set) + +Step 1 — Count alleles per cell +-------------------------------- + +**scRNA-seq (genes):** + +.. code-block:: bash + + wasp2-count count-variants-sc \ + cellranger_output/outs/possorted_genome_bam.bam \ + phased_variants.vcf.gz \ + barcodes_celltype.tsv \ + --feature genes.gtf \ + --samples SAMPLE_ID \ + --out_file allele_counts.h5ad + +**scATAC-seq (peaks):** + +.. code-block:: bash + + wasp2-count count-variants-sc \ + cellranger_output/outs/possorted_bam.bam \ + phased_variants.vcf.gz \ + barcodes_celltype.tsv \ + --feature peaks.bed \ + --samples SAMPLE_ID \ + --out_file allele_counts.h5ad + +Output: an AnnData ``.h5ad`` with ``ref`` / ``alt`` / ``other`` layers, +genotype columns in ``.obs``, and cell-type assignments in ``.var``. See +:doc:`/user_guide/single_cell` for the full schema. + +Step 2 — Per-group imbalance +---------------------------- + +.. code-block:: bash + + wasp2-analyze find-imbalance-sc \ + allele_counts.h5ad \ + barcodes_celltype.tsv \ + --sample SAMPLE_ID \ + --phased --min 10 -z 3 \ + --out_file imbalance_by_celltype.tsv + +Output columns: ``region``, ``cell_type``, aggregated ``ref_count`` / +``alt_count``, ``pval``, ``fdr_pval``, ``effect_size`` (log₂ ref/alt). + +Step 3 — Compare groups (optional) +----------------------------------- + +.. 
code-block:: bash + + wasp2-analyze compare-imbalance \ + allele_counts.h5ad \ + barcodes_celltype.tsv \ + --groups "CD4_T_cell,CD8_T_cell" \ + --phased \ + --out_file differential_imbalance.tsv + +For multi-group and all-pairs comparisons, visualization, and +interpretation, see :doc:`comparative_imbalance`. + +Per-cell vs. pseudo-bulk +------------------------ + +Single-cell ATAC data is especially sparse — most cells contribute zero +reads to most peaks. Two analysis modes are common: + +.. list-table:: + :header-rows: 1 + :widths: 20 40 40 + + * - Aspect + - Per-cell + - Pseudo-bulk (per-cell-type) + * - Resolution + - Single cell + - Cell population + * - Power + - Low (sparse) + - High (aggregated) + * - Use case + - Outlier cells + - Population-level imbalance + +Pseudo-bulk (the default, via the barcode-to-group TSV) is the right +starting point for most scATAC experiments. Per-cell analysis is useful +when investigating rare subpopulations or outlier effects. + +Interpreting results +-------------------- + +.. code-block:: python + + import pandas as pd + + results = pd.read_csv('imbalance_by_celltype.tsv', sep='\t') + sig = results[results['fdr_pval'] < 0.05] + + top = (sig.groupby('cell_type') + .apply(lambda x: x.nsmallest(10, 'fdr_pval')) + .reset_index(drop=True)) + + print(top[['region', 'cell_type', 'effect_size', 'fdr_pval']]) + +Troubleshooting +--------------- + +**Zero barcodes matched.** Confirm barcode format in the BAM vs. the TSV — +the ``CB:Z:...`` tag often has a ``-1`` suffix that your export must match: + +.. code-block:: bash + + samtools view your.bam | head -10000 | grep -o 'CB:Z:[^[:space:]]*' \ + | cut -d: -f3 | sort -u > bam_bc.txt + cut -f1 barcodes.tsv | sort -u > file_bc.txt + comm -12 bam_bc.txt file_bc.txt | wc -l # should be > 0 + +Fix a missing suffix: + +.. 
code-block:: bash + + awk -F'\t' '{print $1"-1\t"$2}' barcodes_no_suffix.tsv > barcodes.tsv + +**Sparse counts / low power.** Aggregate to pseudo-bulk by cell type, +lower ``--min`` / ``--min_count``, or focus on highly expressed genes +(scRNA-seq) / high-coverage peaks (scATAC-seq). + +**Memory.** For large cohorts, split the feature file by chromosome, +process each chunk, then concatenate the results: + +.. code-block:: bash + + for chr in chr{1..22}; do + awk -v c="${chr}" '$1 == c' peaks.bed > peaks_${chr}.bed # exact match on column 1 + wasp2-count count-variants-sc sample.bam variants.vcf.gz barcodes.tsv \ + --feature peaks_${chr}.bed --out_file counts_${chr}.h5ad + done + +See Also +-------- + +- :doc:`/user_guide/single_cell` — barcode format, Seurat/Scanpy export +- :doc:`/user_guide/analysis` — analysis CLI reference +- :doc:`/methods/statistical_models` — beta-binomial LRT +- :doc:`bulk_workflow` — sibling tutorial for bulk RNA-seq / ATAC-seq +- :doc:`comparative_imbalance` — comparing groups diff --git a/docs/source/user_guide/single_cell.rst b/docs/source/user_guide/single_cell.rst index 7ab45708..a9bb9e2b 100644 --- a/docs/source/user_guide/single_cell.rst +++ b/docs/source/user_guide/single_cell.rst @@ -380,6 +380,6 @@ See Also -------- -* :doc:`/tutorials/scrna_seq` - Complete 10X scRNA-seq tutorial +* :doc:`/tutorials/single_cell_workflow` - scRNA-seq / scATAC-seq workflow * :doc:`analysis` - Statistical analysis methods * :doc:`counting` - General allele counting
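The "Zero barcodes matched" troubleshooting step in ``tutorials/single_cell_workflow.rst`` compares barcodes from the BAM against the barcode TSV and then patches a missing ``-1`` suffix. A minimal Python sketch of the same check follows. It is not part of WASP2: the helper names ``normalize_barcode`` and ``overlap_counts`` are hypothetical, and it assumes the only mismatch is a missing numeric suffix such as ``-1``.

```python
def normalize_barcode(barcode, suffix="-1"):
    """Append the suffix unless the barcode already ends in '-<digits>'."""
    barcode = barcode.strip()
    _, sep, tail = barcode.rpartition("-")
    if sep and tail.isdigit():
        return barcode  # already carries a numeric suffix
    return barcode + suffix


def overlap_counts(bam_barcodes, tsv_barcodes):
    """Return (raw_overlap, normalized_overlap) between two barcode lists.

    A raw overlap of 0 with a positive normalized overlap suggests that the
    mismatch is only the missing suffix (assumption: no other format drift).
    """
    bam_raw = {b.strip() for b in bam_barcodes if b.strip()}
    tsv_raw = {b.strip() for b in tsv_barcodes if b.strip()}
    bam_norm = {normalize_barcode(b) for b in bam_raw}
    tsv_norm = {normalize_barcode(b) for b in tsv_raw}
    return len(bam_raw & tsv_raw), len(bam_norm & tsv_norm)


if __name__ == "__main__":
    # Illustrative barcodes only; real input would come from bam_bc.txt
    # and the first column of barcodes.tsv.
    bam = ["AAACGAACAGTCAGTT-1", "TTTGGTTAGCATGCAA-1"]
    tsv = ["AAACGAACAGTCAGTT", "CCCTGATCAGGTTACT"]
    print(overlap_counts(bam, tsv))
```

If the raw count is zero and the normalized count is not, rewriting the TSV with the ``awk`` one-liner from the tutorial should be enough; if both are zero, the barcode lists likely come from different samples or whitelists.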