Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,12 @@

All notable changes to WASP2 will be documented in this file.

## [Unreleased]

### Changed
- Restored canonical WASP filter contract in Rust counting path. Removed 6 SAM flag filters (`is_secondary`, `is_supplementary`, `is_duplicate`, `is_qcfail`, `!is_proper_pair`) introduced in the 1.2.0 migration (commit a72ffba) across `rust/src/bam_counter.rs`, `rust/src/mapping_filter.rs`, and `rust/src/bam_filter.rs`. Retained `is_unmapped` for crash safety. Restores byte-level parity with the pre-1.3.0 Python counter on STAR-aligned RNA (0/17,728 rows differ on reference donor) and within 0.10% on BWA-aligned ATAC. Callers who relied on the defensive filters should pre-filter BAMs upstream (e.g., `samtools view -F 0x904`). Aligns behavior with the canonical WASP documentation that BAM pre-filtering is the caller's responsibility (van de Geijn et al. 2015, Nat Methods 10.1038/nmeth.3582).
- `tests/test_rust_python_counting_parity.py` Python reference updated to match the new filter contract (only `unmapped` skipped).

## [1.4.0] - 2026-02-25

### Fixed
Expand Down
6 changes: 1 addition & 5 deletions rust/src/bam_counter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -238,11 +238,7 @@ impl BamCounter {
continue;
}
};
if record.is_unmapped()
|| record.is_secondary()
|| record.is_supplementary()
|| record.is_duplicate()
{
if record.is_unmapped() {
continue;
}
let qname = record.qname().to_vec();
Expand Down
4 changes: 1 addition & 3 deletions rust/src/bam_filter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -124,9 +124,7 @@ fn phase2_collect_remap_names(
result?;
processed += 1;

// Skip unmapped, secondary, supplementary, QC fail, duplicate
// Flags: 0x4=unmapped, 0x100=secondary, 0x800=supplementary, 0x200=QC fail, 0x400=duplicate
if read.flags() & (0x4 | 0x100 | 0x800 | 0x200 | 0x400) != 0 {
if read.flags() & 0x4 != 0 {
continue;
}

Expand Down
6 changes: 1 addition & 5 deletions rust/src/mapping_filter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -223,11 +223,7 @@ pub fn filter_bam_wasp(
Ok(r) => r,
Err(_) => continue,
};
if rec.is_unmapped()
|| !rec.is_proper_pair()
|| rec.is_secondary()
|| rec.is_supplementary()
{
if rec.is_unmapped() {
continue;
}

Expand Down
2 changes: 1 addition & 1 deletion src/mapping/intersect_variant_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,7 @@ def make_intersect_df(
intersect_file: str,
samples: list[str],
is_paired: bool = True,
) -> "pl.DataFrame":
) -> pl.DataFrame:
"""Parse intersection file into a typed polars DataFrame.

Parameters
Expand Down
7 changes: 4 additions & 3 deletions tests/test_rust_python_counting_parity.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,8 @@
- Single BAM fetch per chromosome spanning all variant positions
- Each read assigned to the earliest-encounter-index SNP it overlaps
- Seen-reads set accumulates across positions (each read counted once)
- BAM flag filtering: unmapped, secondary, supplementary, duplicate skipped
- BAM flag filtering: only unmapped skipped (canonical WASP contract —
caller is responsible for BAM pre-filtering)

This catches any numerical differences between the Rust and Python
implementations that golden-file tests (which compare Rust output to
Expand Down Expand Up @@ -39,7 +40,7 @@ def python_count_snp_alleles(bam_path: str, chrom: str, snp_list: list[tuple]):
aligned base at. A read is counted at exactly one position.
3. Seen-reads set accumulates across all positions (paired-end mates
and multi-overlap reads are counted only once per chromosome).
4. BAM flag filtering: skip unmapped, secondary, supplementary, duplicate.
4. BAM flag filtering: skip only unmapped (canonical WASP contract).

Args:
bam_path: Path to BAM file.
Expand Down Expand Up @@ -67,7 +68,7 @@ def python_count_snp_alleles(bam_path: str, chrom: str, snp_list: list[tuple]):

with pysam.AlignmentFile(bam_path, "rb") as bam:
for read in bam.fetch(chrom, max(0, min_pos - 1), max_pos + 1):
if read.is_unmapped or read.is_secondary or read.is_supplementary or read.is_duplicate:
if read.is_unmapped:
continue

qname = read.query_name
Expand Down
Loading