rust: restore canonical WASP filter contract (van de Geijn 2015) #96
Merged
Jaureguy760 merged 4 commits into master on Apr 19, 2026
Remove 6 SAM flag filters added in 1.2.0 migration (commit a72ffba) so WASP2 counting respects pre-filtered BAMs. This realigns behavior with the canonical WASP documented contract that callers are responsible for upstream BAM filtering (bmvdgeijn/WASP CHT/bam2h5.py: "This program does not perform filtering of reads based on mappability. It is assumed that the input BAM files are filtered appropriately prior to calling this script.").

Three edits, keeping is_unmapped for crash safety:
- rust/src/bam_counter.rs drop is_secondary | is_supplementary | is_duplicate
- rust/src/mapping_filter.rs drop !is_proper_pair | is_secondary | is_supplementary
- rust/src/bam_filter.rs drop bitmask 0x100 | 0x800 | 0x200 | 0x400

Rationale: on WASP-remapped input (e.g., *_wasp_filt_rmdup.bam), the v1.2.0 filters re-filter already-cleaned reads, silently dropping valid WASP-pass alignments. The impact is small on BWA output (~0.15%) but substantial on STAR output (~37% of gene-level ref/alt rows differ) where secondary/supplementary alignments are routine. Removing the filters restores byte-level parity with the pre-v1.3.0 Python counter on RNA (0/17,728 rows differ) and within 0.10% on ATAC, with the ATAC residual tracing to a pre-existing pre-dedup ordering bug in count_alleles.py (read marked seen before the aligned-pairs check).

Callers relying on the defensive filters should pre-filter BAMs upstream (e.g., samtools view -F 0x904).

Refs: van de Geijn et al. 2015, Nat Methods 10.1038/nmeth.3582
test_rust_python_counting_parity reimplements the Rust counter in pure Python to check numerical parity. Update the Python reference to drop only unmapped reads, matching the new canonical WASP behavior in bam_counter.rs. Without this, the parity test would fail because the Python reference would continue to filter secondary/supplementary/duplicate while Rust no longer does.
Remove quoted return type annotation on make_intersect_df; polars is imported at module top so the forward reference is unnecessary. This un-blocks ruff in CI.
Jaureguy760 added a commit that referenced this pull request on Apr 19, 2026

Bump Cargo.toml/Dockerfile/bioconda-recipe/Singularity.def from 1.4.0 to 1.4.1. Move CHANGELOG [Unreleased] to [1.4.1] with 2026-04-18 date.

Release notes:
- rust: restore canonical WASP filter contract (#96)
- tests: align Python reference with canonical filter contract (#96)
- chore: fix pre-existing ruff UP037 in intersect_variant_data.py (#96)
Jaureguy760 added a commit that referenced this pull request on Apr 19, 2026

* feat(security): add CodeQL analysis and improve vulnerability scanning
  - Add CodeQL workflow for advanced Python SAST with security-extended queries
  - Improve security.yml to fail builds on vulnerabilities (removed || true)
  - Add Gitleaks CI job for automated secret detection
  - Add weekly scheduled security scans
  - Create SECURITY.md with vulnerability disclosure policy
  Implements GitHub issue #27.
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): use self-hosted runner for gitleaks, make CodeQL optional
  - Modified gitleaks to run on self-hosted runner with direct CLI
  - Added continue-on-error to CodeQL (requires GitHub-hosted runners)
  - This allows CI to pass with only self-hosted infrastructure
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(ci): correct pip-audit flag and handle cargo audit lock file
  - Change --require-hashes=false to --no-require-hashes (correct flag syntax)
  - Remove stale advisory-db lock file before cargo audit
  - Add || true to make audit failures non-blocking (advisory only)
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add comprehensive 10X scRNA-seq barcode file format examples (#88)
  Add detailed documentation for 10X barcode file formats including:
  - Chemistry version table (v2/v3/Multiome) with whitelist sizes
  - PBMC and multi-sample aggregated examples
  - Format validation utilities (bash and Python)
  - Common format variations and suffix handling
  - Quick diagnostic commands for troubleshooting
  Add example barcode test files:
  - barcodes_10x_multi_sample.tsv (multi-sample with -1/-2/-3 suffixes)
  - barcodes_10x_hierarchical.tsv (hierarchical cell type naming)
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Jaureguy760 added a commit that referenced this pull request on Apr 19, 2026

* rust: restore canonical WASP filter contract
  Remove 6 SAM flag filters added in 1.2.0 migration (commit ed5b007) so WASP2 counting respects pre-filtered BAMs. This realigns behavior with the canonical WASP documented contract that callers are responsible for upstream BAM filtering (bmvdgeijn/WASP CHT/bam2h5.py: "This program does not perform filtering of reads based on mappability. It is assumed that the input BAM files are filtered appropriately prior to calling this script.").
  Three edits, keeping is_unmapped for crash safety:
  - rust/src/bam_counter.rs drop is_secondary | is_supplementary | is_duplicate
  - rust/src/mapping_filter.rs drop !is_proper_pair | is_secondary | is_supplementary
  - rust/src/bam_filter.rs drop bitmask 0x100 | 0x800 | 0x200 | 0x400
  Rationale: on WASP-remapped input (e.g., *_wasp_filt_rmdup.bam), the v1.2.0 filters re-filter already-cleaned reads, silently dropping valid WASP-pass alignments. The impact is small on BWA output (~0.15%) but substantial on STAR output (~37% of gene-level ref/alt rows differ) where secondary/supplementary alignments are routine. Removing the filters restores byte-level parity with the pre-v1.3.0 Python counter on RNA (0/17,728 rows differ) and within 0.10% on ATAC, with the ATAC residual tracing to a pre-existing pre-dedup ordering bug in count_alleles.py (read marked seen before the aligned-pairs check).
  Callers relying on the defensive filters should pre-filter BAMs upstream (e.g., samtools view -F 0x904).
  Refs: van de Geijn et al. 2015, Nat Methods 10.1038/nmeth.3582

* tests: align Python reference with canonical filter contract
  test_rust_python_counting_parity reimplements the Rust counter in pure Python to check numerical parity. Update the Python reference to drop only unmapped reads, matching the new canonical WASP behavior in bam_counter.rs. Without this, the parity test would fail because the Python reference would continue to filter secondary/supplementary/duplicate while Rust no longer does.

* docs: CHANGELOG entry for canonical WASP filter restoration

* chore: fix ruff UP037 in intersect_variant_data
  Remove quoted return type annotation on make_intersect_df; polars is imported at module top so the forward reference is unnecessary. This un-blocks ruff in CI.
Summary
Restore the canonical WASP filter contract in the Rust counting path. The v1.2.0 migration (commit a72ffba) added 6 SAM flag filters that silently drop valid WASP-pass alignments when the input BAM has been pre-filtered. The impact is small on BWA output but substantial on STAR output: on RNA, v1.3.0+ disagrees with the pre-v1.3.0 Python counter at ~37% of gene-level ref/alt rows on the same BAM.

Fix
Three edits in rust/src/, keeping is_unmapped for crash safety:
- bam_counter.rs: drop is_secondary | is_supplementary | is_duplicate
- mapping_filter.rs: drop !is_proper_pair | is_secondary | is_supplementary
- bam_filter.rs: drop bitmask 0x100 | 0x800 | 0x200 | 0x400

Net: +3 insertions, -13 deletions across 3 files.
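The effect of the three edits can be sketched on raw SAM flag values. This is a minimal, self-contained illustration and not the code in rust/src/: the function names keep_defensive and keep_canonical are hypothetical, and it assumes the pre-change path rejected unmapped reads plus the four bits in the old bam_filter.rs bitmask. The flag constants follow the SAM specification.

```rust
// SAM flag bits, as defined in the SAM specification.
const UNMAPPED: u16 = 0x4;
const SECONDARY: u16 = 0x100;
const QC_FAIL: u16 = 0x200;
const DUPLICATE: u16 = 0x400;
const SUPPLEMENTARY: u16 = 0x800;

/// Hypothetical stand-in for the defensive behavior (assumption):
/// reject unmapped reads and any read carrying one of the four
/// bits from the removed bitmask 0x100 | 0x800 | 0x200 | 0x400.
fn keep_defensive(flag: u16) -> bool {
    let mask = UNMAPPED | SECONDARY | SUPPLEMENTARY | QC_FAIL | DUPLICATE;
    (flag & mask) == 0
}

/// Hypothetical stand-in for the restored contract: only unmapped
/// reads are rejected; all other filtering is left to the caller.
fn keep_canonical(flag: u16) -> bool {
    (flag & UNMAPPED) == 0
}

/// Count how many flags in a slice pass a given predicate.
fn count_kept(flags: &[u16], keep: fn(u16) -> bool) -> usize {
    flags.iter().filter(|&&f| keep(f)).count()
}
```

For a secondary alignment (flag 0x100), keep_defensive returns false while keep_canonical returns true, which is the class of read the defensive filters removed from already-cleaned BAMs.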
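Validation
On one donor (BAM already WASP-remapped), canonical-parity Rust vs pre-v1.3.0 Python, as reported in the commit message:
- RNA: 0/17,728 rows differ (byte-level parity)
- ATAC: within 0.10%, with the residual tracing to a pre-existing pre-dedup ordering bug in count_alleles.py (read marked seen before the aligned-pairs check)

The Python reference in tests/test_rust_python_counting_parity.py is updated to match the new contract.

Backwards compatibility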
Users who relied on the defensive filtering should pre-filter BAMs upstream (e.g., samtools view -F 0x904).
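When choosing an upstream exclude mask, it can help to check which of the formerly applied bits it covers. The sketch below is an illustration under assumptions: uncovered is a hypothetical helper, the bit names follow the SAM specification, and only the four bits from the old bam_filter.rs bitmask are considered. The !is_proper_pair check in mapping_filter.rs is a required flag rather than an excluded one, so it is outside this sketch.

```rust
// The four bits from the removed bitmask 0x100 | 0x800 | 0x200 | 0x400,
// paired with their SAM-specification meanings.
static REMOVED_BITS: [(u16, &str); 4] = [
    (0x100, "secondary"),
    (0x200, "QC fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
];

/// Hypothetical helper: names of the formerly filtered bits that an
/// exclude mask (as passed to `samtools view -F`) does NOT exclude.
fn uncovered(exclude_mask: u16) -> Vec<&'static str> {
    let mut names = Vec::new();
    for (bit, name) in REMOVED_BITS.iter() {
        if (exclude_mask & *bit) == 0 {
            names.push(*name);
        }
    }
    names
}
```

With the mask 0x904 mentioned in the commit message, uncovered(0x904) reports the QC-fail and duplicate bits, since 0x904 is 0x4 | 0x100 | 0x800. Whether those bits matter depends on how the BAM was prepared upstream.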
Leaving filtering to the caller matches the canonical WASP contract documented in bmvdgeijn/WASP CHT/bam2h5.py:28-30: "This program does not perform filtering of reads based on mappability. It is assumed that the input BAM files are filtered appropriately prior to calling this script."

Reference
van de Geijn et al. 2015, Nat Methods, 10.1038/nmeth.3582