Bug description
Running PCGR v2.2.5 with --estimate_signatures on a WGS sample that
contains a large deletion and a small indel near a chromosome end aborts
with a BSgenome boundary error in the mutational signature step.
Version of PCGR
v2.2.5
Genome build
grch38
Command
pcgr \
--sample_id SAMPLE \
--input_vcf synthetic_boundary_bug.vcf.gz \
--vep_dir /path/to/vep_dir \
--refdata_dir /path/to/refdata \
--tumor_dp_tag TUMOR_AF \
--tumor_af_tag TUMOR_AF \
--genome_assembly grch38 \
--assay WGS \
--estimate_signatures \
--estimate_msi \
--estimate_tmb \
--output_dir output/
Full error message
Error in loadFUN(x, seqname, ranges) :
trying to load regions beyond the boundaries of non-circular sequence "chr2"
Minimal reproducible VCF
synthetic_boundary_bug.vcf.gz
The VCF contains 50 SNVs, a 166 bp deletion (which sets flank_dist = 166 × 20 = 3,320 in
.get_big_dels), and a 2 bp deletion at chr2:242,190,210. The right flank of the 2 bp deletion
(242,190,210 + 2 + 3,320 = 242,193,532) extends 3 bp past the end of chr2 (length 242,193,529).
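The arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers in this report, not PCGR code: the helper name `right_flank_end` and the constant `FLANK_FACTOR` are hypothetical, and the coordinate convention (position + deletion length + flank_dist) is taken directly from the calculation quoted above.

```python
# Values taken from this report; names are hypothetical, not from PCGR.
CHR2_LENGTH = 242_193_529  # chr2 length (grch38), as stated in the report
FLANK_FACTOR = 20          # multiplier implied by 166 x 20 = 3320


def right_flank_end(pos: int, del_len: int, flank_dist: int) -> int:
    """Last base of the right flank, using the report's arithmetic."""
    return pos + del_len + flank_dist


# flank_dist is driven by the largest deletion in the sample (166 bp here)
flank_dist = 166 * FLANK_FACTOR

# ...but it is applied to the small 2 bp deletion near the end of chr2
end = right_flank_end(242_190_210, 2, flank_dist)
overshoot = end - CHR2_LENGTH

print(f"flank_dist={flank_dist} right_flank_end={end} overshoot={overshoot}")
```

A positive `overshoot` means the requested region ends beyond the last base of the chromosome, which is the condition the BSgenome error reports.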
Additional context
write_processed_vcf() writes the internal mutsig VCF without ##contig header lines.
MutationalPatterns::read_vcfs_as_granges() therefore sets seqlengths = NA, which makes
trim() a no-op. Because .get_big_dels() applies flank_dist to all deletions,
any deletion within flank_dist bases of a chromosome end causes getSeq() to abort.
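One possible way to avoid the out-of-bounds request is to clamp the flanks to the chromosome boundaries before fetching sequence. The Python sketch below illustrates the idea only. It is not the PCGR or MutationalPatterns implementation, the function `clamped_flanks` is hypothetical, and an actual fix might instead write ##contig lines or set seqlengths so that trim() takes effect.

```python
def clamped_flanks(pos: int, del_len: int, flank_dist: int,
                   chrom_length: int) -> tuple[int, int]:
    """Return (left_start, right_end) kept within [1, chrom_length].

    Hypothetical helper: uses the same coordinate arithmetic as the
    report (right flank ends at pos + del_len + flank_dist).
    """
    left_start = max(1, pos - flank_dist)
    right_end = min(chrom_length, pos + del_len + flank_dist)
    return left_start, right_end


# The 2 bp deletion from the reproducible VCF, with chr2 length from the report
left, right = clamped_flanks(242_190_210, 2, 3320, 242_193_529)
print(left, right)
```

With clamping, the right flank of the 2 bp deletion stops at the last base of chr2 instead of 3 bp past it. The trade-off is that flanks near chromosome ends become shorter than flank_dist, so any downstream code that assumes a fixed flank length would need to tolerate that.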