BSgenome boundary error in get_indel_context() when sample has large deletion and indel near chromosome end #295

@qclayssen

Bug description

Running PCGR v2.2.5 with --estimate_signatures on a WGS sample that
contains a large deletion and a small indel near a chromosome end aborts
with a BSgenome boundary error in the mutational signature step.

Version of PCGR

v2.2.5

Genome build

grch38

Command

pcgr \
  --sample_id SAMPLE \
  --input_vcf synthetic_boundary_bug.vcf.gz \
  --vep_dir /path/to/vep_dir \
  --refdata_dir /path/to/refdata \
  --tumor_dp_tag TUMOR_AF \
  --tumor_af_tag TUMOR_AF \
  --genome_assembly grch38 \
  --assay WGS \
  --estimate_signatures \
  --estimate_msi \
  --estimate_tmb \
  --output_dir output/

Full error message:

Error in loadFUN(x, seqname, ranges) :
  trying to load regions beyond the boundaries of non-circular sequence "chr2"

Minimal reproducible VCF

synthetic_boundary_bug.vcf.gz

The VCF contains 50 SNVs, a 166 bp deletion (which sets flank_dist = 166 × 20 = 3,320 in
.get_big_dels), and a 2 bp deletion at chr2:242,190,210. The right flank of the 2 bp deletion
(242,190,210 + 2 + 3,320 = 242,193,532) exceeds the chr2 length (242,193,529) by 3 bp.
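
For anyone who wants to check the numbers, here is a small Python sketch of the arithmetic above. It is only an illustration, not PCGR code: the names (`right_flank_end`, `FLANK_MULTIPLIER`) are made up, and the formula simply follows the calculation written out in this issue.

```python
# Illustration only: reproduces the arithmetic from this issue.
# Names are hypothetical and do not come from PCGR or MutationalPatterns.

CHR2_LENGTH_GRCH38 = 242_193_529  # chr2 length quoted in the issue
FLANK_MULTIPLIER = 20             # assumed from "166 x 20" in the issue


def right_flank_end(pos: int, del_len: int, flank_dist: int) -> int:
    """End coordinate of the right flank, as computed in the issue text."""
    return pos + del_len + flank_dist


# The largest deletion (166 bp) determines the flank distance for all deletions.
flank_dist = 166 * FLANK_MULTIPLIER

# The 2 bp deletion near the end of chr2.
end = right_flank_end(242_190_210, 2, flank_dist)
overshoot = end - CHR2_LENGTH_GRCH38

print(flank_dist, end, overshoot)  # 3320 242193532 3
```

A positive `overshoot` means the requested region runs past the end of the chromosome, which is the condition the error message complains about.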

Additional context

write_processed_vcf() writes the internal mutsig VCF without ##contig header lines.
MutationalPatterns::read_vcfs_as_granges() then sets seqlengths = NA, which makes
trim() a no-op. Because .get_big_dels() applies flank_dist to all deletions,
any deletion within flank_dist bases of a chromosome end causes getSeq() to abort.
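
One possible direction for a fix is to clip each flank to the chromosome bounds before fetching sequence, which is roughly what trim() would do if the sequence lengths were known. The Python sketch below only illustrates that clamping logic. It is not the PCGR or MutationalPatterns implementation, `clamp_range` is a hypothetical helper, and the flank start used in the example (position + deletion length) is an assumption.

```python
# Illustration only: clip a 1-based inclusive range to a sequence of known length.
# clamp_range is a hypothetical helper, not part of PCGR or MutationalPatterns.
from typing import Tuple


def clamp_range(start: int, end: int, seq_len: int) -> Tuple[int, int]:
    """Return (start, end) clipped to the interval [1, seq_len]."""
    if start > end:
        raise ValueError("start must not exceed end")
    if start > seq_len or end < 1:
        raise ValueError("range lies entirely outside the sequence")
    return max(1, start), min(end, seq_len)


# Right flank of the 2 bp deletion from the reproducer, assuming it starts
# directly after the deleted bases and extends flank_dist = 3,320 bases.
flank = clamp_range(242_190_212, 242_193_532, 242_193_529)
print(flank)  # (242190212, 242193529)
```

Clipping shortens the flank for variants near a chromosome end, so any downstream step that expects a fixed flank length would need to tolerate the shorter sequence.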
