This repository provides a complete workflow for analyzing mutation-informed fragmentomic features in cfDNA to improve ctDNA detection. The pipeline extracts, processes, and analyzes cfDNA fragments overlapping tumor-informed somatic mutations, and generates appropriate controls for robust statistical evaluation.
The goals of this project are to:
- Identify tumor-derived cfDNA fragments using patient-specific somatic mutations.
- Quantify fragmentomic features, including fragment length and end-motif diversity.
- Perform per-sample statistical analyses to distinguish ctDNA-positive from ctDNA-negative plasma samples.
- Generate control datasets from random genomic positions for benchmarking.
This framework allows sample-level ctDNA classification and aggregated fragmentomic analyses, supporting sensitive detection of low-tumor-burden samples without requiring training data or a panel-of-normals.
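The two fragmentomic features named above can be illustrated with a short sketch. It assumes that end-motif diversity is measured as the normalized Shannon entropy of 4-mer end-motif frequencies (a common definition of a motif diversity score); the repository's scripts may define it differently. The function names and the `(length, end_motif)` input format are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from statistics import median

N_MOTIFS = 4 ** 4  # number of possible 4-mer end motifs over A, C, G, T


def motif_diversity_score(end_motifs):
    """Normalized Shannon entropy of 4-mer end-motif frequencies.

    Returns a value between 0 (one motif only) and 1 (all 256 motifs
    equally frequent). Motifs with non-ACGT bases are ignored.
    """
    counts = Counter(
        m for m in end_motifs if len(m) == 4 and set(m) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return abs(entropy) / math.log(N_MOTIFS)  # abs() avoids returning -0.0


def summarize_fragments(fragments):
    """Summarize an iterable of (length, end_motif) tuples (assumed format)."""
    fragments = list(fragments)
    lengths = [length for length, _ in fragments]
    return {
        "n_fragments": len(lengths),
        "median_length": median(lengths) if lengths else float("nan"),
        "motif_diversity": motif_diversity_score(m for _, m in fragments),
    }
```

For example, `summarize_fragments([(150, "CCCA"), (166, "CCTG"), (142, "CCCA")])` reports three fragments, a median length of 150, and a low diversity score because only two distinct motifs are present.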
Scripts in src_tumor/ for:
- Summarizing tumor VCF files
- Filtering tumor mutations using quality criteria, gnomAD, BED regions, and buffy coat pileup
- Generating buffy coat-based blacklists
- Creating random mutation sites for statistical controls
These steps generate tumor-informed mutation sites used for cfDNA fragment extraction.
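As an illustration of the control-site step, the sketch below draws random genomic positions while skipping an exclusion set. It is a minimal example under stated assumptions, not the repository's implementation: the function name, the `chrom_sizes` dictionary, and the idea of passing tumor mutation sites and blacklisted positions as `excluded` are all hypothetical, and the actual scripts may match controls on additional properties.

```python
import random


def sample_control_sites(chrom_sizes, n_sites, excluded=frozenset(), seed=0):
    """Draw n_sites distinct 1-based (chrom, pos) pairs.

    Chromosomes are chosen with probability proportional to their length,
    so positions are uniform over the genome. Pairs in `excluded`
    (for example tumor mutation sites or blacklisted positions) are skipped.
    """
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]
    if n_sites > sum(weights) - len(excluded):
        raise ValueError("not enough eligible positions for n_sites")

    rng = random.Random(seed)  # fixed seed keeps the control set reproducible
    sites = set()
    # Rejection sampling; assumes n_sites is small relative to the genome.
    while len(sites) < n_sites:
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        pos = rng.randint(1, chrom_sizes[chrom])
        if (chrom, pos) not in excluded:
            sites.add((chrom, pos))
    return sorted(sites)
```

Fixing the seed means the same control sites can be regenerated later, which helps when control fragments are extracted in a separate run.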
Scripts in src_cfDNA/ for:
- Extracting fragments overlapping tumor-informed somatic mutations
- Stratifying fragments by mutant vs. reference allele
- Computing fragmentomic features (fragment length, motif diversity)
- Performing per-sample statistical tests (Wilcoxon test and t-test)
- Extracting fragments for random mutation controls
- Summarizing fragmentomic results
These steps allow per-sample ctDNA classification and benchmarking of fragmentomic features.
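The allele stratification and per-sample test can be sketched as follows. This is a simplified illustration, not the repository's code: it assumes each read is given as a hypothetical `(read_start, read_seq, fragment_length)` tuple with a 1-based start and no indels or clipping, and that the per-sample comparison is between the fragment lengths of mutant-allele and reference-allele fragments. The rank-sum test is written in plain Python with a normal approximation and tie correction (no continuity correction), so its p-values may differ slightly from those of a statistics library.

```python
import math
from statistics import NormalDist


def stratify_by_allele(reads, pos, ref, alt):
    """Group fragment lengths by the base observed at a mutation site."""
    groups = {"ref": [], "alt": []}
    for read_start, read_seq, frag_len in reads:
        offset = pos - read_start
        if 0 <= offset < len(read_seq):  # read covers the site
            base = read_seq[offset]
            if base == ref:
                groups["ref"].append(frag_len)
            elif base == alt:
                groups["alt"].append(frag_len)
    return groups


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U statistic for x, p-value from the normal approximation).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # pooled[i:j] share one value
        midrank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, 2 * (1 - NormalDist().cdf(abs(z)))
```

A typical use would pool the `alt` and `ref` length lists across all mutation sites of one sample and call `rank_sum_test(alt_lengths, ref_lengths)`. The t-test is omitted here because its p-value requires the t-distribution, which the Python standard library does not provide.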
For detailed step-by-step instructions, please refer to the src_tumor/ and src_cfDNA/ directories.