Visualizes tRNA alignment pileup data as a mismatch-rate heatmap with a shared Sprinzl coordinate x-axis.
- Python >= 3.9
- Python packages: `numpy`, `pysam`, `pandas`, `matplotlib`
- External tool: Infernal (`cmalign`), required for the `run` and `inspect` subcommands
1. Clone the repository:

   ```bash
   git clone https://github.com/genometechlab/tRNA-heatmap/
   cd tRNA-heatmap
   ```

2. Install the Python package and its dependencies:

   ```bash
   pip install -e .
   ```

3. Install Infernal (required for `run` and `inspect`):

   ```bash
   conda install -c bioconda infernal
   # or, on macOS with Homebrew:
   brew install infernal
   ```

After installation, the `tRNAheatmap` command will be available in your terminal.
The `run` subcommand is the full pipeline. All input BAMs are organized into named `--condition` groups: pass one group to get a single heatmap, or two or more to get pairwise delta heatmaps in one step.
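The grouping rule (first token is the condition name, the rest are BAM paths) maps naturally onto Python's `argparse` with `action="append"` and `nargs="+"`. The sketch below is illustrative only: `parse_conditions` is a hypothetical helper, not the tool's actual parser, and the duplicate-name check is an assumption.

```python
import argparse


def parse_conditions(argv):
    """Parse repeated --condition NAME BAM [BAM ...] groups into {name: [bams]}.

    Hypothetical helper; the real tRNAheatmap parser may differ.
    """
    parser = argparse.ArgumentParser(prog="tRNAheatmap-sketch")
    # action="append" collects one token list per --condition occurrence;
    # nargs="+" requires at least one token (the name) per occurrence.
    parser.add_argument("--condition", "-C", action="append", nargs="+", required=True)
    args = parser.parse_args(argv)

    groups = {}
    for tokens in args.condition:
        name, bams = tokens[0], tokens[1:]
        if not bams:
            parser.error(f"--condition {name} needs at least one BAM path")
        if name in groups:
            parser.error(f"duplicate condition name: {name}")
        groups[name] = bams
    return groups
```

A result with one key corresponds to a single heatmap; two or more keys correspond to pairwise delta heatmaps.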
### Single heatmap (one condition)

```bash
# Using a bundled organism model (recommended)
tRNAheatmap run \
  --ref ref.fa \
  --condition sample aligned.bam \
  --organism eukaryotic \
  --output heatmap.png \
  --save-df pileup \
  --threads 4

# Using a custom covariance model
tRNAheatmap run \
  --ref ref.fa \
  --condition sample aligned.bam \
  --cm /path/to/my-model.cm \
  --output heatmap.png \
  --save-df pileup \
  --threads 4
```

### Pairwise delta heatmaps (≥2 conditions)
```bash
# Two single-BAM conditions
tRNAheatmap run \
  --ref ref.fa \
  --condition condA condA.bam \
  --condition condB condB.bam \
  --organism eukaryotic \
  --output results/delta \
  --threads 4

# Replicate-aware: average per-BAM mismatch rates within each condition
tRNAheatmap run \
  --ref ref.fa \
  --organism eukaryotic --merge-mode equal \
  --condition WT wt1.bam wt2.bam wt3.bam \
  --condition KO ko1.bam ko2.bam ko3.bam \
  --output results/delta \
  --individual \
  --threads 4
```

With two or more `--condition` groups, `--output` is treated as a prefix: each pair is written as `{prefix}_{A}_vs_{B}.{ext}` (default extension `.pdf`). Adding `--individual` also emits one heatmap per condition (`{prefix}_{condition}.{ext}`).
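The naming rule can be expressed as a short function. This is a sketch: `output_paths` is hypothetical, and pairing conditions in the order they were given (so `WT` before `MT` yields `WT_vs_MT`) is inferred from the test-data example output in this README rather than documented.

```python
from itertools import combinations


def output_paths(prefix, conditions, ext="pdf", individual=False):
    """List the files a multi-condition run would write under the naming rules.

    Assumes pairs follow the order the conditions were given (A before B);
    that ordering is an assumption, not documented behavior.
    """
    paths = [f"{prefix}_{a}_vs_{b}.{ext}" for a, b in combinations(conditions, 2)]
    if individual:
        # --individual adds one merged heatmap per condition.
        paths += [f"{prefix}_{c}.{ext}" for c in conditions]
    return paths
```

With n conditions there are n(n−1)/2 pairwise files, so three conditions give three delta heatmaps.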
| Flag | Description |
|---|---|
| `--ref` / `-r` | Reference tRNA FASTA used for alignment |
| `--condition` / `-C` | Condition group: the first token is the name, the remaining tokens are BAM paths. Repeat for multiple conditions. At least one is required. |
| `--merge-mode` | `total` (default): sum raw counts within a condition. `equal`: average per-BAM mismatch rates so each replicate is weighted equally regardless of depth. |
| `--organism` / `-g` | Use a bundled model: `eukaryotic`, `prokaryotic`, or `archaeal` (mutually exclusive with `--cm`) |
| `--cm` / `-c` | Path to a custom Infernal covariance model `.cm` file (mutually exclusive with `--organism`) |
| `--output` / `-o` | One condition: output heatmap file (the extension determines the format). Multiple conditions: prefix; pairs become `{prefix}_{A}_vs_{B}.{ext}`. |
| `--save-df` / `-s` | Base name for saving pileup data as a `.tsv` file. Only valid with exactly one `--condition`. |
| `--individual` | With ≥2 `--condition` groups, also emit a per-condition heatmap (`{prefix}_{condition}.{ext}`) alongside the pairwise deltas. |
| `--threads` / `-t` | Number of CPU threads (default: 1) |
| `--save-mapping BASE` | Save the computed Sprinzl coordinate mapping to `BASE.tsv` for inspection or hand-editing |
| `--sprinzl-map FILE` | Load a pre-computed or hand-edited mapping TSV; patches cmalign output if `--organism`/`--cm` is also given, or replaces it entirely if used alone |
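To make the difference between the two `--merge-mode` options concrete, here is a minimal sketch. It assumes the per-position rate is `mismatch / (match + mismatch)`; the tool's exact denominator (for example, whether deletions count) is not specified here, and both helper names are hypothetical. The sample standard deviation mirrors the documented behavior that `std_dev` is NaN for single-BAM conditions.

```python
from statistics import mean, stdev


def merge_total(counts):
    """total mode: pool raw counts across BAMs, then compute one rate.

    counts: list of (mismatch, covered) tuples, one per BAM, where
    covered = match + mismatch (an assumed denominator).
    """
    mm = sum(m for m, _ in counts)
    cov = sum(c for _, c in counts)
    return mm / cov if cov else float("nan")


def merge_equal(counts):
    """equal mode: compute a rate per BAM, then average so each replicate weighs the same.

    Returns (mean_rate, sample_std); the std is NaN when only one BAM has coverage.
    """
    rates = [m / c for m, c in counts if c]
    if not rates:
        return float("nan"), float("nan")
    sd = stdev(rates) if len(rates) > 1 else float("nan")
    return mean(rates), sd
```

With a deep replicate at 100 mismatches out of 1000 and a shallow one at 5 out of 10, `total` gives 105/1010 ≈ 0.104 while `equal` gives 0.3: the shallow replicate only pulls the estimate up when replicates are weighted equally.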
### `inspect`: visualize Sprinzl coordinate coverage (no BAM needed)

```bash
tRNAheatmap inspect \
  --ref ref.fa \
  --organism eukaryotic \
  --output sprinzl_coverage.png
```

Runs `cmalign` on the reference FASTA and renders a presence/absence heatmap: solid color = the tRNA has a base at that Sprinzl position; black dot = no base (cmalign gap). No reads or pileup are required. Use this to audit how Infernal handles a new reference set before committing to full runs.
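The presence/absence grid can be derived directly from a Sprinzl mapping. The sketch below assumes rows of `(ref_name, sprinzl_label)` such as those in a `--save-mapping` TSV, keeps columns in first-appearance order rather than true Sprinzl order, and treats any label containing `i` (like `36i1`) as an insertion column, mirroring `--include-insertions`. `presence_matrix` is a hypothetical name.

```python
def presence_matrix(rows, include_insertions=False):
    """Build a presence/absence table like the one `inspect` draws.

    rows: iterable of (ref_name, sprinzl_label) pairs. Columns follow
    first-appearance order (an assumption; real Sprinzl ordering needs a
    proper sort key). Labels containing "i" (e.g. "36i1") are treated as
    insertions and dropped unless include_insertions is True.
    """
    columns, present = [], {}
    for ref, label in rows:
        if not include_insertions and "i" in label:
            continue
        if label not in columns:  # linear scan is fine for ~100 columns
            columns.append(label)
        present.setdefault(ref, set()).add(label)
    # True = base present (solid cell); False = gap (black dot in the plot).
    return columns, {ref: [c in labels for c in columns] for ref, labels in present.items()}
```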
| Flag | Description |
|---|---|
| `--ref` / `-r` | Reference tRNA FASTA file |
| `--organism` / `-g` | Bundled model: `eukaryotic`, `prokaryotic`, or `archaeal` (mutually exclusive with `--cm`) |
| `--cm` / `-c` | Path to a custom `.cm` file (mutually exclusive with `--organism`) |
| `--output` / `-o` | Output file; format set by extension. Default: `sprinzl_coverage.pdf` |
| `--save-mapping BASE` | Save the computed Sprinzl coordinate mapping to `BASE.tsv` for inspection or hand-editing |
| `--sprinzl-map FILE` | Load a pre-computed or hand-edited mapping TSV; patches cmalign output if `--organism`/`--cm` is also given, or replaces it entirely if used alone |
### Adapter / trimming flags (`run`, `inspect`)

Use these flags to remove constant adapter sequences from the reference FASTA before Sprinzl coordinate assignment. The pileup arrays are trimmed to match.

| Flag | Description |
|---|---|
| `--detect-adapters` | Auto-detect and trim the 5′ prefix and 3′ suffix shared by all tRNAs. Requires ≥2 sequences. Mutually exclusive with `--trim-5`. |
| `--trim-5 N` | Trim exactly N bases from the 5′ end of each reference. Mutually exclusive with `--detect-adapters`. |
| `--trim-3 N` | Trim exactly N bases from the 3′ end of each reference. Can be combined with `--trim-5`. |
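Adapter detection amounts to finding the longest prefix and suffix shared by every sequence. The sketch below uses `os.path.commonprefix` (which compares strings character by character) for both ends; it illustrates the idea behind `--detect-adapters` and is not the tool's implementation. `detect_adapters` and `trim` are hypothetical helpers.

```python
import os


def detect_adapters(seqs):
    """Return (prefix, suffix) shared by every reference sequence.

    A sketch of the idea behind --detect-adapters, not its actual code.
    Needs >= 2 sequences, as the real flag does.
    """
    if len(seqs) < 2:
        raise ValueError("adapter detection needs at least two sequences")
    prefix = os.path.commonprefix(seqs)
    # Common suffix = common prefix of the reversed strings, reversed back.
    suffix = os.path.commonprefix([s[::-1] for s in seqs])[::-1]
    if len(prefix) + len(suffix) >= min(len(s) for s in seqs):
        raise ValueError("shared prefix and suffix cover a whole sequence")
    return prefix, suffix


def trim(seq, n5, n3):
    """Drop n5 bases from the 5' end and n3 from the 3' end (like --trim-5/--trim-3)."""
    # len(seq) - n3 (rather than -n3) keeps n3 == 0 from emptying the slice.
    return seq[n5:len(seq) - n3]
```

Applying the same `[n5:len - n3]` slice to a per-position pileup array is what keeps it aligned with the trimmed reference.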
### Reference filtering flags (`run`, `inspect`)

| Flag | Description |
|---|---|
| `--include-refs FILE` | Plain-text file (one reference name per line). Only the listed references appear in the output. Mutually exclusive with `--exclude-refs`. |
| `--exclude-refs FILE` | Plain-text file (one reference name per line). Listed references are dropped. Mutually exclusive with `--include-refs`. |
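Reading a name list and applying either filter takes only a few lines. In this sketch, ignoring blank lines and stripping whitespace are assumptions about the file format, and `read_name_list` / `filter_refs` are hypothetical helpers.

```python
def read_name_list(path):
    """Read one reference name per line, ignoring blank lines and surrounding whitespace."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def filter_refs(refs, include=None, exclude=None):
    """Keep refs per --include-refs / --exclude-refs semantics (mutually exclusive).

    refs keeps its original order; include/exclude are sets of names.
    """
    if include is not None and exclude is not None:
        raise ValueError("--include-refs and --exclude-refs are mutually exclusive")
    if include is not None:
        return [r for r in refs if r in include]
    if exclude is not None:
        return [r for r in refs if r not in exclude]
    return list(refs)
```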
### Shared plot options (all subcommands)

| Flag | Default | Description |
|---|---|---|
| `--palette` | `light-high` | Color scheme: `light-high` (yellow = low), `dark-high`, or `viridis` |
| `--title` | (subcommand-specific) | Figure title |
| `--ylabel` | `tRNA Reference Names` | Y-axis label |
| `--include-insertions` | off | Show insertion columns (e.g. `36i1`) in the plot |
| `--dpi` | 300 | Output resolution in DPI |
| `--cell-size` | 0.25 | Inches per heatmap cell for automatic figure sizing |
| `--style FILE` | none | Matplotlib style sheet (`.mplstyle`) layered on top of the bundled default |
| `--outdir` / `-O` | cwd | Directory for all output files (created if missing) |
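`--cell-size` and `--dpi` combine by simple multiplication: each cell is `cell_size` inches on a side, and for raster formats pixels = inches × DPI. The margins for labels and the colorbar are not documented, so the sketch below takes them as placeholder parameters; `figure_size` is hypothetical.

```python
def figure_size(n_rows, n_cols, cell_size=0.25, dpi=300, margins=(0.0, 0.0)):
    """Estimate figure dimensions from --cell-size and --dpi.

    margins = (extra width, extra height) in inches for labels and colorbar;
    the real tool's padding is not documented, so these are placeholders.
    Returns ((width_in, height_in), (width_px, height_px)).
    """
    width_in = n_cols * cell_size + margins[0]
    height_in = n_rows * cell_size + margins[1]
    return (width_in, height_in), (round(width_in * dpi), round(height_in * dpi))
```

For example, 40 references by 80 Sprinzl columns at the defaults gives 20 × 10 inches of cells, or 6000 × 3000 pixels at 300 DPI, before margins.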
The `test_data/` directory contains a yeast reference FASTA and 5%-subsampled BAM files for testing.
Run the full pipeline on one BAM and save the pileup as a TSV for inspection.
```bash
tRNAheatmap run \
  --ref test_data/sacCer3-mature-tRNAs_zap_ref.fa \
  --condition WT test_data/yeast_wt_example_subsample_rep1.bam \
  --organism eukaryotic \
  --output results/wt_heatmap.png \
  --save-df results/wt \
  --threads 4
```

Audit how Infernal maps the reference sequences before running a full pileup.
```bash
tRNAheatmap inspect \
  --ref test_data/sacCer3-mature-tRNAs_zap_ref.fa \
  --organism eukaryotic \
  --output results/sprinzl_coverage.png
```

To detect and trim common adapter sequences automatically:
```bash
tRNAheatmap inspect \
  --ref test_data/sacCer3-mature-tRNAs_zap_ref.fa \
  --organism eukaryotic \
  --detect-adapters \
  --output results/sprinzl_coverage_trimmed.png
```

Compare three WT replicates against three MT replicates, weighting each replicate equally (`--merge-mode equal`):

```bash
tRNAheatmap run \
  --ref test_data/sacCer3-mature-tRNAs_zap_ref.fa \
  --organism eukaryotic --merge-mode equal \
  --condition WT \
    test_data/yeast_wt_example_subsample_rep1.bam \
    test_data/yeast_wt_example_subsample_rep2.bam \
    test_data/yeast_wt_example_subsample_rep3.bam \
  --condition MT \
    test_data/yeast_mt_enrichment_example_subsample_rep1.bam \
    test_data/yeast_mt_enrichment_example_subsample_rep2.bam \
    test_data/yeast_mt_enrichment_example_subsample_rep3.bam \
  --output results/delta \
  --individual \
  --threads 4
```

This produces `results/delta_WT_vs_MT.pdf` (the pairwise delta) and, with `--individual`, `results/delta_WT.pdf` and `results/delta_MT.pdf` (the per-condition merged heatmaps).
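Each cell of a pairwise delta heatmap is a difference of two mismatch rates, so it falls in −1 to +1. The sketch below assumes the sign convention A − B for `{A}_vs_{B}` (not stated in this README) and uses `None` for a cmalign gap and NaN for zero coverage; `delta` is a hypothetical helper.

```python
import math


def delta(rate_a, rate_b):
    """Per-cell value for a pairwise delta heatmap.

    Assumed conventions (not confirmed by the docs): delta = A - B;
    None marks a cmalign gap, NaN marks zero coverage.
    Returns None if either condition lacks the position (drawn as a black dot),
    NaN if either had no reads, otherwise a value in [-1, 1].
    """
    if rate_a is None or rate_b is None:
        return None
    if math.isnan(rate_a) or math.isnan(rate_b):
        return float("nan")
    return rate_a - rate_b
```

When plotting with matplotlib, passing `cmap="Spectral_r", vmin=-1, vmax=1` to `imshow` pins the color scale to that range, so a delta of zero lands on the colormap's midpoint.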
Infernal's `cmalign` assigns Sprinzl positions automatically, but unusual tRNA variants (extra loops, truncated stems, non-canonical anticodon regions) can receive incorrect assignments. When that happens, a domain expert can export the mapping, fix the affected rows, and feed the corrected file back in without rerunning the full pipeline.
Add `--save-mapping BASE` to any `run` or `inspect` call. The flag writes `BASE.tsv` alongside the heatmap output.
```bash
tRNAheatmap run \
  --ref ref.fa \
  --condition sample aligned.bam \
  --organism eukaryotic \
  --output heatmap.png \
  --save-mapping my_coords
```

This produces `my_coords.tsv`. If you only want to inspect the mapping without running a pileup, use `inspect` instead; it is faster because it skips BAM reading entirely:
```bash
tRNAheatmap inspect \
  --ref ref.fa \
  --organism eukaryotic \
  --output sprinzl_coverage.png \
  --save-mapping my_coords
```

The file looks like this:
```text
ref_name	ref_position	sprinzl_label	modification_symbol
tRNA-Ala-AGC-1	1	1
tRNA-Ala-AGC-1	2	2
...
tRNA-Ala-AGC-1	34	34
tRNA-Ala-AGC-1	35	35
...
```
The columns are tab-separated: sequence name, reference position (1-based), Sprinzl label, and an optional modification symbol. To correct a mis-assigned position, change only the `sprinzl_label` column. For example, if position 34 of `tRNA-Ala-AGC-1` was assigned label 34 but should be 33:

```text
tRNA-Ala-AGC-1	34	33
```

Leave all other rows unchanged. The `modification_symbol` column is described below.
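For more than a handful of corrections, a script can apply them reproducibly. This sketch uses Python's `csv` module with the documented column names; `patch_labels` is a hypothetical helper, and writing to a new file rather than editing in place is a choice made here to preserve the original export.

```python
import csv


def patch_labels(src, dst, fixes):
    """Rewrite sprinzl_label values in a --save-mapping TSV.

    fixes: {(ref_name, ref_position): new_label}; ref_position is 1-based,
    matching the file. Every other row is copied unchanged.
    """
    with open(src, newline="", encoding="utf-8") as fin:
        reader = csv.DictReader(fin, delimiter="\t")
        fields = reader.fieldnames
        rows = list(reader)
    for row in rows:
        key = (row["ref_name"], int(row["ref_position"]))
        if key in fixes:
            row["sprinzl_label"] = fixes[key]
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        # Missing trailing columns come back as None and are written as empty fields.
        writer = csv.DictWriter(fout, fieldnames=fields, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```

For the example above: `patch_labels("my_coords.tsv", "my_coords_fixed.tsv", {("tRNA-Ala-AGC-1", 34): "33"})`, then pass `my_coords_fixed.tsv` to `--sprinzl-map`.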
**Patch mode:** use the edited TSV alongside `--organism` or `--cm`. Only the sequences listed in the TSV are overwritten; all other references are computed normally by `cmalign`.
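The difference between patch mode and full replacement can be pictured as a dictionary merge. The sketch below assumes a mapping shaped as `{ref_name: {ref_position: sprinzl_label}}`, that a listed sequence replaces cmalign's entry for that sequence wholesale, and that TSV entries for references absent from the FASTA are ignored; `resolve_mapping` is hypothetical. Passing `cmalign_map=None` corresponds to the full replacement mode described next.

```python
def resolve_mapping(fasta_refs, tsv_map, cmalign_map=None):
    """Combine Sprinzl mappings the way --sprinzl-map is described.

    Each mapping is {ref_name: {ref_position: sprinzl_label}}.
    Patch mode (cmalign_map given): refs listed in the TSV replace cmalign's
    entry for that ref; all other refs keep cmalign's result.
    Full replacement (cmalign_map is None): the TSV must cover every FASTA ref.
    """
    if cmalign_map is None:
        missing = [r for r in fasta_refs if r not in tsv_map]
        if missing:
            raise ValueError(f"--sprinzl-map lacks {len(missing)} reference(s), e.g. {missing[0]}")
        return {r: tsv_map[r] for r in fasta_refs}
    merged = {r: cmalign_map.get(r, {}) for r in fasta_refs}
    # Only overwrite refs that exist in the FASTA (assumed behavior).
    merged.update({r: m for r, m in tsv_map.items() if r in merged})
    return merged
```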
```bash
tRNAheatmap run \
  --ref ref.fa \
  --condition sample aligned.bam \
  --organism eukaryotic \
  --sprinzl-map my_coords.tsv \
  --output heatmap_corrected.png
```

**Full replacement mode:** omit `--organism` and `--cm`. `cmalign` is skipped entirely. The TSV must cover every reference sequence in the FASTA.
```bash
tRNAheatmap run \
  --ref ref.fa \
  --condition sample aligned.bam \
  --sprinzl-map my_coords.tsv \
  --output heatmap_corrected.png
```

The mapping TSV supports an optional fourth column, `modification_symbol`, for marking known RNA modifications. When present, the symbol (any Unicode character, e.g. `ψ` for pseudouridine, `I` for inosine, `m` for m⁶A) is drawn centered on the corresponding heatmap cell. The text color is chosen automatically for contrast against the cell's background color.
To annotate modifications, add the symbol in the `modification_symbol` column for the relevant rows and re-run with `--sprinzl-map`:

```text
ref_name	ref_position	sprinzl_label	modification_symbol
tRNA-Ala-AGC-1	34	34	ψ
tRNA-Ala-AGC-1	37	37	I
```
Rows with no modification can leave the fourth column empty or omit it entirely. The modification symbol is rendered on top of the mismatch-rate color; cells with no coverage (NaN) or a cmalign gap (black dot) are skipped.
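The README states only that the symbol color is picked for contrast. One standard way to do that is the WCAG 2 relative-luminance rule sketched below; it is an assumption, not necessarily what tRNAheatmap does, and `contrast_text_color` is a hypothetical name.

```python
def contrast_text_color(rgb):
    """Pick black or white text for a cell with background rgb (floats in 0..1).

    Uses WCAG 2 relative luminance; one common rule, not necessarily the
    one tRNAheatmap applies.
    """
    def linear(c):
        # Undo the sRGB transfer curve to get linear light.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb[:3])
    lum = 0.2126 * r + 0.7152 * g + 0.0722 * b
    # Contrast ratio vs. black is (lum + 0.05) / 0.05; vs. white it is 1.05 / (lum + 0.05).
    return "black" if (lum + 0.05) / 0.05 >= 1.05 / (lum + 0.05) else "white"
```

The `rgb[:3]` slice means an RGBA tuple, such as a matplotlib colormap returns, can be passed directly.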
The `run` subcommand needs three inputs: a reference FASTA containing your tRNA sequences; one or more sorted BAM files of reads aligned to that FASTA (produced by any short- or long-read aligner), grouped into named `--condition` arguments; and a covariance model. Use `--organism eukaryotic`/`prokaryotic`/`archaeal` to select a bundled model automatically, or `--cm /path/to/model.cm` to supply your own. The bundled models are installed with the package, so no path resolution is needed.
**Important:** The sequence names in your BAM files and the names used by `cmalign` must match; both must come from the same reference FASTA. If you see missing data in the heatmap, list the reference names in the BAM header and compare them against the FASTA headers:

```bash
samtools view -H aligned.bam | grep '^@SQ'
```
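The same check can be scripted. The sketch below collects names from the FASTA (the first word after `>`, which is how Infernal and most tools name sequences; treat this as an assumption for unusual headers) and from the BAM header via `pysam.AlignmentFile(...).references`. `fasta_names`, `bam_names` and `name_report` are hypothetical helpers.

```python
def fasta_names(path):
    """Sequence names from a FASTA: first whitespace-delimited token after '>'."""
    with open(path) as fh:
        return [line[1:].split()[0] for line in fh if line.startswith(">")]


def bam_names(path):
    """Reference names from the BAM header (@SQ SN: fields), via pysam."""
    import pysam  # deferred so the rest of the sketch works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        return list(bam.references)


def name_report(bam_refs, fasta_refs):
    """Return (in BAM only, in FASTA only); two empty lists mean the names agree."""
    b, f = set(bam_refs), set(fasta_refs)
    return sorted(b - f), sorted(f - b)
```

Usage: `name_report(bam_names("aligned.bam"), fasta_names("ref.fa"))`.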
| File | Description |
|---|---|
| `heatmap.png` (or `.pdf`/`.svg`) | Heatmap with rows = tRNA references, columns = Sprinzl positions. Color = mismatch rate (0–1). Blank cell = the position exists but had zero reads. Black dot (●) = cmalign gap (no base at that Sprinzl position). |
| `{prefix}_{A}_vs_{B}.{ext}` | Pairwise delta heatmap (multi-condition runs). Diverging `Spectral_r` colormap, range −1 to +1. Black dot = either condition lacks the position. |
| `{prefix}_{condition}.{ext}` | Per-condition merged heatmap (multi-condition runs with `--individual`). |
| `pileup.tsv` (from `--save-df` in `total` mode) | Long-format table with per-position `match`, `mismatch`, `insertion`, `deletion`, `accuracy`, and `mismatch_rate` columns. |
| `rates.tsv` (from `--save-df` in `equal` mode) | Long-format table with `mismatch_rate` and `std_dev` columns (`std_dev` is NaN for single-BAM conditions). |
| `coords.tsv` (from `--save-mapping`) | Sprinzl coordinate mapping derived from the `cmalign` alignment. Tab-separated columns: `ref_name`, `ref_position` (1-based), `sprinzl_label`, and optional `modification_symbol`. Opens directly in Excel or LibreOffice and can be hand-edited before reuse with `--sprinzl-map`. |
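The `--save-df` table can be explored with a few lines of standard-library Python, for example to list the positions with the highest mismatch rate. Only the `mismatch_rate` column name is taken from the table above; `top_mismatches` is hypothetical, and any other columns are passed through as-is.

```python
import csv
import math


def top_mismatches(path, n=10):
    """Return the n rows of a --save-df TSV with the highest mismatch_rate.

    Empty or NaN rates (e.g. zero-coverage positions) are skipped.
    """
    with open(path, newline="") as fh:
        rows = []
        for row in csv.DictReader(fh, delimiter="\t"):
            value = row.get("mismatch_rate") or ""
            try:
                rate = float(value)
            except ValueError:
                continue  # empty or non-numeric field
            if not math.isnan(rate):
                rows.append((rate, row))
    rows.sort(key=lambda item: item[0], reverse=True)
    return [row for _, row in rows[:n]]
```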
The .cm covariance model files and Sprinzl position mapping code are adapted from tRNAviz (LGPL-3.0) by the UCSC Lowe Lab (Lin BY, Chan PP, Lowe TM. 2019. tRNAviz. Nucleic Acids Res.).
This tool was developed with assistance from Claude Code (Anthropic).