@@ -9,15 +9,15 @@ The default is to infer ambiguous bases, so there should not be N bases in the i
> --tree $TESTDIR/../data/tree.nwk \
> --alignment $TESTDIR/../data/aligned.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations.json" \
+ > --output-sequences "ancestral_sequences.fasta" &> /dev/null

- $ grep "^N" "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta"
+ $ grep "^N" "ancestral_sequences.fasta"
[1]
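
In cram, a bracketed number after a command is its expected exit status, and `grep` exits with status 1 when no line matches. A minimal sketch of the same check outside the test harness, using a made-up two-line FASTA file (the file name and contents are illustrative assumptions, not fixtures from this PR):

```shell
# Hypothetical demo file; not one of the PR's test fixtures.
demo_dir="$(mktemp -d)"
printf '>sample_A\nACGTACGT\n' > "$demo_dir/demo.fasta"

# No sequence line starts with N, so grep prints nothing and exits with 1.
status=0
grep "^N" "$demo_dir/demo.fasta" || status=$?

# Print the status in the bracketed form that cram uses.
echo "[$status]"
```

A status of 0 here would mean that at least one line beginning with `N` was found, which is what the `--keep-ambiguous` test later in this diff expects.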

Check that the reference length was correctly exported as the nuc annotation

- $ grep -A 6 'annotations' "$CRAMTMP/$TESTFILE/ancestral_mutations.json"
+ $ grep -A 6 'annotations' "ancestral_mutations.json"
"annotations": {
"nuc": {
"end": 10769,
62 changes: 31 additions & 31 deletions tests/functional/ancestral/cram/infer-ambiguous-nucleotides.t
@@ -10,10 +10,10 @@ There should not be N bases in the inferred output sequences.
> --alignment $TESTDIR/../data/aligned.fasta \
> --infer-ambiguous \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations.json" \
+ > --output-sequences "ancestral_sequences.fasta" &> /dev/null

- $ grep "^N" "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta"
+ $ grep "^N" "ancestral_sequences.fasta"
[1]

test ambiguous bases remain in seqs and muts if we use `--keep-ambiguous`
@@ -32,16 +32,16 @@ and also remain in the resulting sequence.
$ ${AUGUR} ancestral --seed 0 \
> --keep-ambiguous \
> --tree $TESTDIR/../data/simple-genome/tree.nwk --alignment aln_pos3Y.fasta \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences_1.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_1.json" \
+ > --output-sequences "ancestral_sequences_1.fasta" &> /dev/null

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json" | jq -c '.nodes.sample_C.muts'
+ $ cat "ancestral_mutations_1.json" | jq -c '.nodes.sample_C.muts'
["C3Y"]

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json" | jq -c '.nodes.sample_C.sequence'
+ $ cat "ancestral_mutations_1.json" | jq -c '.nodes.sample_C.sequence'
"AAYAA"

- $ grep 'sample_C' -A 1 "$CRAMTMP/$TESTFILE/ancestral_sequences_1.fasta"
+ $ grep 'sample_C' -A 1 "ancestral_sequences_1.fasta"
>sample_C
AAYAA

@@ -51,16 +51,16 @@ Same test as above but using `--infer-ambiguous` (the default) infers Y as eithe
$ ${AUGUR} ancestral --seed 0 \
> --infer-ambiguous \
> --tree $TESTDIR/../data/simple-genome/tree.nwk --alignment aln_pos3Y.fasta \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_2.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences_2.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_2.json" \
+ > --output-sequences "ancestral_sequences_2.fasta" &> /dev/null

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_2.json" | jq -c '.nodes.sample_C.muts'
+ $ cat "ancestral_mutations_2.json" | jq -c '.nodes.sample_C.muts'
[]

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_2.json" | jq -c '.nodes.sample_C.sequence'
+ $ cat "ancestral_mutations_2.json" | jq -c '.nodes.sample_C.sequence'
"AACAA"

- $ grep 'sample_C' -A 1 "$CRAMTMP/$TESTFILE/ancestral_sequences_2.fasta"
+ $ grep 'sample_C' -A 1 "ancestral_sequences_2.fasta"
>sample_C
AACAA

@@ -83,17 +83,17 @@ and reported in both sequences and mutations. Note that the parent state
$ ${AUGUR} ancestral --seed 0 \
> --keep-ambiguous \
> --tree $TESTDIR/../data/simple-genome/tree.nwk --alignment aln_pos3X.fasta \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_3.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences_3.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_3.json" \
+ > --output-sequences "ancestral_sequences_3.fasta" &> /dev/null


- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_3.json" | jq -c '.nodes.sample_C.muts'
+ $ cat "ancestral_mutations_3.json" | jq -c '.nodes.sample_C.muts'
["G3N"]

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_3.json" | jq -c '.nodes.sample_C.sequence'
+ $ cat "ancestral_mutations_3.json" | jq -c '.nodes.sample_C.sequence'
"AANAA"

- $ grep 'sample_C' -A 1 "$CRAMTMP/$TESTFILE/ancestral_sequences_3.fasta"
+ $ grep 'sample_C' -A 1 "ancestral_sequences_3.fasta"
>sample_C
AANAA

@@ -104,17 +104,17 @@ Note 2: TreeTime chooses the root state as G as well, so there's no mutation on
$ ${AUGUR} ancestral --seed 0 \
> --infer-ambiguous \
> --tree $TESTDIR/../data/simple-genome/tree.nwk --alignment aln_pos3X.fasta \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_4.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences_4.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_4.json" \
+ > --output-sequences "ancestral_sequences_4.fasta" &> /dev/null


- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_4.json" | jq -c '.nodes.sample_C.muts'
+ $ cat "ancestral_mutations_4.json" | jq -c '.nodes.sample_C.muts'
[]

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_4.json" | jq -c '.nodes.sample_C.sequence'
+ $ cat "ancestral_mutations_4.json" | jq -c '.nodes.sample_C.sequence'
"AAGAA"

- $ grep 'sample_C' -A 1 "$CRAMTMP/$TESTFILE/ancestral_sequences_4.fasta"
+ $ grep 'sample_C' -A 1 "ancestral_sequences_4.fasta"
>sample_C
AAGAA

@@ -135,11 +135,11 @@ should report 'N' (the standard ambiguous nucleotide) when we're not inferring a
$ ${AUGUR} ancestral --seed 0 \
> --keep-ambiguous \
> --tree $TESTDIR/../data/simple-genome/tree.nwk --alignment aln_pos3X_internal.fasta \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_5.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences_5.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_5.json" \
+ > --output-sequences "ancestral_sequences_5.fasta" &> /dev/null


- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_5.json" | jq -c '.nodes.node_AB.sequence'
+ $ cat "ancestral_mutations_5.json" | jq -c '.nodes.node_AB.sequence'
"AANAA"


@@ -166,15 +166,15 @@ The reference 'X' at pos 3 should be equivalent to 'N')
> --keep-ambiguous \
> --root-sequence ref_pos3X.fasta \
> --tree $TESTDIR/../data/simple-genome/tree.nwk --alignment aln_col3N.fasta \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_6.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences_6.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_6.json" \
+ > --output-sequences "ancestral_sequences_6.fasta" &> /dev/null


- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_6.json" | jq -c '[.nodes[].sequence] | unique'
+ $ cat "ancestral_mutations_6.json" | jq -c '[.nodes[].sequence] | unique'
["AANAA"]

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_6.json" | jq -c '[.nodes[].muts[]]'
+ $ cat "ancestral_mutations_6.json" | jq -c '[.nodes[].muts[]]'
[]

- $ cat "$CRAMTMP/$TESTFILE/ancestral_mutations_6.json" | jq -c '.reference.nuc'
+ $ cat "ancestral_mutations_6.json" | jq -c '.reference.nuc'
"AANAA"
@@ -12,28 +12,28 @@ Infer multiple genes with a provided GenBank annotation file
> --genes ENV PRO \
> --translations $TESTDIR/../data/aa_sequences_%GENE.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json" \
- > --output-translations "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_%GENE_1.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_1.json" \
+ > --output-translations "ancestral_aa_sequences_%GENE_1.fasta" &> /dev/null

Check that the annotations block only includes ENV & PRO, not nuc

- $ grep -E "\"(ENV|PRO|nuc)\": {" "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json"
+ $ grep -E "\"(ENV|PRO|nuc)\": {" "ancestral_mutations_1.json"
"ENV": {
"PRO": {

Check that amino acid sequences exist for the root node of the tree.

- $ grep -A 2 "aa_sequences" "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json"
+ $ grep -A 2 "aa_sequences" "ancestral_mutations_1.json"
"aa_sequences": {
"ENV": .* (re)
"PRO": .* (re)

Check that internal nodes have ancestral amino acid sequences.

- $ grep "NODE" "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_1.fasta" | wc -l
+ $ grep "NODE" "ancestral_aa_sequences_ENV_1.fasta" | wc -l
\s*8 (re)

- $ grep "NODE" "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_PRO_1.fasta" | wc -l
+ $ grep "NODE" "ancestral_aa_sequences_PRO_1.fasta" | wc -l
\s*8 (re)


@@ -46,7 +46,7 @@ Catches this bug <https://github.com/nextstrain/augur/pull/1958#discussion_r3034
> --genes $TESTDIR/../data/genes.txt \
> --translations $TESTDIR/../data/aa_sequences_ENV.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_error.json" \
- > --output-translations "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_error.fasta" > /dev/null
+ > --output-node-data "ancestral_mutations_error.json" \
+ > --output-translations "ancestral_aa_sequences_error.fasta" > /dev/null
ERROR: --translations must contain %GENE for multiple-gene amino acid reconstructions
[2]
@@ -12,24 +12,24 @@ Firstly a single gene (ENV), using a hardcoded fasta path, and an annotation fil
> --genes ENV \
> --translations $TESTDIR/../data/aa_sequences_ENV.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json" \
- > --output-translations "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_1.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_1.json" \
+ > --output-translations "ancestral_aa_sequences_ENV_1.fasta" &> /dev/null

Check that the annotations block only includes ENV, not nuc or PRO

- $ grep -E "\"(ENV|PRO|nuc)\": {" "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json"
+ $ grep -E "\"(ENV|PRO|nuc)\": {" "ancestral_mutations_1.json"
"ENV": {

Check that amino acid sequences exist for the root node of the tree.

- $ grep -A 2 "aa_sequences" "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json"
+ $ grep -A 2 "aa_sequences" "ancestral_mutations_1.json"
"aa_sequences": {
"ENV": .* (re)
}

Check that internal nodes have ancestral amino acid sequences.

- $ grep "NODE" "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_1.fasta" | wc -l
+ $ grep "NODE" "ancestral_aa_sequences_ENV_1.fasta" | wc -l
\s*8 (re)


@@ -41,14 +41,14 @@ And the exact same, but using a %GENE pattern in the filepath
> --genes ENV \
> --translations $TESTDIR/../data/aa_sequences_%GENE.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_2.json" \
- > --output-translations "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_%GENE_2.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_2.json" \
+ > --output-translations "ancestral_aa_sequences_%GENE_2.fasta" &> /dev/null

Check that the outputs are identical

- $ diff "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json" "$CRAMTMP/$TESTFILE/ancestral_mutations_2.json"
+ $ diff "ancestral_mutations_1.json" "ancestral_mutations_2.json"

- $ diff "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_1.fasta" "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_2.fasta"
+ $ diff "ancestral_aa_sequences_ENV_1.fasta" "ancestral_aa_sequences_ENV_2.fasta"


For single genes the annotations file is optional. The result should be the same except the (nuc) coordinates of the CDS will differ without the annotation file
@@ -58,16 +58,16 @@ For single genes the annotations file is optional. The result should be the same
> --genes ENV \
> --translations $TESTDIR/../data/aa_sequences_ENV.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_3.json" \
- > --output-translations "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_3.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations_3.json" \
+ > --output-translations "ancestral_aa_sequences_ENV_3.fasta" &> /dev/null


- $ diff "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_1.fasta" "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV_3.fasta"
+ $ diff "ancestral_aa_sequences_ENV_1.fasta" "ancestral_aa_sequences_ENV_3.fasta"

Nucleotide coordinates of the ENV gene are different - without the annotation file we start them at nuc_pos=1 (1-based, GFF-style)
- So for this example the offset is 960 (start is 961 - 960 = 1, end is 2472 - 960 = 1512)
+ So for this example the offset is 960 (start is 961 - 960 = 1, end is 2472 - 960 = 1512)

- $ diff "$CRAMTMP/$TESTFILE/ancestral_mutations_1.json" "$CRAMTMP/$TESTFILE/ancestral_mutations_3.json"
+ $ diff "ancestral_mutations_1.json" "ancestral_mutations_3.json"
4,6c4,5
\< "end": 2472, (re)
\< "seqid": .* (re)
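
The coordinate shift described in the test comment above can be checked with plain shell arithmetic. This is a sketch that uses only the numbers quoted in the test (961, 2472, 960); the variable names are made up for illustration, and nothing here calls augur.

```shell
# Genome-relative coordinates of ENV, as quoted in the test comment.
annotated_start=961
annotated_end=2472

# Without an annotation file the CDS is re-based to start at position 1 (1-based).
offset=$((annotated_start - 1))
rebased_start=$((annotated_start - offset))
rebased_end=$((annotated_end - offset))

# Number of codons covered by the re-based CDS.
codons=$(( (rebased_end - rebased_start + 1) / 3 ))

echo "offset=${offset} start=${rebased_start} end=${rebased_end} codons=${codons}"
# prints: offset=960 start=1 end=1512 codons=504
```

The 504 codons agree with the CDS length of 504 amino acids quoted in the error message later in the same test file.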
@@ -88,18 +88,18 @@ Test single gene reconstruction with an explicitly provided AA root-sequence
> --translations $TESTDIR/../data/aa_sequences_ENV.fasta \
> --aa-root-sequence $TESTDIR/../data/ENV_outgroup.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_4.json" &> /dev/null
+ > --output-node-data "ancestral_mutations_4.json" &> /dev/null

The reference has been modified to include a leading AAA:

- $ grep -A 2 "reference" "$CRAMTMP/$TESTFILE/ancestral_mutations_4.json"
+ $ grep -A 2 "reference" "ancestral_mutations_4.json"
"reference": {
"ENV": "AAA.+ (re)
}

Check that this results in 3 mutations between the provided root-sequence & the inferred root node

- $ grep -A 11 "NODE_0000006" "$CRAMTMP/$TESTFILE/ancestral_mutations_4.json"
+ $ grep -A 11 "NODE_0000006" "ancestral_mutations_4.json"
"NODE_0000006": {
"aa_muts": {
"ENV": [
@@ -123,10 +123,10 @@ Retest the above, using a %GENE placeholder in --aa-root-sequence
> --translations $TESTDIR/../data/aa_sequences_ENV.fasta \
> --aa-root-sequence $TESTDIR/../data/%GENE_outgroup.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_5.json" &> /dev/null
+ > --output-node-data "ancestral_mutations_5.json" &> /dev/null


- $ diff "$CRAMTMP/$TESTFILE/ancestral_mutations_4.json" "$CRAMTMP/$TESTFILE/ancestral_mutations_5.json"
+ $ diff "ancestral_mutations_4.json" "ancestral_mutations_5.json"


Test that accidentally providing the wrong AA root-sequence (e.g. a nuc one) results in an error
@@ -138,6 +138,6 @@ Test that accidentally providing the wrong AA root-sequence (e.g. a nuc one) res
> --translations $TESTDIR/../data/aa_sequences_ENV.fasta \
> --aa-root-sequence $TESTDIR/../data/simple-genome/reference.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations_5.json" > /dev/null
+ > --output-node-data "ancestral_mutations_5.json" > /dev/null
ERROR: The provided root-sequence AA fasta for ENV has length 50 which doesn't match the length of the CDS 504 (amino acids)
[2]
@@ -16,12 +16,12 @@ ancestor).
> --genes ENV PRO \
> --translations $TESTDIR/../data/aa_sequences_%GENE.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations.json" &> /dev/null
+ > --output-node-data "ancestral_mutations.json" &> /dev/null

Check that the reference length was correctly exported as the nuc annotation

$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" \
> --exclude-regex-paths "['seqid']" -- \
> "$TESTDIR/../data/ancestral_mutations_with_root_sequence.json" \
- > "$CRAMTMP/$TESTFILE/ancestral_mutations.json"
+ > "ancestral_mutations.json"
{}
12 changes: 6 additions & 6 deletions tests/functional/ancestral/cram/infer-amino-acid-sequences.t
@@ -11,25 +11,25 @@ Infer ancestral nucleotide and amino acid sequences.
> --genes ENV PRO \
> --translations $TESTDIR/../data/aa_sequences_%GENE.fasta \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta" \
- > --output-translations "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_%GENE.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations.json" \
+ > --output-sequences "ancestral_sequences.fasta" \
+ > --output-translations "ancestral_aa_sequences_%GENE.fasta" &> /dev/null

Check that the reference length was correctly exported as the nuc annotation

- $ grep -E "\"(ENV|PRO|nuc)\": {" "$CRAMTMP/$TESTFILE/ancestral_mutations.json"
+ $ grep -E "\"(ENV|PRO|nuc)\": {" "ancestral_mutations.json"
"ENV": {
"PRO": {
"nuc": {

Check that amino acid sequences exist for the root node of the tree.

- $ grep -A 2 "aa_sequences" "$CRAMTMP/$TESTFILE/ancestral_mutations.json"
+ $ grep -A 2 "aa_sequences" "ancestral_mutations.json"
"aa_sequences": {
"ENV": .* (re)
"PRO": .* (re)

Check that internal nodes have ancestral amino acid sequences.

- $ grep "NODE" "$CRAMTMP/$TESTFILE/ancestral_aa_sequences_ENV.fasta" | wc -l
+ $ grep "NODE" "ancestral_aa_sequences_ENV.fasta" | wc -l
\s*8 (re)
6 changes: 3 additions & 3 deletions tests/functional/ancestral/cram/keep-ambiguous-nucleotides.t
@@ -10,8 +10,8 @@ There should be N bases in the inferred output sequences.
> --alignment $TESTDIR/../data/aligned.fasta \
> --keep-ambiguous \
> --seed 314159 \
- > --output-node-data "$CRAMTMP/$TESTFILE/ancestral_mutations.json" \
- > --output-sequences "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta" &> /dev/null
+ > --output-node-data "ancestral_mutations.json" \
+ > --output-sequences "ancestral_sequences.fasta" &> /dev/null

- $ grep "^N" "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta" | head -n 1
+ $ grep "^N" "ancestral_sequences.fasta" | head -n 1
NNNNNNNNNNNNGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAGGTTTT
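
Every change in the ancestral tests above swaps a `"$CRAMTMP/$TESTFILE/..."` path for a bare file name. That only works if the test's working directory is already `$CRAMTMP/$TESTFILE`, which the diff implies but does not state. A minimal sketch of that equivalence, with `CRAMTMP` and `TESTFILE` set by hand to hypothetical stand-in values rather than by cram:

```shell
# Hypothetical stand-ins for the variables cram would set; not the real harness.
CRAMTMP="$(mktemp -d)"
TESTFILE="example.t"
mkdir -p "$CRAMTMP/$TESTFILE"
cd "$CRAMTMP/$TESTFILE" || exit 1

# Write through a relative path, as the updated tests do.
echo ">sample_C" > "ancestral_sequences.fasta"

# The old-style absolute path resolves to the same file.
if [ -f "$CRAMTMP/$TESTFILE/ancestral_sequences.fasta" ]; then
  same_file="yes"
else
  same_file="no"
fi
echo "same_file=${same_file}"
```

If the working directory were anything else (for example after a `pushd "$TESTDIR"`), the relative form would write into that directory instead.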
5 changes: 1 addition & 4 deletions tests/functional/curate/cram/titlecase.t
@@ -1,8 +1,6 @@
Setup

- $ pushd "$TESTDIR" > /dev/null
- $ export AUGUR="${AUGUR:-../../../../bin/augur}"

+ $ source "$TESTDIR"/_setup.sh

Test output with articles and a mixture of lower and uppercase letters.

@@ -96,4 +94,3 @@ Test silencing on failures such as when encountering a non-string value
> | ${AUGUR} curate titlecase --titlecase-fields "bare_int" \
> --failure-reporting "silent"
{"bare_int": 2021}
