Open
62 commits
917c7aa
feat: add amas and macros
jchchiu Nov 6, 2025
bb629f6
test: add simple working test case
jchchiu Nov 6, 2025
be4db1f
fix: change category to Multiple Alignments
jchchiu Nov 6, 2025
932783d
fix: change category to Sequence Analysis
jchchiu Nov 6, 2025
793a6b1
update from george
jchchiu Nov 7, 2025
af75ec9
update from george; add tests
jchchiu Nov 7, 2025
a45c8b5
update from george; add info.xml
jchchiu Nov 7, 2025
e816d9c
fix lint
jchchiu Nov 7, 2025
9967a62
add split test; update .shed; add comment to xml command
jchchiu Nov 7, 2025
8e937d7
update .shed owners
jchchiu Nov 7, 2025
a6ff62e
remove translate
jchchiu Nov 7, 2025
c354605
docs: update .shed
jchchiu Nov 11, 2025
a4fc62f
refactor: split concat into separate tool
jchchiu Nov 11, 2025
6a56045
refactor: add input and output format as shared macro
jchchiu Nov 11, 2025
426a577
refactor: add macro for changing output format
jchchiu Nov 11, 2025
c757008
refactor: move info to macros
jchchiu Nov 11, 2025
1509d85
refactor: change tool id/name; remove info macro
jchchiu Nov 11, 2025
6872743
docs: update categories; reduce actions
jchchiu Nov 11, 2025
c77e246
refactor: rename output format
jchchiu Nov 11, 2025
582d254
refactor: move 'split' subcommand into separate tool
jchchiu Nov 11, 2025
bc9bebd
refactor: change output pattern
jchchiu Nov 11, 2025
dc15ac1
refactor: move 'replicate' subcommand into separate tool
jchchiu Nov 11, 2025
a279552
docs: add more help to explain what partitions are
jchchiu Nov 11, 2025
1d901f5
refactor: move 'summary' subcommand into separate tool
jchchiu Nov 12, 2025
77241c3
temp: move 'remove' subcommand into separate tool
jchchiu Nov 12, 2025
17c02f2
fix: change version to correct token
jchchiu Nov 12, 2025
91c5fe3
refactor: remove redundant xmls
jchchiu Nov 12, 2025
62a9bce
refactor: remove/add reused/redundant macros
jchchiu Nov 12, 2025
a12ab96
docs: update help/documentation
jchchiu Nov 12, 2025
f071907
docs: update help
jchchiu Nov 12, 2025
0bb5c40
test: remove tests no longer needed
jchchiu Nov 12, 2025
653c992
refactor: change 'remove' repeat to text + regex validator
jchchiu Nov 12, 2025
178d5cc
fix: fix misplaced end param tag
jchchiu Nov 12, 2025
81cbb66
docs: updated help for 'remove'
jchchiu Nov 12, 2025
8f32f1d
docs: update help info
jchchiu Nov 12, 2025
46502d3
refactor: add profile token to macro; replace in subcommands
jchchiu Nov 12, 2025
587bb5d
refactor: change param 'name' to 'argument' for 'boolean'
jchchiu Nov 12, 2025
6fc566f
docs: rename output label so that it is more user friendly
jchchiu Nov 12, 2025
5809b0a
Revert "docs: rename output label so that it is more user friendly"
jchchiu Nov 12, 2025
e2de21b
docs: rename output label so that it is more user friendly
jchchiu Nov 12, 2025
b3c6135
docs: add auto_tool_repositories and suite to shed.yml
jchchiu Nov 17, 2025
4b43895
refactor: run everything in ./; added ftype to tests
jchchiu Nov 17, 2025
f6a85d5
refactor: changed check_align and data_type to macro
jchchiu Nov 17, 2025
08bd74d
refactor: moved shared commands to macro tokens
jchchiu Nov 17, 2025
b19d9b7
refactor/docs: moved shared help to macro token
jchchiu Nov 17, 2025
b61f0ff
refactor: added ${tool.name} on ${on_string} to output labels
jchchiu Nov 17, 2025
846b254
docs: updated file format formatting to be more consistent
jchchiu Nov 17, 2025
eec0620
style: removed single quotes
jchchiu Nov 17, 2025
8364cfe
docs: updated docs to include info on sequential vs interleaved; fixe…
jchchiu Nov 17, 2025
834f114
docs: moved partitions help to macro token
jchchiu Nov 17, 2025
2d2349b
refactor: set format depending on part_format
jchchiu Nov 19, 2025
0e62561
style: changed formatting of output files
jchchiu Nov 19, 2025
4af9562
fix: updated version command
jchchiu Nov 19, 2025
cfcfca9
tests: changed concat test from sim size to exact
jchchiu Nov 19, 2025
d4b84ac
refactor: simplified change_format
jchchiu Nov 19, 2025
51bb36e
fix: updated/fixed concat test
jchchiu Nov 19, 2025
ff762fb
fix: added nex format to allowed inputs for partitions
jchchiu Nov 19, 2025
3d9424b
docs: updated help
jchchiu Nov 19, 2025
18a8396
style: fix lint
jchchiu Nov 19, 2025
bd9a818
fix: split subcommand does not work with RAxML or NEXUS formatted par…
jchchiu Nov 19, 2025
0aae4cb
docs: added some comments for future
jchchiu Nov 20, 2025
96395ca
style: cleaned up indenting
jchchiu Nov 20, 2025
19 changes: 19 additions & 0 deletions tools/amas/.shed.yml
@@ -0,0 +1,19 @@
categories:
- Phylogenetics
- Sequence Analysis
- Statistics
description: AMAS high-throughput alignment manipulation and summaries for phylogenomics
homepage_url: https://github.com/marekborowiec/AMAS
long_description: Handle expansive phylogenomic data sets by concatenating, removing,
replicating, splitting, and summarising large nucleotide or amino acid alignments.
name: amas
owner: iuc
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/amas
auto_tool_repositories:
name_template: "{{ tool_id }}"
description_template: "Wrapper for amas functions: {{ tool_name }}."
suite:
name: "suite_amas"
description: "A suite of tools that brings the amas project into Galaxy."
long_description: Handle expansive phylogenomic data sets by concatenating, removing,
replicating, splitting, and summarising large nucleotide or amino acid alignments.
123 changes: 123 additions & 0 deletions tools/amas/amas_concat.xml
@@ -0,0 +1,123 @@
<tool id="amas_concat" name="AMAS concat" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>concatenate multiple alignments</description>

<macros>
<import>macros.xml</import>
</macros>

<xrefs>
<xref type="bio.tools">amas</xref>
</xrefs>

<expand macro="requirements" />
<expand macro="version_command" />

<command detect_errors="exit_code"><![CDATA[
#import re
set -eu;

@SYMLINK_INPUTS@

python -m amas.AMAS
concat
--concat-part partitions.txt
--concat-out concatenated.out
--part-format $part_format
Contributor

You can determine the input format from $input_files.ext.

Author

Comment also relevant to #7443 (comment)

The problem I have with this is that for a NEXUS or PHYLIP file, the extension doesn't always tell you whether the file is interleaved or sequential. Even if it is sniffed as interleaved, does $input_files.ext return phylip-int or something similar that distinguishes it from normal phylip? Otherwise I'm pretty sure amas needs the user to set the input format explicitly.

What are your thoughts on only accepting non-interleaved formats and warning the user in the help that interleaved files will not be accepted? Following on from this, we would also remove the option to output an interleaved file. The problem I see is that users can still upload an interleaved file, since it has the same extension.
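
To illustrate the ambiguity described above, here is a minimal Python sketch of a layout heuristic for relaxed PHYLIP text. It is not part of the PR or of AMAS; the function name `guess_phylip_layout` is hypothetical, and it assumes the taxon name is the first whitespace-delimited token on a line. It can recognise an unwrapped sequential file, but it cannot separate an interleaved file from a sequential file whose sequences are wrapped over several lines, which is why an explicit user choice may still be needed.

```python
def guess_phylip_layout(text: str) -> str:
    """Guess the layout of relaxed PHYLIP text (heuristic, hypothetical helper)."""
    # Ignore blank lines, which commonly separate interleaved blocks.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")

    header = lines[0].split()
    if len(header) < 2 or not (header[0].isdigit() and header[1].isdigit()):
        raise ValueError("first line must be '<ntax> <nchar>'")
    ntax, nchar = int(header[0]), int(header[1])

    body = lines[1:]
    if len(body) < ntax:
        raise ValueError("fewer data lines than taxa")

    # Assumption: relaxed PHYLIP, so the name is the first token on the line.
    first_parts = body[0].split(None, 1)
    first_len = len("".join(first_parts[1].split())) if len(first_parts) > 1 else 0

    if len(body) == ntax and first_len == nchar:
        return "sequential"
    if len(body) > ntax and len(body) % ntax == 0 and first_len < nchar:
        # Interleaved blocks and evenly wrapped sequential records look alike here.
        return "interleaved-or-wrapped"
    return "unknown"
```

A file with one full-length record per taxon is reported as `sequential`; anything split over several lines per taxon is reported as ambiguous rather than guessed.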

--out-format $out_format
--in-files
@INPUT_FILENAMES@
--in-format $in_format
--data-type $data_type
--cores "\${GALAXY_SLOTS:-1}"
$check_align
]]></command>

<inputs>
<param name="input_files" type="data" format="fasta,phylip,nex" label="Sequences to concatenate" multiple="true"
help="Provide pre-aligned FASTA/PHYLIP/NEXUS files (DNA or protein); mixes of unaligned reads or contigs will produce meaningless results." />
<expand macro="input_format" />
Contributor

Note that with a little effort, interleaved/sequential could be added to the datatype's metadata. Currently Galaxy can only sniff both as phylip: https://github.com/galaxyproject/galaxy/blob/368bb1c62ff286735975c8cd116d9fac20249bab/lib/galaxy/datatypes/phylip.py#L52

For nexus I guess it's complicated, since it can contain much more than sequence data alone.

<expand macro="output_format" label="Select output format for concatenated alignment" />
<param name="part_format" type="select" label="Format of the partitions file"
help="A file defining how the concatenated alignment is split into separate gene/locus regions. Each line specifies a partition name and its position range (e.g., 'gene1 = 1-500' or 'DNA, gene1 = 1-500' for RAxML format).">
<option value="unspecified" selected="true">unspecified</option>
<option value="nexus">nexus</option>
<option value="raxml">raxml</option>
</param>
<expand macro="data_type" />
<expand macro="check_align" />
</inputs>

<outputs>
<data name="output" from_work_dir="concatenated.out" format="txt" label="${tool.name} on ${on_string}: Concatenated alignment">
<change_format>
<when input="out_format" value="fasta" format="fasta" />
<when input="out_format" value="phylip" format="phylip" />
<when input="out_format" value="phylip-int" format="phylip" />
<when input="out_format" value="nexus" format="nex" />
<when input="out_format" value="nexus-int" format="nex" />
</change_format>
</data>
<data name="partitions_out" from_work_dir="partitions.txt" format="txt" label="${tool.name} on ${on_string}: Partition file">
<change_format>
<!-- Unspecified and RAxML partition formats have no equivalent datatype at present, so they are output as txt by default -->
<when input="part_format" value="nexus" format="nex" />
</change_format>
</data>
</outputs>

<tests>
<test expect_num_outputs="2">
<param name="input_files" value="inputs/concat_1.fasta,inputs/concat_2.fasta" />
<param name="out_format" value="phylip" />
<param name="part_format" value="nexus" />
<param name="in_format" value="fasta" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output name="output" file="outputs/expected_concat.phylip" ftype="phylip" />
<output name="partitions_out" file="outputs/expected_partitions.nex" ftype="nex" />
</test>
<test expect_num_outputs="2">
<param name="input_files" value="inputs/concat_1.fasta,inputs/concat_2.fasta" />
<param name="out_format" value="fasta" />
<param name="part_format" value="raxml" />
<param name="in_format" value="fasta" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output name="output" file="outputs/expected_concat_fasta.fas" ftype="fasta" />
<output name="partitions_out" file="outputs/expected_partitions_raxml.txt" ftype="txt" />
</test>
</tests>

<help><![CDATA[
**What it does**

AMAS Concat combines multiple sequence alignments into a single concatenated alignment, commonly used in phylogenomic analyses.

**Inputs**

- **Multiple alignment files**: Select 2 or more pre-aligned sequence files (FASTA, PHYLIP, or NEXUS format)
- **Input format**: Specify the format of your input files
- **Partition format**: Specify how you want the partition file to be formatted (Unspecified, RAxML, NEXUS)
- **Data type**: Choose DNA for nucleotide sequences or Protein for amino acid sequences
- **Output format**: Select the desired format for the concatenated alignment

**Outputs**

1. **Concatenated alignment**: A single file containing all input alignments joined end-to-end
2. **Partitions file**: Defines the boundaries of each original alignment within the concatenated file

@PARTITIONS_HELP@

**Use cases**

- **Multi-locus phylogenomics**: Combine hundreds of genes for species tree inference
- **Partitioned phylogenetic analysis**: Apply different evolutionary models to different genes using tools like RAxML or IQ-TREE
- **Supermatrix construction**: Create a dataset for concatenation-based phylogenetic methods
- **Increased phylogenetic signal**: Leverage information from multiple loci to resolve difficult nodes
- **Comparative analyses**: Prepare datasets for testing hypotheses across multiple genomic regions

@AMAS_SHARED_HELP@
]]></help>

<expand macro="citations" />
</tool>
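
As a rough illustration of what the partitions file in amas_concat.xml above describes, the following Python sketch derives position ranges from alignment lengths in concatenation order. It is not AMAS code: `partition_lines` is a hypothetical helper, the line shapes follow the examples in the `part_format` help text ('gene1 = 1-500' and 'DNA, gene1 = 1-500'), and the NEXUS variant is left out because its exact block layout is not shown in this PR.

```python
def partition_lines(loci, part_format="unspecified", data_type="DNA"):
    """Return one partition line per locus (illustrative, hypothetical helper).

    loci: sequence of (name, alignment_length) pairs in concatenation order.
    """
    if part_format not in ("unspecified", "raxml"):
        raise ValueError("this sketch only covers 'unspecified' and 'raxml'")

    lines = []
    start = 1  # partition coordinates are 1-based and inclusive
    for name, length in loci:
        if length < 1:
            raise ValueError(f"alignment {name!r} has no columns")
        end = start + length - 1
        line = f"{name} = {start}-{end}"
        if part_format == "raxml":
            # RAxML-style lines carry the data type as a prefix.
            line = f"{data_type}, {line}"
        lines.append(line)
        start = end + 1  # the next locus begins right after this one
    return lines
```

For two loci of 500 and 300 columns this yields ranges 1-500 and 501-800, which is the sense in which the partitions file "defines the boundaries of each original alignment".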
100 changes: 100 additions & 0 deletions tools/amas/amas_remove.xml
@@ -0,0 +1,100 @@
<tool id="amas_remove" name="AMAS remove" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>remove taxa from multiple alignments</description>

<macros>
<import>macros.xml</import>
</macros>

<xrefs>
<xref type="bio.tools">amas</xref>
</xrefs>

<expand macro="requirements" />
<expand macro="version_command" />

<command detect_errors="exit_code"><![CDATA[
#import re
set -eu;

@SYMLINK_INPUTS@

python -m amas.AMAS
remove
--taxa-to-remove
#for $taxon in $taxa_to_remove.split()
'$taxon'
#end for
--out-format $out_format
--in-files
@INPUT_FILENAMES@
--in-format $in_format
--data-type $data_type
--cores "\${GALAXY_SLOTS:-1}"
$check_align
]]></command>

<inputs>
<param name="input_files" type="data" format="fasta,phylip,nex" label="Alignment(s) to remove taxa from" multiple="true"
help="Provide pre-aligned FASTA/PHYLIP/NEXUS files (DNA or protein); mixes of unaligned reads or contigs will produce meaningless results." />
<expand macro="input_format" />
<expand macro="output_format" label="Select output format for alignment(s) with taxa removed"/>
<param name="taxa_to_remove" type="text" label="Taxa to remove"
help="Space-separated list of taxon names to remove (e.g., 'OTU9 OTU10 Sample_A'). Note: AMAS converts spaces to underscores and strips quotes from sequence names, so use 'Species_1' to remove a taxon named 'Species 1'.">
<validator type="regex" message="Please provide at least one taxon name (alphanumeric, underscores, hyphens, and dots allowed)">[A-Za-z0-9_.\-]+(\s+[A-Za-z0-9_.\-]+)*</validator>
</param>
<expand macro="data_type" />
<expand macro="check_align" />
</inputs>

<outputs>
<expand macro="collection_outputs" name="reduced_alignments" />
</outputs>

<tests>
<test expect_num_outputs="1">
<param name="input_files" value="inputs/remove_input.nex" />
<param name="taxa_to_remove" value="OTU9 OTU10" />
<param name="out_format" value="nexus-int" />
<param name="in_format" value="nexus" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output_collection name="reduced_alignments_nexus" type="list">
<element name="reduced_remove_input.nex-out.int-nex" file="outputs/expected_remove_filtered.int-nex" ftype="nex" />
</output_collection>
</test>
</tests>

<help><![CDATA[
**What it does**

AMAS Remove excludes specified taxa (sequences) from one or more alignments. This is useful for removing problematic sequences, outgroups, or creating taxon subsets for comparative analyses.

**Inputs**

- **Alignment files**: One or more pre-aligned sequence files (FASTA, PHYLIP, or NEXUS format)
- **Taxa to remove**: Space-separated list of sequence names to exclude (e.g., 'OTU9 OTU10 Sample_A')

**Important**: AMAS converts spaces to underscores and strips quotes from sequence names during processing. If your input file contains a taxon named 'Species 1' or '"Species 1"', you must specify it as 'Species_1' in the taxa to remove list.

- **Input format**: Specify the format of your input files
- **Data type**: Choose DNA for nucleotide sequences or Protein for amino acid sequences
- **Output format**: Select the desired format for the reduced alignments

**Outputs**

A collection of alignment files with specified taxa removed. Each output file contains the same alignment as the input, minus the excluded sequences.

**Tip:** You may want to realign your files after taxon removal.

**Use cases**

- Remove sequences with excessive missing data
- Exclude contaminated or mis-identified samples
- Create taxon subsets for sensitivity analyses
- Remove outgroups after tree rooting

@AMAS_SHARED_HELP@
]]></help>

<expand macro="citations" />
</tool>
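
The `taxa_to_remove` validator and the whitespace split in amas_remove.xml above can be exercised outside Galaxy with a short Python sketch. It assumes that Galaxy applies a `regex` validator with `re.match` semantics (anchored at the start of the value only), as the Galaxy tool XML schema documentation describes; that should be confirmed for the targeted Galaxy version. The helper names `prefix_ok`, `whole_ok`, and `split_taxa` are hypothetical.

```python
import re

# Pattern copied from the <validator type="regex"> in amas_remove.xml.
TAXA_PATTERN = r"[A-Za-z0-9_.\-]+(\s+[A-Za-z0-9_.\-]+)*"


def prefix_ok(value: str) -> bool:
    """True if the pattern matches at the start of the value (re.match)."""
    return re.match(TAXA_PATTERN, value) is not None


def whole_ok(value: str) -> bool:
    """True only if the pattern matches the entire value (re.fullmatch)."""
    return re.fullmatch(TAXA_PATTERN, value) is not None


def split_taxa(value: str) -> list:
    """Mirror the Cheetah loop: split on any run of whitespace."""
    return value.split()


if __name__ == "__main__":
    for candidate in ["OTU9 OTU10", "Species_1 Sample-A v1.2", "OTU9 bad;name", ""]:
        print(repr(candidate), prefix_ok(candidate), whole_ok(candidate))
```

Under that assumption, a value such as `OTU9 bad;name` passes a start-anchored check because its prefix matches, while a whole-value check rejects it. If whole-value validation is the intent, ending the expression with `$` is one option to consider.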
96 changes: 96 additions & 0 deletions tools/amas/amas_replicate.xml
@@ -0,0 +1,96 @@
<tool id="amas_replicate" name="AMAS replicate" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>replicate multiple alignments</description>

<macros>
<import>macros.xml</import>
</macros>

<xrefs>
<xref type="bio.tools">amas</xref>
</xrefs>

<expand macro="requirements" />
<expand macro="version_command" />

<command detect_errors="exit_code"><![CDATA[
#import re
set -eu;

@SYMLINK_INPUTS@

python -m amas.AMAS
replicate
--rep-aln $replicate_replicates $replicate_loci
--out-format $out_format
--in-files
@INPUT_FILENAMES@
--in-format $in_format
--data-type $data_type
--cores "\${GALAXY_SLOTS:-1}"
$check_align
]]></command>

<inputs>
<param name="input_files" type="data" format="fasta,phylip,nex" label="Sequence(s) to replicate" multiple="true"
help="Provide pre-aligned FASTA/PHYLIP/NEXUS files (DNA or protein); mixes of unaligned reads or contigs will produce meaningless results." />
<expand macro="input_format" />
<expand macro="output_format" label="Select output format for replicated alignment(s)" />
<param name="replicate_replicates" type="integer" value="10" min="1" label="Number of replicate datasets to build" />
<param name="replicate_loci" type="integer" value="2" min="1" label="Number of loci per replicate" />
<expand macro="data_type" />
<expand macro="check_align" />
</inputs>

<outputs>
<expand macro="collection_outputs" name="replicate_alignments" />
</outputs>

<tests>
<test expect_num_outputs="1">
<param name="input_files" value="inputs/fasta1.fas" />
<param name="replicate_replicates" value="2" />
<param name="replicate_loci" value="1" />
<param name="out_format" value="nexus" />
<param name="in_format" value="fasta" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output_collection name="replicate_alignments_nexus" type="list">
<element name="replicate1_1-loci-out.nex" file="outputs/expected_replicate1.nex" ftype="nex" />
<element name="replicate2_1-loci-out.nex" file="outputs/expected_replicate2.nex" ftype="nex" />
</output_collection>
</test>
</tests>

<help><![CDATA[
**What it does**

AMAS Replicate generates jackknife or bootstrap replicates by randomly sampling loci (genes) from your dataset. This is used to assess phylogenetic signal distribution and node support across different genomic regions.

**Inputs**

- **Alignment files**: Multiple pre-aligned sequence files, one per locus/gene (FASTA, PHYLIP, or NEXUS format)
- **Number of replicates**: How many replicate datasets to generate
- **Loci per replicate**: How many loci to include in each replicate
- **Input format**: Specify the format of your input files
- **Data type**: Choose DNA for nucleotide sequences or Protein for amino acid sequences
- **Output format**: Select the desired format for the replicate alignments

**Outputs**

A collection of replicate alignment files. Each replicate contains a random subset of the input loci concatenated together.

**Use cases**

- **Phylogenetic jackknifing**: Assess whether phylogenetic signal is driven by specific loci
- **Node support evaluation**: Test robustness of tree topology across different gene combinations
- **Signal heterogeneity**: Identify whether conflicting signals come from particular genomic regions

**Example**

From 100 input genes, create 10 replicates each containing 50 randomly sampled genes. Each replicate can then be used to build a phylogenetic tree, and consistency across replicates indicates robust phylogenetic signal.

@AMAS_SHARED_HELP@
]]></help>

<expand macro="citations" />
</tool>
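
The example in the amas_replicate.xml help above (10 replicates of 50 genes drawn from 100) can be sketched in Python as follows. This is an illustration only, not AMAS's implementation: `sample_replicates` is a hypothetical helper, and drawing loci without replacement within a replicate is an assumption made here.

```python
import random


def sample_replicates(loci, n_replicates, loci_per_replicate, seed=None):
    """Draw random subsets of loci, one list per replicate (hypothetical helper).

    Assumption: loci are sampled without replacement within each replicate.
    """
    loci = list(loci)
    if n_replicates < 1 or loci_per_replicate < 1:
        raise ValueError("both counts must be at least 1")
    if loci_per_replicate > len(loci):
        raise ValueError("cannot sample more loci than were provided")

    rng = random.Random(seed)  # a local generator keeps runs reproducible
    return [rng.sample(loci, loci_per_replicate) for _ in range(n_replicates)]


if __name__ == "__main__":
    genes = [f"gene{i}" for i in range(1, 101)]
    replicates = sample_replicates(genes, n_replicates=10, loci_per_replicate=50, seed=1)
    print(len(replicates), len(replicates[0]))
```

Each returned list would then be concatenated into one replicate alignment; passing a fixed `seed` makes the selection repeatable, which helps when comparing trees built from the replicates.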