Skip to content

Conversation

@jchchiu
Copy link

@jchchiu jchchiu commented Nov 6, 2025

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

Regarding issue #7442

  • Implemented the amas alignment concatenation action needed for the Biohackathon workflow
  • Added a simple test case with corresponding expected outputs
The AMAS commands are:
  concat      Concatenate input alignments
  convert     Convert to other file format
  replicate   Create replicate data sets for phylogenetic jackknife
  split       Split alignment according to a partitions file
  summary     Write alignment summary
  remove      Remove taxa from alignment
  translate   Translate DNA alignment into protein alignment
  • Command flags can be seen here
  • Note, The Tool Standard Output for concat will say "Wrote concatenated sequences to fasta file 'concatenated.out'"; however the file obtained will be renamed with the output format you have chosen (e.g. 'concatenated.fasta' for fasta, 'concatenated.nex' for nexus and nexus-int)

To do:

  • Add more test coverage
  • Will also try to implement replicate and split (need to workout how to do outputs); others are already covered by existing Galaxy tools (unless there is something you think looks interesting to implement)
  • Clean up code and help text
  • Fix --check-align

For now, is it possible to review the code to see if it's on the right track, and if there are any better ways to structure it?

@jchchiu
Copy link
Author

jchchiu commented Nov 7, 2025

See updated changes from George at jchchiu#1

Comment on lines 139 to 145
<param name="in_format" type="select" label="Format of the input file">
<option value="fasta">fasta</option>
<option value="phylip">phylip</option>
<option value="phylip-int">phylip-int</option>
<option value="nexus">nexus(sequential)</option>
<option value="nexus-int">nexus(interleaved)</option>
</param>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fasta phylip and nexus can be distinguishe automatically, e.g. $input_file.ext gives the Galaxy datatype. Is the info on interleaved/not needed? Can it be determined automatically?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/marekborowiec/AMAS/blob/2e93d31638625135aa48a68251c363ac23a47c4a/amas/AMAS.py#L692

It doesn't seem like they have a function that detects the format for interleaved automatically; instead it depends on the input you give it. Can galaxy automatically distinguish interleaved?

</collection>

<collection name="converted_alignments" type="list" label="Converted alignments">
<discover_datasets directory="run_dir/convert" pattern="(?P&lt;name&gt;.+)-out\..+" format="data" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set the format instead if format="data"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you have a look at the amas_split.xml at L55; is this what you were thinking?

@jchchiu
Copy link
Author

jchchiu commented Nov 11, 2025

Hey @bernt-matthias, could you have a look at amas_concat.xml and see if this is on the right track? If so, I'll update the rest of the subcommands with your suggestions.
Cheers for all the thorough comments.

@SaimMomin12 SaimMomin12 changed the title Add amas(1.0) tool New tool addition: amas tool Nov 11, 2025
concat
--concat-part partitions.txt
--concat-out concatenated.out
--part-format $part_format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can determine the input format from $input_files.ext.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment also relevant to #7443 (comment)

The problem I have with this is that if it is a nexus or phylip file, their extension doesn't always tell whether it is an interleaved or sequential format. Even if you sniff it as an interleaved does $input_files.ext return phlyip-int or something like that which differentiates it from normal phylip? Otherwise I'm pretty sure amas needs the user to explicitly set the file format as an input.

What are you thoughts on only taking non-interleaved formats, and give a warning to the user that it will not accept interleaved in the help or something? Following this also removing the option to output it as an interleaved file? Problem I see with this is that they can still upload an interleaved file since they have the same extension.

help="A file defining how the concatenated alignment is split into separate gene/locus regions. Each line specifies a partition name and its position range (e.g., 'gene1 = 1-500' or 'DNA, gene1 = 1-500' for RAxML format).">
<option value="nexus">nexus</option>
<option value="raxml">raxml</option>
<option value="unspecified" selected="true">unspecified</option>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in the unspecified case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just has the genes and their start and end; should I add more context or just direct them to the help section?

  • Unspecified: gene1 = 1-500
  • RAxML: DNA, gene1 = 1-500
  • NEXUS:
#NEXUS

Begin sets;
    charset gene1 = 1-500;
End;

</xml>

<xml name="version_command">
<version_command>python -m amas.AMAS -h</version_command>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-h gives the version?

Copy link
Author

@jchchiu jchchiu Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think there is any command in amas that gives the version. What do you suggest as the recommended procedure if there is no such version command? Tried a bit of searching but couldn't find anything.

Could try to open a PR on the tool to add a simple command but the repo doesn't seem active.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just echo @TOOL_VERSION@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better python -c "import amas; print(amas.__version__)"

<when input="out_format" value="nexus-int" format="nex" />
</change_format>
</data>
<data name="partitions_out" from_work_dir="run_dir/action/partitions.txt" format="txt" label="Partition file" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File extension is irrelevant. part_format determines the format, isn't it. You can then:
https://docs.galaxyproject.org/en/master/dev/schema.html#tool-outputs-data-change-format-when

<param name="in_format" value="fasta" />
<param name="data_type" value="dna" />
<param name="check_align" value="false" />
<output name="output" file="outputs/expected_concat.phylip" ftype="phylip" compare="sim_size" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we comp[are the content instead of "just" compare="sim_size"

</xml>

<xml name="version_command">
<version_command>python -m amas.AMAS -h</version_command>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just echo @TOOL_VERSION@

</inputs>

<outputs>
<data name="summary_out" from_work_dir="summary.txt" format="txt" label="${tool.name} on ${on_string} (Alignment summary)" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a format like the following for all labels

        <data name="summary_out" from_work_dir="summary.txt" format="txt" label="${tool.name} on ${on_string}: Alignment summary" />

</xml>

<xml name="version_command">
<version_command>python -m amas.AMAS -h</version_command>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better python -c "import amas; print(amas.__version__)"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants