-
Notifications
You must be signed in to change notification settings - Fork 487
New tool addition: amas tool #7443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
See updated changes from George at jchchiu#1 |
tools/amas/amas.xml
Outdated
| <param name="in_format" type="select" label="Format of the input file"> | ||
| <option value="fasta">fasta</option> | ||
| <option value="phylip">phylip</option> | ||
| <option value="phylip-int">phylip-int</option> | ||
| <option value="nexus">nexus(sequential)</option> | ||
| <option value="nexus-int">nexus(interleaved)</option> | ||
| </param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fasta phylip and nexus can be distinguishe automatically, e.g. $input_file.ext gives the Galaxy datatype. Is the info on interleaved/not needed? Can it be determined automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like they have a function that detects the format for interleaved automatically; instead it depends on the input you give it. Can galaxy automatically distinguish interleaved?
tools/amas/amas.xml
Outdated
| </collection> | ||
|
|
||
| <collection name="converted_alignments" type="list" label="Converted alignments"> | ||
| <discover_datasets directory="run_dir/convert" pattern="(?P<name>.+)-out\..+" format="data" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set the format instead if format="data"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you have a look at the amas_split.xml at L55; is this what you were thinking?
|
Hey @bernt-matthias, could you have a look at amas_concat.xml and see if this is on the right track? If so, I'll update the rest of the subcommands with your suggestions. |
This reverts commit 6fc566f.
| concat | ||
| --concat-part partitions.txt | ||
| --concat-out concatenated.out | ||
| --part-format $part_format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can determine the input format from $input_files.ext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment also relevant to #7443 (comment)
The problem I have with this is that if it is a nexus or phylip file, their extension doesn't always tell whether it is an interleaved or sequential format. Even if you sniff it as an interleaved does $input_files.ext return phlyip-int or something like that which differentiates it from normal phylip? Otherwise I'm pretty sure amas needs the user to explicitly set the file format as an input.
What are you thoughts on only taking non-interleaved formats, and give a warning to the user that it will not accept interleaved in the help or something? Following this also removing the option to output it as an interleaved file? Problem I see with this is that they can still upload an interleaved file since they have the same extension.
| help="A file defining how the concatenated alignment is split into separate gene/locus regions. Each line specifies a partition name and its position range (e.g., 'gene1 = 1-500' or 'DNA, gene1 = 1-500' for RAxML format)."> | ||
| <option value="nexus">nexus</option> | ||
| <option value="raxml">raxml</option> | ||
| <option value="unspecified" selected="true">unspecified</option> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the unspecified case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just has the genes and their start and end; should I add more context or just direct them to the help section?
- Unspecified:
gene1 = 1-500 - RAxML:
DNA, gene1 = 1-500 - NEXUS:
#NEXUS
Begin sets;
charset gene1 = 1-500;
End;
| </xml> | ||
|
|
||
| <xml name="version_command"> | ||
| <version_command>python -m amas.AMAS -h</version_command> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-h gives the version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think there is any command in amas that gives the version. What do you suggest as the recommended procedure if there is no such version command? Tried a bit of searching but couldn't find anything.
Could try to open a PR on the tool to add a simple command but the repo doesn't seem active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just echo @TOOL_VERSION@
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better python -c "import amas; print(amas.__version__)"
…d some formatting
tools/amas/amas_concat.xml
Outdated
| <when input="out_format" value="nexus-int" format="nex" /> | ||
| </change_format> | ||
| </data> | ||
| <data name="partitions_out" from_work_dir="run_dir/action/partitions.txt" format="txt" label="Partition file" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File extension is irrelevant. part_format determines the format, isn't it. You can then:
https://docs.galaxyproject.org/en/master/dev/schema.html#tool-outputs-data-change-format-when
| <param name="in_format" value="fasta" /> | ||
| <param name="data_type" value="dna" /> | ||
| <param name="check_align" value="false" /> | ||
| <output name="output" file="outputs/expected_concat.phylip" ftype="phylip" compare="sim_size" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we comp[are the content instead of "just" compare="sim_size"
| </xml> | ||
|
|
||
| <xml name="version_command"> | ||
| <version_command>python -m amas.AMAS -h</version_command> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just echo @TOOL_VERSION@
| </inputs> | ||
|
|
||
| <outputs> | ||
| <data name="summary_out" from_work_dir="summary.txt" format="txt" label="${tool.name} on ${on_string} (Alignment summary)" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a format like the following for all labels
<data name="summary_out" from_work_dir="summary.txt" format="txt" label="${tool.name} on ${on_string}: Alignment summary" />
| </xml> | ||
|
|
||
| <xml name="version_command"> | ||
| <version_command>python -m amas.AMAS -h</version_command> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better python -c "import amas; print(amas.__version__)"
FOR CONTRIBUTOR:
Regarding issue #7442
To do:
For now, is it possible to review the code to see if it's on the right track, and if there are any better ways to structure it?