-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tool AMBER #5970
Add tool AMBER #5970
Conversation
tools/amber/amber.xml
Outdated
label="Genome coverages TSV file" | ||
help="Input a TSV file where the genome coverage is stated. Look at the help section to se how this file should look like!" /> | ||
</section> | ||
<section name="tox" title="taxonomic binning-specific option"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also allow using it from history, we need a test for it, I think a subsampled file as shown here could work: https://github.com/galaxyproject/tools-iuc/tree/2d380eea5900d7650c5ddfe8cf8d19d2554d0031/tools/recentrifuge/test-data/test-db
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to point to the nodes.dmp
file which is in the path given in: https://github.com/galaxyproject/tools-iuc/blob/2d380eea5900d7650c5ddfe8cf8d19d2554d0031/data_managers/data_manager_fetch_ncbi_taxonomy/tool-data/ncbi_taxonomy.loc.sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done with a26dba5
tools/amber/amber.xml
Outdated
<data name="metrics_bin" format="tsv" label="${tool.name}: Bin metrics" /> | ||
</outputs> | ||
<tests> | ||
<test expect_num_outputs="4"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test need an output check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with 80dd926
tools/amber/amber.xml
Outdated
<param name="keyword" value="circular element" /> | ||
</conditional> | ||
</section> | ||
<output name="html"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check for something more meaningful ? Doesn't the tool produce the same output on each run? Then we could check if the files are the same ... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the HTML file is too big to upload here, but I used the desc option to bring something unique in the file, this was done with 0b53f9d
tools/amber/amber.xml
Outdated
label="Genome coverages TSV file" | ||
help="Input a TSV file where the genome coverage is stated. Look at the help section to se how this file should look like!" /> | ||
</section> | ||
<section name="tox" title="taxonomic binning-specific option"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to point to the nodes.dmp
file which is in the path given in: https://github.com/galaxyproject/tools-iuc/blob/2d380eea5900d7650c5ddfe8cf8d19d2554d0031/data_managers/data_manager_fetch_ncbi_taxonomy/tool-data/ncbi_taxonomy.loc.sample
tools/amber/amber.xml
Outdated
@@ -0,0 +1,336 @@ | |||
<tool id="amber" name="AMBER" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="21.05"> | |||
<description>Evaluation package for the comparative assessment of genome reconstructions and taxonomic assignments from metagenome benchmark datasets</description> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description is a bit long, can it be shortend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doen with 0e60c5f
tools/amber/amber.xml
Outdated
@@ -0,0 +1,336 @@ | |||
<tool id="amber" name="AMBER" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="21.05"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the tool ID and folder name be cami_amber?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools/amber/amber.xml
Outdated
<param argument="--min_length" type="integer" value="" optional="true" | ||
label="Mimimum length of sequences" | ||
help="Input how long the sequnces has to be" /> | ||
<param argument="--stdout" type="boolean" truevalue="--stdout" falsevalue="" checked="false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large is this? Should we skip this parameter or enable it by default? Or put it into a real optional output file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with 54c689b
tools/amber/amber.xml
Outdated
|
||
cp './output/index.html' '$html' && | ||
cp './output/heatmap_bar.png' '$path_to_html' && | ||
cp './output/results.tsv' '$result' && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of copying, moving would be better.
You can completely avoid it by using from_work_dir
on the <output>
s ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with 2cbb875
tools/amber/amber.xml
Outdated
</inputs> | ||
<outputs> | ||
<data name="html" format="html" label="${tool.name}: HTML" /> | ||
<data name="result" format="tsv" label="${tool.name}: Results" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<data name="result" format="tsv" label="${tool.name}: Results" /> | |
<data name="result" format="tabular" label="${tool.name}: Results" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with 9795d98
tools/amber/amber.xml
Outdated
help="Input a TSV file where the genome coverage is stated. Look at the help section to se how this file should look like!" /> | ||
</section> | ||
<section name="tox" title="taxonomic binning-specific option"> | ||
<param argument="--ncbi_dir" type="text" value="" optional="true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user does not know the path. We need to use the NCBI data table here. Please look at other IUC tools that are using this data table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done with a26dba5
Oh the link in the commit should not be there in 80dd926 . I wanted to say that the first test is better now. |
tools/cami_amber/cami_amber.xml
Outdated
</requirements> | ||
<command detect_errors="exit_code"> | ||
<![CDATA[ | ||
mkdir -p output inputs&& |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mkdir -p output inputs&& | |
mkdir -p output inputs && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools/cami_amber/cami_amber.xml
Outdated
<param argument="--gold_standard_file" format="tabular" type="data" | ||
label="Mapping of contigs or reads" | ||
help="Input the gold standard file here so amber know the correct IDs for each contig/read" /> | ||
<repeat name="label" title="Binning names " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<repeat name="label" title="Binning names " | |
<repeat name="label" title="Binning names" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SantaMcCloud what about putting the binning-files into the repeat. And remove the multiple="0".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do this, but it depends on your answer to my comment here: #5970 (comment)
<param argument="--desc" type="text" value="" | ||
label="HTML description" | ||
help="Enter the HTML page description here" /> | ||
<section name="genome" title="Genome binning-specific options" > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also put this into the main repeat section? I imagine something like
repeat:
- input-file
- binning-name
- min_comp
- max_comp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make 2 repeat sections. One with the Files and the names since the names, if you want to input some, have to match the number of files. The min_comp and max_conta can also be put together, but there are not needed for every file, since the count for all when some values are given. But for the second repeat section, it can happen that one param has more thresholds than the other, so does it make sense to combine them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- input-file
- binning-name
-> repeat
- min_comp
- max_comp
-> normal param, but a comma seperated list, so extra repeat to create that list for both params (only ones for tool)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools/cami_amber/cami_amber.xml
Outdated
</repeat> | ||
<param argument="--remove_genomes" type="data" format="tabular" optional="true" | ||
label="TSV file for genomes to remove" | ||
help="Input a TSV file with binid and type in each line. In the help section is an example. WARNING: IF THE LIST CONTAIN ALL GENOME THE PROGRAM WILL FAIL!" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to add caps letter here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the _add_length.xml isn't working, since I have to access the package location somehow to run the .py file there. Because of this problem, I can not add the convert.xml to have a converter for the biobox format. |
Try a combination of which and dirname to get the desired path relative to the main executable. Example here tools-iuc/tools/cojac/macros.xml Line 41 in 57fa4dc
|
@paulzierep when you have time you can play a bit with all tools. If there is something I should change, just let me know :) @bgruening thank you for the help!! :D |
ln -s '${gold_standard_file}' '$gold_standard_file.element_identifier' && | ||
ln -s '${fasta_file}' '$fasta_file.element_identifier' && | ||
|
||
python "\$(dirname "\$(dirname "\$(which amber.py)")")/lib/python3.11/site-packages/src/utils/add_length_column.py" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SantaMcCloud I see now. This trick only works if the scripts are in a neutral place, like /share/
etc ...
It will break on installations with python3.10 or python3.12.
We need to fix the conda package here I think. In the conda package we symlink the lib/python3.11/site-packages/src/utils/add_length_column.py
package to /bin
and then everyone should be happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you also PR the recipe. I will delete my PR then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also thank you again for the help!
Now all should be good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Initial runs for all tools work and produce correct output. Thank you very much @SantaMcCloud
- The taxonomic binning needs a test case.
- I need to check which DM we can use for the
nodes.dmp
file. - The HTML out needs its heatmap
@@ -0,0 +1,91 @@ | |||
<tool id="cami_amber" name="CAMI AMBER add lenght column" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="21.05"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tools need a different id each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,193 @@ | |||
<tool id="cami_amber" name="CAMI AMBER convert to biobox" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="21.05"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tools need a different id each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
]]> | ||
</command> | ||
<inputs> | ||
<param argument="--gold_standard_file" type="data" format="tabular" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add, that this works for any biobox file, but amber only needs seq lengths for the gold standard file ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
**Input** | ||
|
||
This tool required 2 inputs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that is the wrong help ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
]]> | ||
</command> | ||
<inputs> | ||
<param name="work" type="select" label="How should the tool work?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a better description. Maybe like: Merge binning output files ? If yes all input files will be merged into one biobox output (required input by amber). If no each input will be converted into a separate biobox output file.
<- although the tool can do it, I am not sure if this is ever needed for anything -> do you have an idea @SantaMcCloud ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
</param> | ||
<conditional name="input"> | ||
<param name="is_select" type="select" label="Choose how to input the files" | ||
help="You can either input the files manually or choose to input a collection"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rather something like you can provide each input seperately or as a collection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools/cami_amber/cami_amber.xml
Outdated
<section name="tox" title="taxonomic binning-specific option"> | ||
<conditional name="ncbi"> | ||
<param name="is_select" type="select" label="Want to use the NCBI database?" | ||
help="Select yes if you want to use the NCBI database from a data manager"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the choice should be first: Do you want to evaluate taxonomic binning ? Then select ncbi file manually or form DM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
</assert_contents> | ||
</output> | ||
</test> | ||
<test expect_num_outputs="4"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to test weather the taxonomic binning evaluation works we need a test dataset we do not have. Strange that they do not provide it in their test set. But I think you can create a minimal example by manipulating the example from github:
Gold standard:
@Version:0.9.1
@SampleID:gsa
@@SEQUENCEID BINID TAXID LENGTH
RH|P|C37126 Sample6_89 45202 25096
RH|P|C3274 Sample9_91 32644 10009
RH|P|C26099 1053046 765201 689201
RH|P|C35075 1053046 765201 173282
RH|P|C20873 1053046 765201 339258
Binning1:
@Version:0.9.1
@SampleID:gsa
@@SEQUENCEID BINID TAXID LENGTH
RH|P|C37126 Sample6_89 32644 25096
RH|P|C3274 Sample9_91 32644 10009
RH|P|C26099 1053046 765201 689201
RH|P|C35075 1053046 765201 173282
RH|P|C20873 1053046 765201 339258
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And can you also check weather is also works if we only supply the nodes.dmp
file ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check it, and you need all 3 to make it work. I just forgot it to mention it somewhere sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To evaluate the taxonomic binning we do also need a test data set that has the TAXID
column in the gold standard and the binning file. Then it should produce a html like this one: https://cami-challenge.github.io/AMBER/cami_ii_mouse_gut/
The test here still misses that column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this data but got errors, I found out why now. I did create test_gold.tsv
. test_binning.tsv
and test_binning2.tsv
to create the tox binning, BUT you can only create this tab in the HTML result when running this tool manually with either the download NCBI files or the DM. When using the test-db
files, we never get such a output since they are mocked or compromised to fit the 1 MB restriction.
</section> | ||
</inputs> | ||
<outputs> | ||
<data name="html" format="html" from_work_dir="output/index.html" label="${tool.name}: HTML" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The html output is missing the heatmap_bar.png
either the output file needs to be patched to get it from an online source (maybe the amber github), or you need to add seconday files to the html, its a minor thing, but looks much better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try to make it, but I forgot a step to implement it correctly. Now it should work after I remember what I did missed when did commit the first version of it
Co-authored-by: paulzierep <paul.zierep@googlemail.com>
@paulzierep for the DM you can use this one https://github.com/galaxyproject/tools-iuc/tree/a95f5b04b1219489a327a622184633a561fe5ac0/data_managers/data_manager_fetch_ncbi_taxonomy I did try it out with this and it worked fine |
tools/cami_amber/cami_amber.xml
Outdated
<tool id="cami_amber" name="CAMI AMBER" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="21.05"> | ||
<description>Evaluation package for MAGs</description> | ||
<macros> | ||
<token name="@TOOL_VERSION@">2.0.4</token> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SantaMcCloud can you please move those macros to a separate macro file and import it into your tools. The idea behind macros is to DRY. Don't repeat yourself. requirements
can also be moved into this macro file and shared. Same for citations and other stuff that can be shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @SantaMcCloud very nice work! |
Thank you :D |
FOR CONTRIBUTOR: