No hits from DIAMOND #170

Open
JosieMainwaring opened this issue Mar 15, 2024 · 18 comments

@JosieMainwaring

Hi all,

I have annotated the example E. coli K12 genome and my genome of interest using run_dbcan on a Linux system running in VirtualBox. The runs completed without errors and the output files were produced as expected. However, in both cases the DIAMOND column contains zero hits (every entry is '-', so no gene is supported by all 3 tools), which is unexpected for both genomes.

Does anyone know what might be causing this?

For reference, the diamond version I'm running is 2.0.11

Any help appreciated!

@JosieMainwaring
Author

Hi, I'm still having this issue.
I've tried building the databases using
dbcan_build --cpus 8 --db-dir db --clean
or with the Database Installation Command, and the problem persists, even though DIAMOND appears to be installed.
The diamond.out files are empty.
Any help please?
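
Because the run finishes without errors while diamond.out stays empty, one quick sanity check is to confirm that the DIAMOND database file actually exists in the database directory and to count the lines in diamond.out. The sketch below is only an illustration: `check_dbcan_diamond` is a hypothetical helper (not part of run_dbcan), and the database file name `CAZy.dmnd` is an assumption that may differ between run_dbcan versions.

```shell
# Hypothetical helper: report whether the DIAMOND database and output look sane.
# Usage: check_dbcan_diamond DB_DIR OUT_DIR
check_dbcan_diamond() {
    local db_dir="$1" out_dir="$2"
    local db_file="$db_dir/CAZy.dmnd"      # assumed database file name
    local out_file="$out_dir/diamond.out"  # output file mentioned in this thread
    local rc=0

    # -s is true only if the file exists and is larger than zero bytes.
    if [ -s "$db_file" ]; then
        echo "OK: database present: $db_file"
    else
        echo "PROBLEM: database missing or empty: $db_file"
        rc=1
    fi

    if [ -s "$out_file" ]; then
        echo "OK: $(wc -l < "$out_file" | tr -d ' ') line(s) in $out_file"
    else
        echo "PROBLEM: output missing or empty: $out_file"
        rc=1
    fi
    return "$rc"
}
```

For example, `check_dbcan_diamond db output_EscheriaColiK12MG1655` would check the directories used in this thread. If the database file is present, `diamond dbinfo -d db/CAZy.dmnd` should print basic information about it, which helps tell a truncated database apart from a search that genuinely returns nothing.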

@linnabrown
Owner

The DIAMOND version here is 2.1.9. I just created a new environment and installed dbCAN following our documentation.
It is very strange that there are no hits for DIAMOND on your end.
I ran the following command on the example E. coli genome. It selects only DIAMOND, so there are no results for EC number, HMMER, or dbCAN_sub:

run_dbcan EscheriaColiK12MG1655.faa protein --out_dir output_233 -t diamond

The overview result is as follows:

Gene ID EC#     HMMER   dbCAN_sub       DIAMOND #ofTools
NP_414562.1     -       -       -       GT77    1
NP_414631.1     -       -       -       GT28    1
NP_414632.1     -       -       -       GT28    1
NP_414638.1     -       -       -       CE11    1
NP_414654.1     -       -       -       GH13_3  1
NP_414672.1     -       -       -       CE4     1
NP_414691.1     -       -       -       GT51    1
NP_414724.1     -       -       -       GT19    1
NP_414726.1     -       -       -       GH13_30 1
NP_414736.1     -       -       -       CBM50+GH25      1
NP_414747.1     -       -       -       CBM50+GH23      1
NP_414805.1     -       -       -       GH43_11 1
NP_414845.1     -       -       -       AA3_2   1
NP_414869.1     -       -       -       GH1     1
NP_414877.1     -       -       -       GH36    1
NP_414878.1     -       -       -       GH2     1
NP_414879.3     -       -       -       GH2     1
NP_414897.1     -       -       -       GT2     1
NP_414936.1     -       -       -       GH13_3  1
NP_414937.2     -       -       -       CBM34+GH13_21   1
NP_415006.1     -       -       -       GH152   1
NP_415017.1     -       -       -       CBM50   1
NP_415059.1     -       -       -       GH27    1
NP_415087.1     -       -       -       GH24    1
NP_415101.1     -       -       -       GT0     1
NP_415108.1     -       -       -       GH13_3  1
NP_415118.1     -       -       -       GT2     1
NP_415167.1     -       -       -       GH103   1
NP_415168.1     -       -       -       GH103   1
NP_415175.1     -       -       -       GH13_26 1
NP_415188.1     -       -       -       GT4     1
NP_415203.1     -       -       -       CE9     1
NP_415206.1     -       -       -       CE8     1
NP_415214.1     -       -       -       CBM48+GH13_9    1
NP_415252.1     -       -       -       GT51    1
NP_415254.1     -       -       -       GT2     1
NP_415255.1     -       -       -       GT2     1
NP_415256.1     -       -       -       GT2     1
NP_415257.1     -       -       -       GT22    1
NP_415260.1     -       -       -       GH38    1
NP_415279.1     -       -       -       GH3     1
NP_415293.1     -       -       -       CE8     1
NP_415296.1     -       -       -       AA5_1   1
NP_415403.1     -       -       -       GT4     1
NP_415410.1     -       -       -       GT2     1
NP_415541.1     -       -       -       GT2     1
NP_415542.1     -       -       -       CE4+GH153       1
NP_415543.1     -       -       -       CE4+GH153       1
NP_415567.1     -       -       -       GT2     1

Do you get the same overview result as mine?
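
To compare results without reading the table by eye, the number of genes with a DIAMOND hit can be counted from overview.txt. This is a minimal sketch that assumes overview.txt is tab-separated and has a header row containing a `DIAMOND` column, as in the table above; `count_diamond_hits` is a hypothetical helper name.

```shell
# Count genes with a DIAMOND hit in a run_dbcan overview.txt (assumed tab-separated).
# The DIAMOND column is located by its header name rather than a fixed position.
count_diamond_hits() {
    awk -F '\t' '
        # Header row: remember which column is labelled DIAMOND.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "DIAMOND") col = i; next }
        # Data rows: count entries that are neither "-" nor empty.
        col && $col != "-" && $col != "" { n++ }
        END { print n + 0 }
    ' "$1"
}
```

Running `count_diamond_hits output_233/overview.txt` prints a single number. A result of 0 on the example E. coli data corresponds to the problem reported in this issue.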

@JosieMainwaring
Author

Thanks for your reply! I updated my DIAMOND version to 2.1.9 and tried the above, and I'm still having the same problem! It runs as expected with no errors, but the diamond.out files and the DIAMOND column of the overview.txt file are still empty. Any other thoughts?

@JosieMainwaring
Author

I've just tried again from scratch, setting up a new environment and reinstalling everything, and I'm still having the same issue :( It's looking like this data will just be missing from my dissertation (which is due next week)!

@JosieMainwaring
Author

I also need the run_dbcan version (I can't use the online version) because it's a fungal genome.

@linnabrown
Owner

Can you provide the data you are using? It does not make sense that DIAMOND returns no hits.

@JosieMainwaring
Author

JosieMainwaring commented May 22, 2024

Thanks for replying. I've been using the example data to try to get it to work. I have tried both nucleotide and amino acid sequences, using
"run_dbcan EscheriaColiK12MG1655.fna prok --out_dir output_EscheriaColiK12MG1655"
as well as the command you provided above:
"run_dbcan EscheriaColiK12MG1655.faa protein --out_dir output_233 -t diamond"

Running my query data gave the same issue.

@yinlabniu
Collaborator

yinlabniu commented May 22, 2024 via email

@JosieMainwaring
Author

Yes, I see the following:
"
(dbcan3) tup@Tuptop-VirtualBox:~$ diamond help
diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

Syntax: diamond COMMAND [OPTIONS]

Commands:
makedb Build DIAMOND database from a FASTA file
prepdb Prepare BLAST database for use with Diamond
blastp Align amino acid query sequences against a protein reference database
blastx Align DNA query sequences against a protein reference database
cluster Cluster protein sequences
linclust Cluster protein sequences in linear time
realign Realign clustered sequences against their centroids
recluster Recompute clustering to fix errors
reassign Reassign clustered sequences to the closest centroid
view View DIAMOND alignment archive (DAA) formatted file
merge-daa Merge DAA files
help Produce help message
version Display version information
getseq Retrieve sequences from a DIAMOND database file
dbinfo Print information about a DIAMOND database file
test Run regression tests
makeidx Make database index
greedy-vertex-cover Compute greedy vertex cover

Possible [OPTIONS] for COMMAND can be seen with syntax: diamond COMMAND

Online documentation at http://www.diamondsearch.org
"
I'll try this for the example data, but for my query sequence I don't have an amino acid file unfortunately!
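
The `blastp` command listed in the help output above can also be called directly, which takes run_dbcan out of the picture and shows whether DIAMOND itself produces hits. The following is a hedged sketch, not the command run_dbcan uses internally: `run_diamond_direct` is a hypothetical wrapper, the database path is whatever `.dmnd` file your installation built, and the default E-value of 1e-102 is an assumption about run_dbcan's DIAMOND cutoff that you may want to relax (for example to 1e-5) when testing.

```shell
# Hypothetical wrapper: run (or just print) a direct DIAMOND protein search.
# Usage: run_diamond_direct DB QUERY_FAA OUT_TSV [EVALUE]
# Set DRY_RUN=1 to print the command instead of executing it.
run_diamond_direct() {
    local db="$1" query="$2" out="$3"
    local evalue="${4:-1e-102}"   # assumed default cutoff; pass a 4th argument to change it

    # Build the command line as positional parameters so quoting is preserved.
    set -- diamond blastp --db "$db" --query "$query" --out "$out" \
        --outfmt 6 --evalue "$evalue" --max-target-seqs 1

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For instance, `run_diamond_direct db/CAZy.dmnd EscheriaColiK12MG1655.faa direct_hits.tsv 1e-5` writes BLAST-style tabular output (`--outfmt 6`) to direct_hits.tsv, where `db/CAZy.dmnd` is an assumed path. If that file is also empty, the problem lies with DIAMOND or the database rather than with run_dbcan.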

@JosieMainwaring
Author

What should I enter for cazy_indexfile and dia_eval?

@linnabrown
Owner

Can you install the docker version? This is the fastest way.

@JosieMainwaring
Author

JosieMainwaring commented May 22, 2024

I haven't tried the docker version yet - not familiar with Docker at all. But I'll give it a try

Edit: Will it be fastest for a noob who doesn't yet have Docker installed?

@JosieMainwaring
Author

I don't have space on my computer to pull the haidyi/run_dbcan image for the Docker setup, so I'll have to try through my university HPC tomorrow! Thanks for the help so far, guys. It's the last piece of data I need, all just to write a couple of numbers into a table! Will be back tomorrow.

@yinlabniu
Collaborator

yinlabniu commented May 22, 2024 via email

@JosieMainwaring
Author

That makes sense for the query sequence, but why would the example E. coli data not work either? Including with the amino acid file? If I can get the example data working, then I still have hope for my query sequence. I'd just have to translate it to .faa by other means, right?

@yinlabniu
Collaborator

yinlabniu commented May 22, 2024 via email

@JosieMainwaring
Author

Thanks everyone for your help. I got everything (example data and query) working just by running all the same steps on my HPC. For whatever reason, DIAMOND was just determined to be broken on my Linux machine. So, not solved but worked around.

@linnabrown
Owner

linnabrown commented May 22, 2024

Again, I highly recommend using the Docker image if you run into this issue again. Each user may change their system configuration in ways that break the installation of other software. Docker won't affect your Linux system, because it runs in its own isolated Linux environment. @JosieMainwaring
