
error: no input files found from Assign taxonomy #1941

Open
bylee613 opened this issue Apr 29, 2024 · 5 comments

Comments

@bylee613

bylee613 commented Apr 29, 2024

Hello,

I am getting an error during the taxonomy assignment step. I am working on COI genes and trying to use the MIDORI2 database.
The proportion of non-chimeric reads in my data was low (~44%). I am not sure whether this could be the reason, but I am getting an error saying "no input files found". Could you please help me with this?

tax <- assignTaxonomy(seqtab.nochim, db_fp, tryRC = TRUE, multithread=TRUE)

Error: Input/Output
no input files found
dirPath: /mnt/lz01/ecogen/shared/database/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta
pattern: character(0)

Thanks!

@benjjneb
Owner

What is the output of:

ref_file <- "/mnt/lz01/ecogen/shared/database/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta"
file.exists(ref_file)

@bylee613
Author

bylee613 commented May 1, 2024

Thank you. I ran the command you suggested and found that I had missed part of the path. After fixing the path, I got "TRUE". However, I still get an error, probably related to memory. Is there any other problem here besides a memory shortage? I am running this on an HPC server.

ref_file <- "/mnt/lz01/ecogen/shared/database/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta"
file.exists(ref_file)
[1] FALSE

ref_file <- "/mnt/lz01/ecogen/shared/database/DADA2/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta"
file.exists(ref_file)
[1] TRUE

When I run this using slurm script, I got error below.

tax <- assignTaxonomy(seqtab.nochim, db_fp, tryRC = TRUE,
multithread=TRUE)

Error in C_assign_taxonomy2(seqs, rc(seqs), refs, ref.to.genus, tax.mat.int, :
Memory allocation failed.
Calls: assignTaxonomy -> C_assign_taxonomy2
Execution halted
srun: error: node130: task 0: Exited with exit code 1

I really appreciate your help!

@benjjneb
Owner

benjjneb commented May 1, 2024

I have not worked with the reference database you are using, so I can't give you exact memory targets, but the error indicates you are running out of memory to allocate.

When you submit your slurm job, how much memory are you requesting? And what is the amount of memory available on the compute nodes on your HPC?
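For reference, the memory request lives in the sbatch header of the Slurm script. A minimal sketch is below; the job name, script name, CPU count, and memory value are all hypothetical placeholders to adjust to your cluster's limits:

```shell
#!/bin/bash
#SBATCH --job-name=assign_tax
#SBATCH --cpus-per-task=8     # threads available to multithread=TRUE
#SBATCH --mem=500G            # memory for the whole job; must fit on one node
#SBATCH --time=24:00:00

# Run the R script that calls assignTaxonomy()
Rscript assign_taxonomy.R
```

`sacct -j <jobid> --format=MaxRSS` after a failed run can show how much memory the task actually used before being killed.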

@bylee613
Author

bylee613 commented May 3, 2024

Thanks for the reply. I haven't used this database before either. It seems to take far more memory than I expected. I requested 512 GB once, but it wasn't enough. As far as I know, we can request 512 GB (6 possible nodes) or 1 TB (2 possible nodes) on our HPC, and competition for those nodes is high. I am classifying COI amplicons from marine invertebrates. Do you know of any other COI database that can be used with the dada2 pipeline?

@benjjneb
Owner

benjjneb commented May 3, 2024

That seems like more than enough memory.

Have you tried testing another database that is known to work with low memory requirements, e.g. the RDP 16S database? Obviously that isn't the right database for assignment here, but it would help diagnose what's going on to see if that will run without error.
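That diagnostic run could look something like the sketch below, assuming a formatted RDP training fasta has already been downloaded; the file path and name here are hypothetical:

```r
library(dada2)

# Hypothetical path to the RDP 16S training fasta (a small, known-good reference)
rdp_fp <- "/mnt/lz01/ecogen/shared/database/DADA2/rdp_train_set_18.fa.gz"
stopifnot(file.exists(rdp_fp))

# Same call as before; only the reference database changes.
# If this completes, the memory failure is specific to the MIDORI2 file.
tax_test <- assignTaxonomy(seqtab.nochim, rdp_fp, tryRC = TRUE, multithread = TRUE)
```

The assignments themselves would be meaningless for COI data; the point is only to check whether `assignTaxonomy` runs to completion with a smaller reference on the same node.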
