
error: no input files found from Assign taxonomy #1941

Open
bylee613 opened this issue Apr 29, 2024 · 5 comments

Comments

@bylee613

bylee613 commented Apr 29, 2024

Hello,

I am getting an error during the taxonomy assignment step. I am working on COI genes and trying to use the MIDORI2 database.
The proportion of non-chimeric reads in my data was low (~44%). I am not sure whether this could be the reason, but I am getting an error saying "no input files found". Could you please help me with this?

tax <- assignTaxonomy(seqtab.nochim, db_fp, tryRC = TRUE, multithread=TRUE)

Error: Input/Output
no input files found
dirPath: /mnt/lz01/ecogen/shared/database/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta
pattern: character(0)

Thanks!

@benjjneb
Owner

What is the output of:

ref_file <- "/mnt/lz01/ecogen/shared/database/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta"
file.exists(ref_file)

@bylee613
Author

bylee613 commented May 1, 2024

Thank you. I ran the command you suggested and found that I had missed part of the path. After fixing the path, I got "TRUE". However, I still get an error, probably related to memory. Is there any other problem here besides a memory shortage? I am running this on an HPC server.

ref_file <- "/mnt/lz01/ecogen/shared/database/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta"
file.exists(ref_file)
[1] FALSE

ref_file <- "/mnt/lz01/ecogen/shared/database/DADA2/MIDORI2_UNIQ_NUC_SP_GB259_CO1_DADA2.fasta"
file.exists(ref_file)
[1] TRUE

When I run this using slurm script, I got error below.

tax <- assignTaxonomy(seqtab.nochim, db_fp, tryRC = TRUE,
multithread=TRUE)

Error in C_assign_taxonomy2(seqs, rc(seqs), refs, ref.to.genus, tax.mat.int, :
Memory allocation failed.
Calls: assignTaxonomy -> C_assign_taxonomy2
Execution halted
srun: error: node130: task 0: Exited with exit code 1

I really appreciate your help!

@benjjneb
Owner

benjjneb commented May 1, 2024

I have not worked with the reference database you are using, so I can't give you exact memory targets, but the error indicates you are running out of memory to allocate.

When you submit your slurm job, how much memory are you requesting? And what is the amount of memory available on the compute nodes on your HPC?
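For reference, the memory request lives in the sbatch header of the Slurm script. A minimal sketch is below; the job name, script name, CPU count, and memory value are all hypothetical placeholders to adjust to your cluster's limits:

```shell
#!/bin/bash
#SBATCH --job-name=assign_tax
#SBATCH --cpus-per-task=8     # threads available to multithread=TRUE
#SBATCH --mem=500G            # memory for the whole job; must fit on one node
#SBATCH --time=24:00:00

# Run the R script that calls assignTaxonomy()
Rscript assign_taxonomy.R
```

`sacct -j <jobid> --format=MaxRSS` after a failed run can show how much memory the task actually used before being killed.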

@bylee613
Author

bylee613 commented May 3, 2024

Thanks for the reply. I haven't used this database before either. It seems to take far more memory than I expected. I requested 512 GB once, but it wasn't enough. As far as I know, we can request 512 GB (6 possible nodes) or 1 TB (2 possible nodes) on our HPC, and competition for those nodes is high. I am classifying COI amplicons from marine invertebrates. Do you know of any other COI database that can be used with the dada2 pipeline?

@benjjneb
Owner

benjjneb commented May 3, 2024

That seems like more than enough memory.

Have you tried testing another database that is known to work with low memory requirements, e.g. the RDP 16S database? Obviously that isn't the right database for assignment here, but it would help diagnose what's going on to see if that will run without error.
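That diagnostic run could look something like the sketch below, assuming a formatted RDP training fasta has already been downloaded; the file path and name here are hypothetical:

```r
library(dada2)

# Hypothetical path to the RDP 16S training fasta (a small, known-good reference)
rdp_fp <- "/mnt/lz01/ecogen/shared/database/DADA2/rdp_train_set_18.fa.gz"
stopifnot(file.exists(rdp_fp))

# Same call as before; only the reference database changes.
# If this completes, the memory failure is specific to the MIDORI2 file.
tax_test <- assignTaxonomy(seqtab.nochim, rdp_fp, tryRC = TRUE, multithread = TRUE)
```

The assignments themselves would be meaningless for COI data; the point is only to check whether `assignTaxonomy` runs to completion with a smaller reference on the same node.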
