Problem with empty files (such as produced by metabat) #540

Closed
cmfield opened this issue Aug 2, 2023 · 1 comment
Labels: error ("Help required for a GTDB-Tk error."), next version ("Upcoming feature/fix in staging branch.")

Comments

cmfield commented Aug 2, 2023

Metabat2 produces a few files during binning that can be empty, such as:
sample.lowDepth.fa
sample.tooShort.fa

GTDB-Tk then fails when processing the folder containing the binned .fa files, because Mash raises an error on empty files. I hope it is an easy fix to remove zero-size files from the list of files to process. It would save me from having to work around this in my pipeline.
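
Until a fix is released, one possible stopgap on the pipeline side is to list only the non-empty bins in a batch file and pass it to GTDB-Tk with `--batchfile` instead of `--genome_dir`. The following is a minimal sketch, not GTDB-Tk's own code. It assumes the two-column, tab-separated batch file layout (FASTA path, genome ID), and the function name `write_nonempty_batchfile` is made up for illustration.

```python
from pathlib import Path


def write_nonempty_batchfile(genome_dir, extension, batchfile):
    """Write a tab-separated batch file listing only non-empty FASTA files.

    Returns (kept, skipped): the file names that were listed and the
    zero-byte file names that were left out.
    """
    kept, skipped = [], []
    with open(batchfile, "w") as out:
        for path in sorted(Path(genome_dir).glob(f"*.{extension}")):
            if not path.is_file():
                continue
            if path.stat().st_size == 0:
                # e.g. sample.lowDepth.fa or sample.tooShort.fa from Metabat2
                skipped.append(path.name)
                continue
            # Genome ID = file name without the trailing ".<extension>"
            genome_id = path.name[: -len(extension) - 1]
            out.write(f"{path}\t{genome_id}\n")
            kept.append(path.name)
    return kept, skipped
```

The resulting file could then stand in for `--genome_dir scratch/takada/metabat/ -x fa` in the command shown in the log below. Note that the Mash reference database given with `--mash_db` is unaffected by this.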

GTDB-Tk log:

[2023-08-02 09:44:22] INFO: GTDB-Tk v2.3.0
[2023-08-02 09:44:22] INFO: gtdbtk classify_wf --genome_dir scratch/takada/metabat/ -x fa --out_dir scratch/takada/annotation/ --cpus 32 --prefix takada --mash_db scratch/takada/annotation/takada.msh
[2023-08-02 09:44:22] INFO: Using GTDB-Tk reference data version r214: /nfs/nas22/fs2201/biol_micro_unix_modules/modules/software/GTDB-Tk/2.3.0-foss-2020b/data
[2023-08-02 09:44:22] INFO: Loading reference genomes.
[2023-08-02 09:44:22] INFO: Using Mash version 2.3
[2023-08-02 09:44:22] INFO: Creating Mash sketch file: scratch/takada/annotation/classify/ani_screen/intermediate_results/mash/takada.user_query_sketch.msh
[2023-08-02 09:44:22] INFO: Completed 2 genomes in 0.01 seconds (195.71 genomes/second).
[2023-08-02 09:44:22] ERROR: Error generating Mash sketch:
[2023-08-02 09:44:22] ERROR: Controlled exit resulting from an unrecoverable error or warning.

Mash log (edited) for command mash sketch -l -p 32 <(ls scratch/takada/metabat/*fa) -o scratch/takada/annotation/takada.msh -k 16 -s 5000:

<lots of files that work>
ERROR: Did not find fasta records in "input files".
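
Because the Mash message only refers to "input files" and does not name the offending file, a small check like the one below can help locate it. This is a sketch under the assumption that the bins are uncompressed, plain-text FASTA; the helper names `has_fasta_record` and `files_without_records` are hypothetical.

```python
from pathlib import Path


def has_fasta_record(path):
    """Return True if the file contains at least one '>' header line."""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                return True
    return False


def files_without_records(genome_dir, extension="fa"):
    """Return sorted names of files in genome_dir that hold no FASTA records.

    This covers zero-byte files as well as files with content but no header.
    """
    return sorted(
        p.name
        for p in Path(genome_dir).glob(f"*.{extension}")
        if p.is_file() and not has_fasta_record(p)
    )
```

Calling `files_without_records("scratch/takada/metabat/")` would list the files that Mash is likely to reject.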
cmfield added the error label ("Help required for a GTDB-Tk error.") on Aug 2, 2023
pchaumeil (Collaborator) commented

Hello,
Thanks for your feedback.
We will add a check to disregard any empty genome files. This will be available in the next GTDB-Tk release.

pchaumeil added a commit that referenced this issue Nov 23, 2023
This commit fixes a few bugs:
- #540 : Empty files are now skipped during the Mash sketch step;
they are then caught in the Prodigal step and returned as Unclassified.
- #549 : `--force` has been modified to deal with #540.
- Prodigal wasn't returning empty files as failed genomes; it was only skipping them.
These genomes are now returned in the summary file and flagged as Unclassified.
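
The end result described in this commit message can be pictured roughly as follows. This is only an illustration of the described behaviour, not the code from the commit: the function names and the summary column names `user_genome` and `classification` are assumptions made for the example.

```python
from pathlib import Path


def split_empty_genomes(genome_paths):
    """Partition {genome_id: path} into (to_sketch, empty_ids).

    Zero-byte files are kept out of the set handed to the sketch step.
    """
    to_sketch, empty_ids = {}, []
    for genome_id, path in genome_paths.items():
        if Path(path).stat().st_size == 0:
            empty_ids.append(genome_id)
        else:
            to_sketch[genome_id] = path
    return to_sketch, empty_ids


def unclassified_rows(empty_ids):
    """Build hypothetical summary rows so skipped genomes are still reported."""
    return [
        {"user_genome": genome_id, "classification": "Unclassified"}
        for genome_id in sorted(empty_ids)
    ]
```

The point of the second step is that an empty input is reported rather than silently dropped, so the summary accounts for every file that was submitted.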
pchaumeil added the next version label ("Upcoming feature/fix in staging branch.") on Nov 24, 2023