There is something wrong with your reference file. Valid file types are .fasta, .gbk, .fasta.gz, .gbk.gz. Please check your inputs and try again. #70

Open
Vikash84 opened this issue Jun 21, 2023 · 2 comments

Comments

@Vikash84

Vikash84 commented Jun 21, 2023

(bohra) [vsingh@vdl]$ bohra run -p full -i isolates.tab -r ./NC_012925.1.fasta.gz

[INFO:06/21/2023 11:52:12 AM] Bohra is being run in /home/vsingh/vdl/Ssuis/ by vsingh on 2023-06-21.
[INFO:06/21/2023 11:52:12 AM] You are running bohra in full mode.
[INFO:06/21/2023 11:52:12 AM] Job ID is set Bohra microbial genomics pipeline
[INFO:06/21/2023 11:52:12 AM] Tyring to find your profile.
[INFO:06/21/2023 11:52:12 AM] You are running bohra with the lcl profile.
[INFO:06/21/2023 11:52:12 AM] You are running with conda - wise decision!! Will now ensure that kraken DB is configured properly.
[INFO:06/21/2023 11:52:12 AM] Searching for kraken2 DB: $KRAKEN2_DEFAULT_DB
[INFO:06/21/2023 11:52:12 AM] You are using the default kraken2 database at : /home/vsingh/softwares/minikrake2_db/minikraken_8GB_20200312/
[INFO:06/21/2023 11:52:12 AM] Checking that /home/vsingh/softwares/minikrake2_db/minikraken_8GB_20200312 is a directory, checking that files are not empty
[INFO:06/21/2023 11:52:12 AM] Found /home/vsingh/softwares/minikrake2_db/minikraken_8GB_20200312, checking that files are not empty
[INFO:06/21/2023 11:52:12 AM] Congratulations your kraken database is present and all files are present.
[INFO:06/21/2023 11:52:12 AM] Now looking for MLST setup
[INFO:06/21/2023 11:52:12 AM] Checking mlst setup.
[WARNING:06/21/2023 11:52:12 AM] You do not have mlst databases pre-configured the default DB with your installation of mlst will be used.
[INFO:06/21/2023 11:52:12 AM] Found isolates.tab.
[INFO:06/21/2023 11:52:12 AM] File isolates.tab is in correct format.
[INFO:06/21/2023 11:52:12 AM] No valid contigs file has been supplied. Assemblies will be generated.
[INFO:06/21/2023 11:52:12 AM] Found NC_012925.1.fasta.gz.
[INFO:06/21/2023 11:52:12 AM] Reference ./NC_012925.1.fasta.gz has been found. Will now copy to running directory.
[INFO:06/21/2023 11:52:12 AM] The file : NC_012925.1.fasta.gz already exists in the current directory
[INFO:06/21/2023 11:52:12 AM] Checking if reference is a valid reference file.
[CRITICAL:06/21/2023 11:52:12 AM] There is something wrong with your reference file. Valid file types are .fasta, .gbk, .fasta.gz, .gbk.gz. Please check your inputs and try again.

@tetedange13

Hi @Vikash84 ,

First, please note that I am not an author of bohra (only a new user).

Have you tried re-downloading and re-gzipping your reference FASTA?
=> I tried fetching your NC_012925.1 myself (from the GenBank entry > "Send to FASTA", then gzipped it)
=> With that file, bohra run correctly returned "Reference is in a valid format."
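
Before re-downloading, it may be worth checking whether the existing file really is a readable gzip archive containing FASTA. The following is only a rough sketch using the Python standard library; the function name is made up for illustration, and it is not part of bohra or any2fasta:

```python
import gzip


def looks_like_gzipped_fasta(path):
    """Return True if `path` decompresses cleanly and starts with a FASTA header.

    Hypothetical helper for illustration only; it is not part of bohra.
    """
    try:
        with gzip.open(path, "rt") as handle:
            first_line = handle.readline()
            # Read through the rest so a truncated archive is also detected.
            for _ in handle:
                pass
    except (OSError, EOFError, UnicodeDecodeError):
        # Not gzip at all, truncated, or not text.
        return False
    return first_line.startswith(">")
```

A False result would suggest the file is corrupt or is not actually gzip-compressed (for example, a plain FASTA that was only renamed to .gz), which would be consistent with the error above, though it would not prove that this is the cause.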

Otherwise, the code responsible for your error is the following:

bohra/bohra/SnpDetection.py

Lines 232 to 239 in 5ff3a46

LOGGER.info(f"Checking if reference is a valid reference file.")
p = subprocess.run(f"any2fasta {ref}", shell = True, capture_output = True, encoding = "utf-8")
if p.returncode == 0:
    LOGGER.info(f"Reference is in a valid format.")
else:
    LOGGER.critical(f"There is something wrong with your reference file. Valid file types are .fasta, .gbk, .fasta.gz, .gbk.gz. Please check your inputs and try again.")
    raise SystemExit

So you get this error because the any2fasta {ref} command exits with an error (a return code other than 0).
=> Maybe try running any2fasta yourself on your reference file, to see why it fails
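
To see the actual reason rather than only the exit status, one option is to repeat the same call and look at what the tool writes to stderr. This is a minimal sketch, assuming any2fasta is on your PATH; the helper name run_converter is hypothetical, and unlike the bohra snippet above it passes the arguments as a list instead of using shell = True:

```python
import shutil
import subprocess


def run_converter(ref, tool="any2fasta"):
    """Run `tool ref` and return (ok, stderr_text).

    Hypothetical helper for illustration; bohra itself only checks the
    return code and does not log the stderr output.
    """
    if shutil.which(tool) is None:
        return False, f"{tool} not found on PATH"
    p = subprocess.run([tool, ref], capture_output=True, encoding="utf-8")
    # stdout would hold the converted FASTA; only the diagnostics matter here.
    return p.returncode == 0, p.stderr.strip()
```

Calling run_converter("NC_012925.1.fasta.gz") and printing the second element of the result should show whatever message any2fasta gave when it failed.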

Hope this helps !
Have a nice day,
Felix.

@kristyhoran
Copy link
Collaborator

@Vikash84 @tetedange13 has found the spot where the error is being raised. The any2fasta check is designed to confirm that the file is in a format acceptable to the pipeline's dependencies, so if it fails, that indicates there may be an issue with the file. Please let me know if you have any other issues.

Cheers
Kristy
