BiopythonParserWarning #72

Open
deearahman opened this issue Nov 27, 2020 · 6 comments
@deearahman

Hi,

I'm having trouble running the pipeline. The first message is a BiopythonParserWarning, followed by a "No way to run job" error. When I checked the bam and vcf folders, no files had been generated.

Similarly, in the temp folder, the individual folders (callRepSNPs, deriveRepAlleleMatrix, deriveRepStats, getVCFStats, q30VarFilter) were created but contain no files.
[common@t7920 RedDog_v0.4.8]$ rubra RedDog --config k1locus_config_massive.py --style run > run.txt
/usr/lib64/python2.7/site-packages/Bio/GenBank/Scanner.py:1147: BiopythonParserWarning: Premature end of file in sequence data
BiopythonParserWarning)
/usr/lib64/python2.7/site-packages/Bio/GenBank/__init__.py:1306: BiopythonParserWarning: Expected sequence length 5248520, found 1855828 (AP006725.1).
BiopythonParserWarning)
Traceback (most recent call last):
  File "/usr/bin/rubra", line 11, in <module>
    load_entry_point('Rubra==0.1.5', 'console_scripts', 'rubra')()
  File "build/bdist.linux-x86_64/egg/rubra/rubra.py", line 66, in main
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2671, in pipeline_run
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2406, in fill_queue_with_job_parameters
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2345, in parameter_generator
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions generating parameters for

'def RedDog.checkBam(...):'

Original exception:

Exception #1
ruffus.ruffus_exceptions.MissingInputFileError(    
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025468/ERR025468.bam'] does not exist):
for RedDog.checkBam.

Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2279, in parameter_generator
    check_input_files_exist (*param)
  File "build/bdist.linux-x86_64/egg/ruffus/file_name_parameters.py", line 191, in check_input_files_exist
    "Input file ['%s'] does not exist" % f)
MissingInputFileError:     
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025468/ERR025468.bam'] does not exist

The run.txt file

RedDog V1beta.11 - phylogeny run

Copyright (c) 2016 David Edwards, Bernie Pope, Kat Holt
All rights reserved. (see README.txt for more details)

Mapping: Bowtie2 V2.2.9
Preset Option: --sensitive-local
1 replicon(s) in GenBank reference AP006725.1
1 replicon(s) to be reported
25 sequence pair(s) to be mapped

Output folder:
/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/

Starting pipeline...
444 jobs to be executed in total
414 jobs left to execute

Any ideas?

Thanks

@kelwyres
Collaborator

Hi,
It sounds like there might be a problem with your input reference sequence. Biopython is reporting that the sequence is shorter than the length given in the record header. You can check whether the reference is complete by opening the file in a text editor and scrolling down to the end.
Kelly
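
The same check can be scripted instead of scrolling through a multi-megabase file. The following is a minimal sketch, not part of RedDog or Biopython: the function name `check_genbank_lengths` is hypothetical, and it assumes a flat GenBank file in which each record has a `LOCUS` line with a `<n> bp` length, an `ORIGIN` sequence block, and a closing `//` line. It compares the declared length with the number of sequence letters actually present.

```python
import re


def check_genbank_lengths(path):
    """Return (name, declared_bp, observed_bp, terminated) for each record.

    'terminated' is False when the record never reaches its closing '//',
    which is what a truncated download typically looks like.
    """
    results = []
    name, declared, observed, in_origin = None, None, 0, False
    with open(path) as handle:
        for line in handle:
            if line.startswith("LOCUS"):
                fields = line.split()
                match = re.search(r"(\d+)\s+bp", line)
                name = fields[1] if len(fields) > 1 else "unknown"
                declared = int(match.group(1)) if match else None
                observed, in_origin = 0, False
            elif line.startswith("ORIGIN"):
                in_origin = True
            elif line.startswith("//"):
                results.append((name, declared, observed, True))
                name, declared, observed, in_origin = None, None, 0, False
            elif in_origin:
                # Sequence lines mix position numbers, spaces and bases;
                # only the letters are bases.
                observed += sum(1 for ch in line if ch.isalpha())
    if name is not None:
        # File ended in the middle of a record.
        results.append((name, declared, observed, False))
    return results
```

A record whose observed length is below the declared length, or whose `terminated` flag is `False`, points to an incomplete file, which matches the "Expected sequence length 5248520, found 1855828" warning above.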

@deearahman
Author

Hi Kelly,

Thanks for your reply.

I re-downloaded the GenBank file and that solved the Biopython issue, but I am still getting this error:

[common@t7920 RedDog_v0.4.8]$ rubra RedDog --config k1locus_config_massive.py --style run > run.txt
Traceback (most recent call last):
  File "/usr/bin/rubra", line 11, in <module>
    load_entry_point('Rubra==0.1.5', 'console_scripts', 'rubra')()
  File "build/bdist.linux-x86_64/egg/rubra/rubra.py", line 66, in main
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2671, in pipeline_run
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2406, in fill_queue_with_job_parameters
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2345, in parameter_generator
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions generating parameters for

'def RedDog.checkBam(...):'

Original exception:

Exception #1
ruffus.ruffus_exceptions.MissingInputFileError(    
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025673/ERR025673.bam'] does not exist):
for RedDog.checkBam.

Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2279, in parameter_generator
    check_input_files_exist (*param)
  File "build/bdist.linux-x86_64/egg/ruffus/file_name_parameters.py", line 191, in check_input_files_exist
    "Input file ['%s'] does not exist" % f)
MissingInputFileError:     
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025673/ERR025673.bam'] does not exist

Not exactly sure what went wrong.

Dyana

@kelwyres
Collaborator

Hi Dyana,
It looks like the pipeline is trying to check the BAM mapping output but can't find the file, which indicates that something went wrong in the mapping step. What sort of system are you running on? Depending on how you are running the pipeline, do you have a directory called 'log' inside your main RedDog directory and, if so, are there any files in it?
We can look inside those files to get a clue.
Otherwise, the first step would be to make sure that your selected mapping program (bwa or bowtie2) is installed and available in your path.
Kelly
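
A quick way to confirm the last point is to ask Python where each program resolves on the `PATH`. This is a small sketch rather than anything RedDog provides: `find_tools` is a hypothetical helper, and it assumes Python 3 (for `shutil.which`), even though the pipeline itself runs under Python 2.7 in this thread. Run it as the same user and in the same shell environment that launches `rubra`.

```python
import shutil


def find_tools(names):
    """Map each program name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # bowtie2 and bwa are the mappers named above; samtools is included
    # because the pipeline also relies on it to produce the BAM files.
    for tool, path in sorted(find_tools(["bowtie2", "bwa", "samtools"]).items()):
        print("%-10s %s" % (tool, path or "NOT FOUND on PATH"))
```

A program being found only shows that it is installed and visible; it does not show that its version is one the pipeline works with.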

@deearahman
Author

deearahman commented Jan 7, 2021

Sorry for the late reply. Here is one of the log files (there are more). I have tested bowtie2 and bwa, and both work fine.

pipeline.log

I re-installed RedDog_v1b11, but I keep getting this error:

[common@localhost RedDog_v0.4.8]$ rubra RedDog --config k1locus_config_massive.py --style run
Traceback (most recent call last):
  File "/usr/bin/rubra", line 9, in <module>
    load_entry_point('Rubra==0.1.5', 'console_scripts', 'rubra')()
  File "build/bdist.linux-x86_64/egg/rubra/rubra.py", line 35, in main
  File "RedDog.py", line 55, in <module>
    from pipe_utils import (isGenbank, isFasta, chromInfoFasta, chromInfoGenbank, getValue,
  File "pipe_utils.py", line 13, in <module>
    from Bio import SeqIO
ImportError: No module named Bio

I've checked that the Bio module imports fine. Need help!

Update: I have solved the "ImportError: No module named Bio" issue.
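
The thread does not record how this ImportError was resolved. One common cause of the symptom (an assumption here, not something confirmed above) is that `rubra` runs under a different Python interpreter than the one used to test the import by hand. The sketch below, with the hypothetical helper `check_module`, reports which interpreter is running and where a module is loaded from. To be meaningful it has to be run with the interpreter named on the first line of `/usr/bin/rubra`.

```python
import importlib
import sys


def check_module(name):
    """Return (found, location) for a module under the current interpreter."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return (False, None)
    # Built-in modules have no __file__, so fall back to a label.
    return (True, getattr(module, "__file__", "built-in"))


if __name__ == "__main__":
    print("interpreter: %s" % sys.executable)
    found, location = check_module("Bio")
    print("Bio found:   %s (%s)" % (found, location))
```

If the interpreter reported here differs from the one where `from Bio import SeqIO` succeeds, Biopython is installed for one Python but not the other.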

@deearahman
Author

Hi Kelly,

Just to let you know, I have fixed the problem: it was caused by a different version of samtools, which prevented the BAM files from being generated. It is all good now. However, I noticed that only files named like ERR123456.fastq.gz are analysed. Files named like MDB104_S2_L001_R1_001.fastq.gz do not get processed. Is there any way around this? If not, I will have to rename them.

Thanks,
Dyana

@kelwyres
Collaborator

Hi,
Great that you've managed to fix the SAMtools problem. Unfortunately, I don't think there is any way around the file name convention, so you'll need to rename the files, or perhaps try symlinks?
Kelly
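
The symlink route can be sketched as follows. This is illustrative only: `simplified_name` and `link_reads` are hypothetical helpers, the source pattern is the Illumina-style name quoted above, and the target pattern `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` is an assumption about what RedDog accepts that should be checked against its README before use. The original files are left untouched, and the links are created in a separate directory that the config can point to.

```python
import os
import re

# Matches names such as MDB104_S2_L001_R1_001.fastq.gz
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def simplified_name(filename):
    """Return '<sample>_<read>.fastq.gz', or None if the name does not match."""
    match = ILLUMINA_NAME.match(filename)
    if match is None:
        return None
    return "%s_%s.fastq.gz" % (match.group("sample"), match.group("read"))


def link_reads(source_dir, target_dir):
    """Symlink every matching read file into target_dir under its short name."""
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    linked = []
    for filename in sorted(os.listdir(source_dir)):
        new_name = simplified_name(filename)
        if new_name is None:
            continue  # leave files that already follow another convention
        link_path = os.path.join(target_dir, new_name)
        if not os.path.lexists(link_path):
            source = os.path.abspath(os.path.join(source_dir, filename))
            os.symlink(source, link_path)
        linked.append(new_name)
    return linked
```

With this mapping, `MDB104_S2_L001_R1_001.fastq.gz` becomes `MDB104_1.fastq.gz`. Samples split across several lanes would collide under this scheme and would need to be concatenated first.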
