Pipeline not creating SNP consequences file #69

Open
spencer411 opened this issue Mar 9, 2020 · 6 comments

spencer411 commented Mar 9, 2020

I am working with a large dataset and have been merging to an existing dataset repeatedly, which up to this point has worked fine. As of my last run, I get an error stating that there is no consequences file, even though my reference file is definitely a GenBank file (the same one used many times before with no problem). See the output file attached. Note that the bam files and vcf files are there and look fine... Any idea what is going on here, or the best way to troubleshoot this? Thanks in advance!
slurm.rhea-04.772992.txt

d-j-e (Collaborator) commented Mar 9, 2020

The good news is that I don't think your GenBank reference is the problem...

Can you check if the file really does not exist? (i.e. ...RedDog_output/temp/WT-200/WT-200_cns.fq)

One of the jobs in the step before checkpoint_getMergeConsensus failed when the consensus sequence was pulled from the bam (i.e. getMergeConsensus). Have a look in the log folder at the sizes of the messages for all the getMergeConsensus steps - the biggest one (probably) contains the error message you actually need (i.e. why WT-200 failed that step!).
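Something along these lines should show both (just a sketch - the output folder name, the log folder location and the log file naming are assumptions here, so adjust the paths to match your run):

# check whether the consensus file was actually written
ls -l RedDog_output/temp/WT-200/WT-200_cns.fq
# list the getMergeConsensus logs largest first - the biggest one
# most likely holds the error message you need
ls -lS RedDog_output/log/*getMergeConsensus* | head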

There is also a small bash script in the GitHub repository, errorcheck.txt, that you can run to find error messages amongst the log files... though I never use it, so I can't guide you beyond pointing out that it exists.

spencer411 (Author) commented Mar 10, 2020

Okay... so I went back and did some digging. WT-200 was part of an earlier job that was killed by a power outage, not one of the ones I was currently running (the run that did not finish and merge). These have been in my "merge to" folder for a while, and I have been able to merge things to that folder before with this isolate in there (and ones like it, as WT-200 is in the folder I am merging to). That being said, I am not sure why this would halt the pipeline now when it hasn't in the past (and maybe it didn't). Before, I would just get a .txt error message related to isolates like WT-200 in the output folder after every merge. Both of the datasets I am trying to merge have the cns.fq files, so something else is keeping it from finishing. Looking at the error and output files in the log folder, there are several that read (including getCoverage):

[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000

So... maybe the cns.fq error has nothing to do with my job not finishing?

d-j-e (Collaborator) commented Mar 10, 2020

Add '--style print' to your RedDog command - this will print out all the jobs for each step in the pipeline, including those that have completed and those that still need to run...
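For example (a sketch only - I'm assuming the usual rubra-based launch here, and RedDog_config.py stands in for whatever your config file is actually called, so your exact command may differ):

# the same command you normally run, with '--style print' added;
# nothing is executed, the jobs for each stage are just listed
rubra RedDog --config RedDog_config.py --style print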

BTW, I just noticed the instructions for errorcheck.txt in the manual:
"To run ‘errorcheck.txt’, first ‘cd’ to the RedDog folder with the log folder you wish to search. Then enter:
./errorcheck.txt
and the script will immediate(ly) launch."

Pretty sure WT-220 failed at the consensus step - it may be due to a random system error (they happen) or to corruption of the bam. You may have to replace it - if need be, do a RedDog run using the appropriate reads and reference, then drop the bam file from that run into your master set, replacing the old one.
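Something like this once the replacement run has finished (a sketch only - the folder names are placeholders, and I'm assuming the bams sit in a bam folder under each output directory and keep the isolate name, so check your own layout and file names first):

# copy the freshly generated bam (and its index, if one was made)
# over the old one in the master ("merge to") set
cp new_run_output/bam/WT-220*.bam master_set_output/bam/
cp new_run_output/bam/WT-220*.bam.bai master_set_output/bam/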

spencer411 (Author) commented:

Okay, so I ran errorcheck.txt and got the following:

-bash-4.2$ ./errorcheck.txt
Checking log....
You have NO errors! YAYE!

Note that the WT-220 file (and many other problematic files) never shows up in the final .csv because the pipeline was stopped abruptly while running them (using scancel, or due to multiple power outages last year). I was able to rename these files and run them through again with different names, no problem. In the past, I did get a bunch of consensus warning text files after merging the old problematic files with the pipeline, but it still worked (until now). So... I am not expecting WT-220 to be in my final file outputs; I am expecting to get a consensus warning file for it in the output folder. Maybe what I need to do is remove problematic isolates from the "merge to" folder completely?

Will try to start the pipeline back up with the --style print option and see what happens...

spencer411 (Author) commented Mar 11, 2020

See --style print output here. Looks like it should work fine...
slurm.rhea-05.790710.txt

Maybe I just need to increase some walltime?

d-j-e (Collaborator) commented Mar 16, 2020

Yes - try doubling it...
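If it is the walltime, the per-stage values should be in your config - something like this will show you where to look (a sketch; the config file name and exactly where those settings live depend on your setup):

# find the walltime settings, then double the value for the
# stage that is running out of time (e.g. getMergeConsensus)
grep -n walltime RedDog_config.py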
