[False ->dir.makeDir.Success] #53

Closed
SabeenR opened this issue Nov 29, 2017 · 25 comments

@SabeenR

SabeenR commented Nov 29, 2017

Hi,
I'm very new at this and not sure what it is that I'm doing wrong.
I think I've installed all the dependencies correctly but when I try to run a small test dataset through the pipeline it's failing.
I'm attaching the output for you. I'm sure it's because the pipeline is failing to make an output dir.
It keeps giving the error:
Job needs update: Missing file /root/Documents/RedDog_test_output/temp/~.Success

at every stage.
Please help.

test_run1.txt

Thanks,
Sabeen.

@d-j-e
Collaborator

d-j-e commented Nov 29, 2017

Hi Sabeen,

First question - are you running this on a local server or one with a job queue?

Second - The output you sent me is what is expected when you do a 'print' run - this will let you know which jobs need to be run (at the start, this is all of them). To get the pipe to actually run, you need to add '--style run' to the reddog command ('print' is the default). If you haven't done this, it could be the problem...

There are other possible reasons, but we should walk through this if you want to get it working locally.

cheers,
David

@d-j-e d-j-e self-assigned this Nov 30, 2017
@SabeenR
Author

SabeenR commented Nov 30, 2017

Hi David,
Thank you for your reply.
I ran the pipeline with the flag --style run as you suggested and it gave me another error.
I'm attaching the output.

I understand that exit code 127 means it cannot find an executable because it's not in PATH, but I can't figure out which executable it needs to find!

Hopefully there is a simple solution to this.

Thanks
Sabeen
test_run2.txt
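
Exit code 127 is the shell's "command not found" status, so one way to narrow it down is to check each external tool against PATH. A minimal sketch, assuming the tool names that come up in this thread (bowtie2, samtools, bcftools, vcfutils.pl); the full list RedDog needs may well be longer, and `find_missing` is a hypothetical helper, not part of RedDog:

```python
import shutil

# Tool names taken from this thread; the complete set RedDog calls is an assumption.
TOOLS = ["bowtie2", "samtools", "bcftools", "vcfutils.pl"]


def find_missing(tools, path=None):
    """Return the tools that cannot be found on PATH (or on the given path string)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = find_missing(TOOLS)
    print("missing from PATH:", ", ".join(missing) if missing else "none")
```

Run it as the same user (and in the same shell environment) that starts the pipeline, since PATH can differ between users.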

@d-j-e
Collaborator

d-j-e commented Dec 1, 2017

What queuing system are you using, SLURM or PBS (qsub)?

@SabeenR
Author

SabeenR commented Dec 1, 2017

I don't think I'm using any queuing system. I've installed it on my local server and am trying to run it, but it isn't working. I think there is a setting that needs to be changed or fixed and I don't know what it is.

I'm running RHEL 7

@d-j-e
Collaborator

d-j-e commented Dec 2, 2017

Ah, then the solution may be really easy... in the RedDog_config file move down until you find the following:
stageDefaults = {
"distributed": True,

and change "distributed" to False
just above that look for the following:
pipeline = {
"logDir": "log",
"logFile": "pipeline.log",
"style": "print",
"procs": 50,

Change the number of "procs" to fit within the number of processors available for RedDog on your local server - I do advise not all of them!

Make sure you don't remove the comma on either line when you edit them.
This should get it running - let me know how it goes
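
For reference, the two edits described above would look roughly like this. Only the keys quoted in this comment are shown; the real RedDog_config file has more entries in both dictionaries, and the value 8 for "procs" is just a placeholder:

```python
# Fragment only: the real config file has more keys in each dictionary.
pipeline = {
    "logDir": "log",
    "logFile": "pipeline.log",
    "style": "print",
    "procs": 8,  # placeholder; keep it below the number of cores on the server
}

stageDefaults = {
    "distributed": False,  # False for a local server with no job queue
}
```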

@SabeenR
Author

SabeenR commented Dec 4, 2017

Hi David,
So I tried what you suggested. I changed "distributed" to False and
changed the number of "procs" to 24.

I get a slightly different error but it's still exit code 127 (see attached file).
test_run3.txt

This is what lscpu gives me.. just to let you know what kind of system I'm trying to run this on.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 2716.593
CPU max MHz: 3200.0000
CPU min MHz: 1200.0000
BogoMIPS: 4789.05
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31

@d-j-e
Collaborator

d-j-e commented Dec 11, 2017

Can you send me the config file? (Sorry, been doing my six month report for PhD.)

@SabeenR
Author

SabeenR commented Dec 12, 2017

yes of course.. no worries,
I'm attaching the test_config.py where I've specified the reference, sequences and the output directory.

I had to save it as a txt file as this system was not allowing me to attach a python (~.py) file
test_config.txt

@d-j-e
Collaborator

d-j-e commented Dec 13, 2017

Had a read through the config file you sent me, and "distributed" and "procs" still had the old settings - looks like the changes weren't saved. Have another go at editing the config file and check the changes are saved before trying the run again...

BTW if it does run properly you won't see any messages after hitting 'y' to start the run. The commands will be sent to the command line silently (as long as it keeps working)
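
One quick way to confirm the edits were saved is to load the config file and print the two values. A minimal sketch: it assumes the config is plain Python that defines `pipeline` and `stageDefaults` dictionaries (as in the snippets earlier in this thread). `read_settings` is a hypothetical helper and is not how rubra itself loads the file:

```python
import runpy


def read_settings(config_path):
    """Execute a Python config file and return the two settings discussed here."""
    settings = runpy.run_path(config_path)
    return {
        "distributed": settings["stageDefaults"]["distributed"],
        "procs": settings["pipeline"]["procs"],
    }


# Example (the path is a placeholder for your own config file):
# print(read_settings("test_config.py"))
```

Note that this executes the config file, so only point it at a file you trust.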

@SabeenR
Author

SabeenR commented Jan 3, 2018

Hi David,
Thanks for the update. I did change the "distributed" and "procs", saved it and re-checked to make sure it got saved and then ran the pipeline.

It's still not working.
Please see the latest output attached.
test_run4.txt

@d-j-e
Collaborator

d-j-e commented Jan 3, 2018

Hi Sabeen,

Well at least it did get to checkBam which means RedDog is working, just not all the stages yet...

Are there BAM files for the two isolates in the output bam folder? If so, what size are they, and what size are the original sequence files?

There may be a delay problem between stages (i.e. between the mapping of reads and checking that the BAM was produced) - the files may not be immediately available to RedDog so it thinks they haven't been made. One way to check this is to restart RedDog (with the same command) and see if it gets through the 'checkBam' step.

David

@SabeenR
Author

SabeenR commented Jan 3, 2018

No BAM files in the output bam folder.
The file sizes: (see JPEG attached).
When I did a Ctrl+C it gave me this (see txt file attached).
files-sizes
test_run5.txt

@d-j-e
Collaborator

d-j-e commented Jan 3, 2018

Have a look in the log folder (in the folder where reddog is installed): are there any 'log' files?

@SabeenR
Author

SabeenR commented Jan 4, 2018

temp_folder

RedDog-pipeline_log.log
yes the log file is attached.
Also I'm sending you the screenshot of the only populated folder in the output directory:
pipeline.log

@SabeenR
Author

SabeenR commented Jan 4, 2018

success_folder

the only other populated folder

@d-j-e
Collaborator

d-j-e commented Jan 4, 2018

As I thought - running on a local server provides no output on errors... but the log file does provide a way to find out what is going wrong with the mapping.

Run the following command and report what happens:
bowtie2 --sensitive-local -x /root/Documents/RedDog_test_output/temp/GCF_000009205.1_ASM920v1_genomic -1 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_1.fastq.gz -2 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /root/Documents/RedDog_test_output/temp/MS5862459-ID-wgs96-020-CD630/MS5862459-ID-wgs96-020-CD630.bam

This is the smaller set of the two and should take about ten to twenty minutes to map, if it works.
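
A shell pipe reports only the exit status of its last stage by default, so a failure in an earlier stage is easy to miss. The sketch below runs the same kind of three-stage pipe from Python and returns one exit code per stage. `run_pipeline` is a hypothetical helper, not part of RedDog, and the file names in `MAPPING` are placeholders for the real paths in the command above:

```python
import subprocess


def run_pipeline(commands):
    """Run argv lists as a shell-style pipe and return each stage's exit code."""
    procs = []
    prev_stdout = None
    for index, argv in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            argv,
            stdin=prev_stdout,
            stdout=None if is_last else subprocess.PIPE,
        )
        if prev_stdout is not None:
            # Close our copy so the upstream stage notices if this stage exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]


# Placeholder arguments; substitute the index, read and output paths from the command above.
MAPPING = [
    ["bowtie2", "--sensitive-local", "-x", "ref_index",
     "-1", "reads_1.fastq.gz", "-2", "reads_2.fastq.gz", "-X", "2000"],
    ["samtools", "view", "-ubS", "-"],
    ["samtools", "sort", "-", "-o", "sample.bam"],
]
# codes = run_pipeline(MAPPING)  # a non-zero entry points at the failing stage
```

If one of the programs is not on PATH, `subprocess.Popen` raises `FileNotFoundError` naming it, which is the Python counterpart of the shell's exit code 127.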

@SabeenR
Author

SabeenR commented Jan 8, 2018

Ok so it first gave a readable output (see attached) then a whole bunch of gobbledygook and froze the terminal. When I logged back into the machine again and did "top" there was no activity at all.

test_run6.txt

it did produce 2 bam files in the folder: /root/Documents/RedDog_test_output/temp/MS5862459-ID-wgs96-020-CD630/

  1. MS5862459-ID-wgs96-020-CD630.bam.0000.bam
  2. MS5862459-ID-wgs96-020-CD630.bam.0001.bam

What next ?

@SabeenR
Author

SabeenR commented Jan 8, 2018

I removed the larger sequence pair from that folder and am re-running the pipeline:
I'll let you know what happens... if anything.


[root@tcmc_sandbox2 RedDog_v1b10_3]# rubra RedDog --config test_config --style run

RedDog V1beta.10.3 - phylogeny run

Copyright (c) 2016 David Edwards, Bernie Pope, Kat Holt
All rights reserved. (see README.txt for more details)

Mapping: Bowtie2 V2.2.9
Preset Option: --sensitive-local
2 replicon(s) in GenBank reference GCF_000009205.1_ASM920v1_genomic
2 replicon(s) to be reported
1 sequence pair(s) to be mapped

Output folder:
/root/Documents/RedDog_test_output/

Start Pipeline? (y/n) y

Starting pipeline...
47 jobs to be executed in total
41 jobs left to execute


@SabeenR
Author

SabeenR commented Jan 8, 2018

ok that gave me the same error as before but I re-ran the previous step one more time:

here is the command I ran:

[root@tcmc_sandbox2 RedDog_v1b10_3]# bowtie2 --sensitive-local -x /root/Documents/RedDog_test_output/temp/GCF_000009205.1_ASM920v1_genomic -1 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_1.fastq.gz -2 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /root/Documents/RedDog_test_output/temp/MS5862459-ID-wgs96-020-CD630/MS5862459-ID-wgs96-020-CD630.bam

*********output begin
[samopen] SAM header is present: 2 sequences.
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because it was < 2 characters long
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because it was < 2 characters long
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because it was < 2 characters long
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because it was < 2 characters long
1082734 reads; of these:
1082734 (100.00%) were paired; of these:
1081434 (99.88%) aligned concordantly 0 times
1141 (0.11%) aligned concordantly exactly 1 time
159 (0.01%) aligned concordantly >1 times
----
1081434 pairs aligned concordantly 0 times; of these:
5899 (0.55%) aligned discordantly 1 time
----
1075535 pairs aligned 0 times concordantly or discordantly; of these:
2151070 mates make up the pairs; of these:
1082767 (50.34%) aligned 0 times
961114 (44.68%) aligned exactly 1 time
107189 (4.98%) aligned >1 times
50.00% overall alignment rate
[bam_sort_core] merging from 2 files...
**************end of output

I didn't see any changes in any of the folders.

@d-j-e
Collaborator

d-j-e commented Jan 8, 2018

Can you instead try the test set found in the reddog wiki? I know these reads work, which makes troubleshooting the pipeline easier. Then we can look at read sets that 'break' the pipeline.
The reads and genome can be found here https://github.com/katholt/RedDog/wiki/4.-Pipeline-Conventions#pipeline-test-sets

@SabeenR
Author

SabeenR commented Jan 9, 2018

OK, so I ran the test data sets and I got exactly the same issues: the same error message that the ~.bam file does not exist.
This time, however, it only accepted CP000039.1 as the reference, not CP000038.1, even though I had put both in the config file (attached).
test_config.txt

Then I ran the bowtie on just one seq:

*****Command
bowtie2 --sensitive-local -x /root/Documents/RedDog_test1_output/temp/refCP000039_1 -1 /root/Documents/RedDog_test_input/ERR019786_1.fastq.gz -2 /root/Documents/RedDog_test_input/ERR019786_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /root/Documents/RedDog_test1_output/temp/ERR019786/ERR019786.bam
********************************EndOfCommand
It seemed to be working at first, then again spewed out a whole load of gobbledygook. All the while samtools was running. No bam file got produced as it just shut down after a few mins.

I ran this above command again and directed the output to a log file (attached).
it was huge so had to truncate it..
ERR019786_1_test.log

Also, here is the pipelog (attached)
pipeline_01-09-18.log.log

@SabeenR
Author

SabeenR commented Jan 9, 2018

also when I ran it the second time it first output this:
Output
[root@tcmc_sandbox2 RedDog_v1b10_3]# nohup: ignoring input and redirecting stderr to stdout
[samopen] SAM header is present: 1 sequences.
[sam_read1] reference ' 697184 (89.95%) aligned concordantly 0 times' is recognized as '*'.
Parse error at line 1550136: invalid CIGAR character
***********************endOfOutput
then it just hung.. I had to do a Ctrl+C to get back the shell prompt.

@d-j-e
Collaborator

d-j-e commented Jan 10, 2018

First of all, this is actually useful as I think I know what is going on... Bowtie2 is working and the bam is being produced, BUT it looks like samtools is not working, and it's at this step in the command that the pipe looks to be failing. Can you check which version of samtools you are using? The pipe currently uses SAMtools v1.3.1.
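
A small sketch for checking the installed version from a script. In 1.x releases, `samtools --version` prints a first line like `samtools 1.3.1`. As far as I recall, older 0.1.x builds do not accept `--version` and instead show a `Version:` line in their usage text, with a different `sort` syntax as well; treat that second format as an assumption. Both function names are hypothetical:

```python
import re
import subprocess


def parse_samtools_version(text):
    """Extract a version tuple such as (1, 3, 1) from samtools output, or None."""
    match = re.search(r"^(?:samtools|Version:)\s+(\d+(?:\.\d+)*)", text, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_samtools_version():
    """Ask the samtools on PATH for its version; None if the output cannot be parsed.

    Raises FileNotFoundError if samtools is not on PATH at all.
    """
    for argv in (["samtools", "--version"], ["samtools"]):
        result = subprocess.run(argv, capture_output=True, text=True)
        version = parse_samtools_version(result.stdout + result.stderr)
        if version is not None:
            return version
    return None
```

Comparing the result against `(1, 3, 1)` then shows at a glance whether the installed samtools matches the version the pipeline expects.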

As to the reference, reddog only maps to a single reference file, but that reference can be a single genbank or fasta file, or a concatenated set of genbank or fasta files, so you can concatenate CP000038 and CP000039 (genome and plasmid) into a single file then map to that...
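
Concatenating the reference files can be done with `cat` or with a few lines of Python. A minimal sketch; `concatenate` is a hypothetical helper and the file names in the example are placeholders. It adds a newline between files when one is missing so that records do not run together:

```python
def concatenate(input_paths, output_path):
    """Write the input files to output_path one after another, in order."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep the next file's first line separate


# Example (placeholder file names):
# concatenate(["chromosome.gbk", "plasmid.gbk"], "combined_reference.gbk")
```

All inputs should be the same format (all GenBank or all FASTA), as described above.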

@SabeenR
Author

SabeenR commented Jan 10, 2018

OK, after a bit of digging and updating my Samtools to v1.3.1 I managed to run the whole test dataset through. My system couldn't find bcftools (I fixed that), then it couldn't find vcfutils.pl (I fixed that as well). When I finally ran it again it didn't give me any errors.

Right now I'm running the original test dataset (jspinler's) through the pipeline, and following it on "top" I see that each instance of bowtie is only using 1 core, even though other users on the system are able to run bowtie faster.

I have 2 CPUs with 8 cores each. Right now my "procs" is set to 24. Do you think I can increase that so that bowtie runs faster?

Thanks for ALL your help ! We really do appreciate it.

@d-j-e
Collaborator

d-j-e commented Jan 10, 2018

Hi Sabeen,
glad to hear you got it working!
Technically yes, you could run the mapping over more than one core, but if you are running lots of samples that just means there are fewer pairs of cores available, so the run will end up taking about the same time... But if you are running smaller sets of data, you could edit the mapping command to use more than one core - just look in the config file for the command ('alignBowtiePE' for paired-end reads...) and add your option.
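
bowtie2's own multithreading option is `-p` (`--threads`). A rough way to pick a value is to divide the cores you are willing to give RedDog by the number of mapping jobs running at once. A sketch only: `threads_per_job` is a hypothetical helper, and 16 is the physical core count from the lscpu output earlier in the thread (2 sockets x 8 cores):

```python
def threads_per_job(cores, concurrent_jobs):
    """Threads each mapping job can use without oversubscribing the cores (at least 1)."""
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    return max(1, cores // concurrent_jobs)


print(threads_per_job(16, 2))   # two read sets mapped at once -> 8
print(threads_per_job(16, 24))  # more jobs than cores -> 1
```

The result would go after `-p` in the bowtie2 part of the 'alignBowtiePE' command. The exact command string in the config file is not shown in this thread, so check it before editing.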

@SabeenR SabeenR closed this as completed Jan 12, 2018