[False ->dir.makeDir.Success] #53

SabeenR · 2017-11-29T21:36:26Z

Hi,
I'm very new at this and not sure what it is that I'm doing wrong.
I think I've installed all the dependencies correctly but when I try to run a small test dataset through the pipeline it's failing.
I'm attaching the output for you. I'm sure it's because the pipeline is failing to make an output dir.
It keeps giving the error:
Job needs update: Missing file /root/Documents/RedDog_test_output/temp/~.Success

at every stage.
Please help.

test_run1.txt

Thanks,
Sabeen.

d-j-e · 2017-11-29T22:51:11Z

Hi Sabeen,

First question - are you running this on a local server or one with a job queue?

Second - The output you sent me is what is expected when you do a 'print' run - this will let you know which jobs need to be run (at the start, this is all of them). To get the pipe to actually run, you need to add '--style run' to the reddog command ('print' is the default). If you haven't done this, it could be the problem...

There are other possible reasons, but we should walk through this if you want to get it working locally.

cheers,
David

SabeenR · 2017-11-30T15:40:05Z

Hi David,
Thank you for your reply.
I ran the pipeline with the flag --style run as you suggested and it gave me another error.
I'm attaching the output.

I understand the error code 127 means that it cannot find an executable because it's not in PATH, but I can't figure out which executable it needs to find!

Hopefully there is a simple solution to this.

Thanks
Sabeen
test_run2.txt
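
Exit code 127 is what the shell reports for "command not found", so one way to narrow this down is to check each external tool against PATH. A minimal sketch in Python, assuming the tool list below (only the programs named elsewhere in this thread; the full set RedDog calls may be longer):

```python
import shutil

# Tools named in this thread; RedDog may call others as well (assumption).
TOOLS = ["bowtie2", "samtools", "bcftools", "vcfutils.pl"]


def find_missing(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(TOOLS)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Run it as the same user (and in the same shell environment) that launches the pipeline, since PATH can differ between users.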

d-j-e · 2017-12-01T02:38:09Z

What queuing system are you using, SLURM or PBS (qsub)?

SabeenR · 2017-12-01T16:40:25Z

I don't think I'm using any queuing system. I've installed it on my local server and am trying to run it, but it isn't working. I think there is a setting that needs to be changed/fixed and I don't know what it is.

I'm running RHEL 7

d-j-e · 2017-12-02T05:49:01Z

Ah, then the solution may be really easy... in the RedDog_config file move down until you find the following:
stageDefaults = {
"distributed": True,

and change "distributed" to False
just above that look for the following:
pipeline = {
"logDir": "log",
"logFile": "pipeline.log",
"style": "print",
"procs": 50,

Change the number of "procs" to fit within the number of processors available for RedDog on your local server - I do advise not all of them!

Make sure you don't remove the comma on either line when you edit them.
This should get it running - let me know how it goes
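
For reference, the two edited sections might look something like the sketch below. Only the keys quoted above are shown; everything else in the real config file stays as it was, and the "procs" value of 24 is just an example, not a recommendation.

```python
# Sketch of the two edited sections of the RedDog config file.
# Only the keys quoted in this thread appear here; every other key
# in the real file is left untouched.
pipeline = {
    "logDir": "log",
    "logFile": "pipeline.log",
    "style": "print",
    "procs": 24,  # example value; keep it below the number of available CPUs
}

stageDefaults = {
    "distributed": False,  # run stages on the local machine, not via a job queue
}
```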

SabeenR · 2017-12-04T16:01:11Z

Hi David,
So I tried what you suggested. I changed "distributed" to False and
changed the number of "procs": 24,

I get a slightly different error but it's still exit code 127 (see attached file).
test_run3.txt

This is what lscpu gives me.. just to let you know what kind of system I'm trying to run this on.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 2716.593
CPU max MHz: 3200.0000
CPU min MHz: 1200.0000
BogoMIPS: 4789.05
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31

d-j-e · 2017-12-11T21:03:56Z

Can you send me the config file? (Sorry, been doing my six month report for PhD.)

SabeenR · 2017-12-12T18:46:27Z

yes of course.. no worries,
I'm attaching the test_config.py where I've specified the reference, sequences and the output directory.

I had to save it as a txt file as this system was not allowing me to attach a python (~.py) file
test_config.txt

d-j-e · 2017-12-13T06:18:42Z

Had a read through the config file you sent me, and the values for "distributed" and "procs" were still the old settings - looks like the changes weren't saved. Have another go at editing the config file and check the changes are saved before trying the run again...

BTW if it does run properly you won't see any messages after hitting 'y' to start the run. The commands will be sent to the command line silently (as long as it keeps working)
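
One way to confirm what the saved file actually contains is to load it and print the two values. This is a minimal sketch, assuming the config is an ordinary Python file that defines `pipeline` and `stageDefaults` dictionaries as quoted earlier in the thread; `read_settings` is a helper name made up for this example.

```python
import runpy


def read_settings(config_path):
    """Execute a Python config file and return the two settings of interest."""
    settings = runpy.run_path(config_path)
    return {
        "distributed": settings["stageDefaults"]["distributed"],
        "procs": settings["pipeline"]["procs"],
    }
```

For example, `print(read_settings("test_config.py"))` run from the RedDog folder should show `distributed` as `False` and `procs` as whatever number was chosen, if the edit was saved to the file the pipeline is reading.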

SabeenR · 2018-01-03T00:47:59Z

Hi David,
Thanks for the update. I did change the "distributed" and "procs", saved it and re-checked to make sure it got saved and then ran the pipeline.

It's still not working.
Please see the latest output attached.
test_run4.txt

d-j-e · 2018-01-03T20:32:31Z

Hi Sabeen,

Well at least it did get to checkBam which means RedDog is working, just not all the stages yet...

Are there BAM files for the two isolates in the output bam folder? If so, what size are they, and what size are the original sequence files?

There may be a delay problem between stages (i.e. between the mapping of reads and checking that the BAM was produced) - the files may not be immediately available to RedDog so it thinks they haven't been made. One way to check this is to restart RedDog (with the same command) and see if it gets through the 'checkBam' step.

David
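
A quick way to answer the BAM question is to list every .bam file under the output folder with its size. A minimal sketch, assuming the output folder given earlier in the thread; `list_bams` is a helper name made up for this example.

```python
from pathlib import Path


def list_bams(output_dir):
    """Return (path, size_in_bytes) for every .bam file under output_dir."""
    return sorted(
        (str(path), path.stat().st_size)
        for path in Path(output_dir).rglob("*.bam")
    )
```

Calling `list_bams("/root/Documents/RedDog_test_output")` and printing the result shows whether any BAM files exist at all. An empty list or zero-byte files would point to the mapping step itself failing rather than a delay between stages.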

SabeenR · 2018-01-03T21:08:57Z

No BAM files in the output bam folder.
The file sizes: (see JPEG attached).
When I did a Ctrl+C it gave me this (see txt file attached).

test_run5.txt

d-j-e · 2018-01-03T23:25:36Z

Have a look in the log folder (in the folder where reddog is installed): are there any 'log' files?

SabeenR · 2018-01-04T19:50:56Z

RedDog-pipeline_log.log
yes the log file is attached.
Also I'm sending you the screenshot of the only populated folder in the output directory:
pipeline.log

SabeenR · 2018-01-04T19:53:24Z

the only other populated folder

d-j-e · 2018-01-04T22:12:06Z

As I thought - running on a local server provides no output on errors... but the log file does provide a way to find out what is going wrong with the mapping.

Run the following command and report what happens:
bowtie2 --sensitive-local -x /root/Documents/RedDog_test_output/temp/GCF_000009205.1_ASM920v1_genomic -1 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_1.fastq.gz -2 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /root/Documents/RedDog_test_output/temp/MS5862459-ID-wgs96-020-CD630/MS5862459-ID-wgs96-020-CD630.bam

This is the smaller set of the two and should take about ten to twenty minutes to map, if it works.

SabeenR · 2018-01-08T21:24:45Z

Ok so it first gave a readable output (see attached) then a whole bunch of gobbledygook and froze the terminal. When I logged back into the machine again and did "top" there was no activity at all.

test_run6.txt

it did produce 2 bam files in the folder: /root/Documents/RedDog_test_output/temp/MS5862459-ID-wgs96-020-CD630/

MS5862459-ID-wgs96-020-CD630.bam.0000.bam
MS5862459-ID-wgs96-020-CD630.bam.0001.bam

What next ?

SabeenR · 2018-01-08T21:35:18Z

I removed the larger sequence pair from that folder and am re-running the pipeline:
I'll let you know what happens... if anything.

[root@tcmc_sandbox2 RedDog_v1b10_3]# rubra RedDog --config test_config --style run

RedDog V1beta.10.3 - phylogeny run

Mapping: Bowtie2 V2.2.9
Preset Option: --sensitive-local
2 replicon(s) in GenBank reference GCF_000009205.1_ASM920v1_genomic
2 replicon(s) to be reported
1 sequence pair(s) to be mapped

Output folder:
/root/Documents/RedDog_test_output/

Start Pipeline? (y/n) y

Starting pipeline...
47 jobs to be executed in total
41 jobs left to execute

SabeenR · 2018-01-08T21:54:47Z

ok that gave me the same error as before but I re-ran the previous step one more time:

here is the command I ran:

[root@tcmc_sandbox2 RedDog_v1b10_3]# bowtie2 --sensitive-local -x /root/Documents/RedDog_test_output/temp/GCF_000009205.1_ASM920v1_genomic -1 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_1.fastq.gz -2 /home/jspinler/RedDog_Test/MS5862459-ID-wgs96-020-CD630_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /root/Documents/RedDog_test_output/temp/MS5862459-ID-wgs96-020-CD630/MS5862459-ID-wgs96-020-CD630.bam

*********output begin
[samopen] SAM header is present: 2 sequences.
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because it was < 2 characters long
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1105:18898:10101 1:N:0:ACCTCCAA' because it was < 2 characters long
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because length (1) <= # seed mismatches (0)
Warning: skipping mate #1 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because it was < 2 characters long
Warning: skipping mate #2 of read 'M03901:182:000000000-BGN2G:1:1108:15931:4189 1:N:0:ACCTCCAA' because it was < 2 characters long
1082734 reads; of these:
1082734 (100.00%) were paired; of these:
1081434 (99.88%) aligned concordantly 0 times
1141 (0.11%) aligned concordantly exactly 1 time
159 (0.01%) aligned concordantly >1 times
----
1081434 pairs aligned concordantly 0 times; of these:
5899 (0.55%) aligned discordantly 1 time
----
1075535 pairs aligned 0 times concordantly or discordantly; of these:
2151070 mates make up the pairs; of these:
1082767 (50.34%) aligned 0 times
961114 (44.68%) aligned exactly 1 time
107189 (4.98%) aligned >1 times
50.00% overall alignment rate
[bam_sort_core] merging from 2 files...
**************end of output

I didn't see any changes in any of the folders.
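
For anyone checking the numbers, the 50.00% overall rate in the summary can be reproduced from the counts above. bowtie2 reports it per read rather than per pair, so each aligned pair counts twice and the individually aligned mates count once. A short worked sketch:

```python
# Figures copied from the bowtie2 summary above.
pairs = 1082734
concordant_once = 1141
concordant_multi = 159
discordant = 5899
mates_once = 961114
mates_multi = 107189

# Each aligned pair contributes two reads; leftover mates count singly.
aligned_reads = (
    2 * (concordant_once + concordant_multi + discordant)
    + mates_once
    + mates_multi
)
overall_rate = 100 * aligned_reads / (2 * pairs)

print(f"{overall_rate:.2f}% overall alignment rate")  # prints 50.00%
```

Nearly all of that rate comes from mates aligning on their own; the concordant pairs are only 0.12% of the total.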

d-j-e · 2018-01-08T23:08:15Z

Can you instead try the test set as found in the reddog wiki? - I know these reads work, so it makes troubleshooting the pipeline easier. Then we can look at read sets that 'break' the pipeline.
The reads and genome can be found here https://github.com/katholt/RedDog/wiki/4.-Pipeline-Conventions#pipeline-test-sets

SabeenR · 2018-01-09T21:01:31Z

OK, so I ran the test data sets and I literally got the exact same issues. Same error message that the ~.bam file does not exist.
This time, however, it only accepted CP000039.1 as reference, not CP000038.1, even though I had put both in the config file (attached).
test_config.txt

Then I ran the bowtie on just one seq:

*****Command
bowtie2 --sensitive-local -x /root/Documents/RedDog_test1_output/temp/refCP000039_1 -1 /root/Documents/RedDog_test_input/ERR019786_1.fastq.gz -2 /root/Documents/RedDog_test_input/ERR019786_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /root/Documents/RedDog_test1_output/temp/ERR019786/ERR019786.bam
********************************EndOfCommand
It seemed to be working at first, then again spewed out a whole load of gobbledygook. All the while samtools was running. No bam file got produced as it just shut down after a few mins.

I ran this above command again and directed the output to a log file (attached).
it was huge so had to truncate it..
ERR019786_1_test.log

Also, here is the pipelog (attached)
pipeline_01-09-18.log.log

SabeenR · 2018-01-09T21:03:07Z

also when I ran it the second time it first output this:
Output
[root@tcmc_sandbox2 RedDog_v1b10_3]# nohup: ignoring input and redirecting stderr to stdout
[samopen] SAM header is present: 1 sequences.
[sam_read1] reference ' 697184 (89.95%) aligned concordantly 0 times' is recognized as '*'.
Parse error at line 1550136: invalid CIGAR character
***********************endOfOutput
then it just hung.. I had to do a Ctrl+C to get back the shell prompt.

d-j-e · 2018-01-10T00:00:05Z

First of all, this is actually useful as I think I know what is going on... Bowtie2 is working and the bam is being produced, BUT it looks like samtools is not working, and it's at this step in the command that the pipe looks to be failing. Can you check which version of samtools you are using? The pipe currently uses SAMtools v1.3.1.
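
A minimal sketch for checking the installed version from Python. It assumes the two output formats I am aware of: samtools 1.x prints "samtools 1.3.1" for `samtools --version`, while the older 0.1.x releases do not know `--version` and instead show a "Version:" line in their usage text. The function names are made up for this example.

```python
import re
import subprocess


def parse_samtools_version(text):
    """Pull a version such as '1.3.1' or '0.1.19' out of samtools output."""
    match = re.search(
        r"(?:^samtools\s+|Version:\s*)(\d+(?:\.\d+)+)", text, re.MULTILINE
    )
    return match.group(1) if match else None


def installed_samtools_version():
    """Ask the samtools on PATH for its version; None if it cannot be determined."""
    for args in (["samtools", "--version"], ["samtools"]):
        try:
            result = subprocess.run(args, capture_output=True, text=True)
        except FileNotFoundError:
            return None  # samtools is not on PATH at all
        version = parse_samtools_version(result.stdout + result.stderr)
        if version:
            return version
    return None
```

As far as I know, the 0.1.x `sort` command took an output prefix and treated `-o` as "write the result to standard output", which would be consistent with binary data flooding the terminal and the `.bam.0000.bam` files reported earlier in the thread.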

As to the reference, reddog only maps to a single reference file, but that reference can be a single GenBank or FASTA file, or a concatenated set of GenBank or FASTA files, so you can concatenate CP000038 and CP000039 (genome and plasmid) into a single file, then map to that...
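
Concatenating is just writing one file after the other. A minimal sketch, assuming both inputs are in the same format (both GenBank or both FASTA); the function name and the file names in the usage note are hypothetical.

```python
def concatenate_references(input_paths, output_path):
    """Write the input files one after another into a single reference file."""
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as src:
                text = src.read()
            # Make sure one record cannot run into the next on the same line.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
```

For example, `concatenate_references(["CP000038.gbk", "CP000039.gbk"], "combined_reference.gbk")`, after which the combined file is the one named as the reference in the config.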

SabeenR · 2018-01-10T18:31:57Z

OK, after a bit of digging and updating my Samtools to v1.3.1 I managed to run the whole test dataset through. My system couldn't find bcftools (I fixed that), then it couldn't find vcfutils.pl (I fixed that as well).. then when I finally ran it again it didn't give me any errors.

Right now I'm running the original test dataset (jspinler's) through the pipeline and following it on "top" I see that bowtie is only using 1 core for each instance of bowtie, even though other users on the system are able to run bowtie faster.

I have 2 CPUs with 8 cores each. Right now my "procs" is set to 24.. do you think I can increase that so that bowtie runs faster?

Thanks for ALL your help ! We really do appreciate it.

d-j-e · 2018-01-10T23:50:30Z

Hi Sabeen,
glad to hear you got it working!
Technically yes, you could run the mapping over more than one core, but if you are running lots of samples that just means there are fewer pairs of cores available, so the run will end up taking about the same time... but if you are running smaller sets of data, then you could edit the mapping command to use more than one core - just look in the config file for the command ('alignBowtiePE' for paired-end reads...) and add your option.
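
To illustrate the trade-off: bowtie2's option for multiple threads is `-p`. The exact text of the 'alignBowtiePE' entry is not shown in this thread, so the sketch below only rebuilds the bowtie2 part of the command quoted earlier and adds `-p`. Both function names are made up for this example.

```python
def bowtie2_args(index, reads_1, reads_2, threads=1):
    """Build a bowtie2 paired-end argument list, adding -p when threads > 1."""
    args = [
        "bowtie2", "--sensitive-local", "-x", index,
        "-1", reads_1, "-2", reads_2, "-X", "2000",
    ]
    if threads > 1:
        args += ["-p", str(threads)]
    return args


def threads_per_job(total_cores, concurrent_jobs):
    """Cores each mapping job can use without oversubscribing the machine."""
    return max(1, total_cores // max(1, concurrent_jobs))
```

With 16 physical cores and 4 samples mapping at once, `threads_per_job(16, 4)` gives 4. With 24 jobs running at once it gives 1, which is the "about the same time" case described above.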

d-j-e self-assigned this Nov 30, 2017

SabeenR closed this as completed Jan 12, 2018
