
Error at first stage - Pipeline is not running #70

Open
kesiaeds opened this issue May 5, 2020 · 21 comments


@kesiaeds

kesiaeds commented May 5, 2020

Hi,

We are having trouble getting Red dog to run on our cluster. The pipeline does not start running and we keep getting the error below.

Thanks,
Kesia

```
Traceback (most recent call last):
  File "/scg/apps/software/reddog/v1b11/bin/rubra", line 10, in <module>
    sys.exit(main())
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/rubra.py", line 30, in main
    options = getOptions(args)
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/utils.py", line 210, in getOptions
    imported = __import__(module)
  File "RedDog_config_run.py", line 34
SyntaxError: Non-ASCII character '\xe2' in file RedDog_config_run.py on line 35, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
```

@d-j-e
Collaborator

d-j-e commented May 5, 2020

Thought I'd removed that problem - oh well...
Open the config file and delete the following from near the top - there are characters in there that upset Python (the curly quotes):

This enables the pipeline to be run as a single job (with lots of cpus) on a distributed system. 
IMPORTANT: DO NOT CHANGE SETTINGS FOR A ‘MERGE’ RUN – RedDog still checks the previous settings 
against those in the config file and will ask for confirmation of changes... 
If you are unsure of your previous settings, check the ‘Run Report’.

@kesiaeds
Author

kesiaeds commented May 5, 2020

Hi
I had to delete this part from the config file:

'''
Notes:

'no_check' is for switching off the user check at the start of the run. 
This enables the pipeline to be run as a single job (with lots of cpus) on a distributed system. 
IMPORTANT: DO NOT CHANGE SETTINGS FOR A ‘MERGE’ RUN – RedDog still checks the previous settings 
against those in the config file and will ask for confirmation of changes... 
If you are unsure of your previous settings, check the ‘Run Report’.

'reference' and 'sequences'

Reference can be GenBank or fasta format - if GenBank format, this will be converted
to a fasta version for mapping.
 
If the GenBank file is not given, the gene cover and depth matrices for genes
will not be generated and nor will SNP consequences. 

If you don't have the GenBank record, or don't want the above matrices 
to be generated, enter a fasta format reference instead.

Ambiguous calls in the reference will cause problems for your output – 
check that the sequence uses only A, C, G, T or N (upper or lower case). 
RedDog will not run if it finds ambiguous calls (v1b11).

'''
#Test Sets
#reference = "/full_path_to/pipeline_test_sets/reference/NC_007384.gbk"
#reference = "/full_path_to/pipeline_test_sets/reference/NC_007384_with_plasmid.gbk"
#reference = "/full_path_to/pipeline_test_sets/reference/NC_007384_with_plasmid.fasta"
#sequences = "/full_path_to/pipeline_test_sets/*.fastq.gz"
#sequences = "/full_path_to/pipeline_test_sets/extra/*.fastq.gz"

# You can now also combine sequences from different folders into the same run...
#sequences = ["/full_path_to/pipeline_test_sets/*.fastq.gz", "/full_path_to/pipeline_test_sets/extra/*.fastq.gz"]


The pipeline started to run, however, I got another error message:

```
Starting pipeline...
70 jobs to be executed in total
Traceback (most recent call last):
  File "/scg/apps/software/reddog/v1b11/bin/rubra", line 10, in <module>
    sys.exit(main())
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/rubra.py", line 66, in main
    gnu_make_maximal_rebuild_mode=rebuildMode)
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/ruffus/task.py", line 2680, in pipeline_run
    raise errt
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions running jobs for

'def RedDog.makeDir(...):'

Original exception:

Exception #1
exceptions.Exception(qsub command failed with exit status: 1):
for RedDog.makeDir.Job = [False -> dir.makeDir.Success]

Traceback (most recent call last):
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/ruffus/task.py", line 517, in run_pooled_job_without_exceptions
    return_value =  job_wrapper(param, user_defined_work_func, register_cleanup, touch_files_only)
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/ruffus/task.py", line 447, in job_wrapper_io_files
    ret_val = user_defined_work_func(*param)
  File "RedDog.py", line 960, in makeDir
    runStageCheck('makeDir', flagFile, outPrefix, full_sequence_list_string)
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/utils.py", line 128, in runStageCheck
    status = runStage(stage, *args)
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/utils.py", line 144, in runStage
    exitStatus = distributedCommand(stage, commandStr, pipeline_options)
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/utils.py", line 122, in distributedCommand
    return script.runJobAndWait(stage, logDir, verbosity)
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/cluster_job.py", line 65, in runJobAndWait
    jobID = self.launch()
  File "/scg/apps/software/reddog/v1b11/lib/python2.7/site-packages/rubra/cluster_job.py", line 138, in launch
    str(returnCode)))
Exception: qsub command failed with exit status: 1
```

@d-j-e
Collaborator

d-j-e commented May 5, 2020

What version of rubra did you download, and what cluster system are you using?

@kesiaeds
Author

kesiaeds commented May 5, 2020

It is rubra version 0.1.5 and I'm using SCG Informatics Cluster https://login.scg.stanford.edu/

@d-j-e
Collaborator

d-j-e commented May 5, 2020

Try the version in the branch named 'slurm' - I think this is the problem.

@kesiaeds
Author

kesiaeds commented May 5, 2020

Sorry, do you mean I have to ask them to install another version of rubra?
This one? https://github.com/bjpop/rubra/tree/slurm

@d-j-e
Collaborator

d-j-e commented May 5, 2020

Yes - the other version is for a qsub (Torque) system, not Slurm.

@kesiaeds
Author

Hi

We installed Rubra-0.1.5 from the slurm branch as you suggested, and the pipeline started to run; however, we got another error message:

```
Starting pipeline...
70 jobs to be executed in total
stage = makeDir
Error: command failed: python makeDir.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out/ 28632_6#102,28632_6#101,28632_6#105,
stage = makeRef
Error: command failed: python convertGenbankToFasta.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/AL513382_1.gb /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out/temp/AL513382_1.fasta
Traceback (most recent call last):
  File "/scg/apps/software/reddog/V1beta.11/bin/rubra", line 8, in <module>
    sys.exit(main())
  File "/scg/apps/software/reddog/V1beta.11/rubra/rubra.py", line 66, in main
    gnu_make_maximal_rebuild_mode=rebuildMode)
  File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 5402, in pipeline_run
    raise job_errors
ruffus.ruffus_exceptions.RethrownJobError:

Original exception:

Exception #1
  '<class 'ruffus.ruffus_exceptions.RethrownJobError'>

    Exceptions generating parameters for

    task = 'RedDog.indexRef'

Original exception:

    Exception #1
      'ruffus.ruffus_exceptions.MissingInputFileError(

        No way to run job: Input file '/labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out/temp/AL513382_1.fasta' does not exist)' raised in ...
       Task = def RedDog.indexRef(...):

    Traceback (most recent call last):
      File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 4571, in parameter_generator
        job_history, verbose_abbreviated_path):
      File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 4410, in job_needs_to_run
        check_input_files_exist(*params)
      File "/scg/apps/software/reddog/V1beta.11/ruffus/file_name_parameters.py", line 392, in check_input_files_exist
        "Input file '%s' does not exist" % f)
    MissingInputFileError:

No way to run job: Input file '/labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out/temp/AL513382_1.fasta' does not exist

    ' raised in ...
Traceback (most recent call last):
  File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 5369, in pipeline_run
    verbose)
  File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 4731, in fill_queue_with_job_parameters
    for params in job_parameters:
  File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 4671, in parameter_generator
    raise errt
RethrownJobError:

    Exceptions generating parameters for

    task = 'RedDog.indexRef'

Original exception:

    Exception #1
      'ruffus.ruffus_exceptions.MissingInputFileError(

        No way to run job: Input file '/labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out/temp/AL513382_1.fasta' does not exist)' raised in ...
       Task = def RedDog.indexRef(...):

    Traceback (most recent call last):
      File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 4571, in parameter_generator
        job_history, verbose_abbreviated_path):
      File "/scg/apps/software/reddog/V1beta.11/ruffus/task.py", line 4410, in job_needs_to_run
        check_input_files_exist(*params)
      File "/scg/apps/software/reddog/V1beta.11/ruffus/file_name_parameters.py", line 392, in check_input_files_exist
        "Input file '%s' does not exist" % f)
    MissingInputFileError:

        No way to run job: Input file '/labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out/temp/AL513382_1.fasta' does not exist
```

Any idea what the problem might be here?
Thanks

Kesia

@d-j-e
Collaborator

d-j-e commented May 11, 2020

If you check the log folder, there are standard out and error files for each job - can you check to see if there is a particular error message from the makeDir step (this is the one that failed).

BTW, you probably want to get rid of the #'s in your sequence names - that character (and a few other special ones) can lead to errors in many bioinformatics programs (some handle them fine...). I usually replace it with an underscore. (That is not the problem here...)

Just thought of something else really important - did you change the modules in the config file to those for your local machine?

```
stageDefaults = {
    "distributed": True,
    "walltime": "01:00:00",
    "memInGB": 4,
    "queue": None,
    "modules": [
        # You will need to change these for a distributed (queuing) installation
        "python-gcc/2.7.5",
        "bwa-intel/0.6.2",
        "samtools-intel/1.3.1",
        "bcftools-intel/1.2",
        "eautils-gcc/1.1.2",
        "bowtie2-gcc/2.2.9",
        "fasttree-gcc/2.1.7dp"
    ]
}
```

You need to change the module names for your local installation...

@kesiaeds
Author

I have changed the modules in the config file for the local installation and I also removed the #'s from the sequence names, but I'm still getting the same error message.

Here is the message from the makeDir step file:

```
#!/bin/bash
module load python/2.7.18
module load bwa/0.7.17
module load samtools/1.10
module load bamtools/2.5.1
module load bcftools/1.10.2
module load ea-utils/1.04.807
module load bowtie2/2.4.1
module load fasttree/2.1.11
python makeDir.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out/ 28632_6_102,28632_6_105,28632_6_101,

srun --error=log/makeDir.%j.stderr --output=log/makeDir.%j.stdout --mem=4096 --job-name=makeDir --time=00:10:00 bash /oak/stanford/scg/lab_jandr/Kesia/Asia/stdy5352/Nepal/SEAP/RedDog-master/tmpM3C0B9
```

@d-j-e
Collaborator

d-j-e commented May 12, 2020

The srun command sets the output files for standard out (stdout) and standard error (stderr) with the following...

--error=log/makeDir.%j.stderr --output=log/makeDir.%j.stdout

so is there anything reported in either of those two files? (the latest ones, as you have run the pipe a few times...)

@kesiaeds
Author

These files were not created. The only files that were created in the log folder were:

makeDir.sh
makeRef.sh
pipeline.log

@d-j-e
Collaborator

d-j-e commented May 13, 2020

Try running makeDir.sh via sbatch outside the pipeline and see what happens.

@kesiaeds
Author

Hi

I had to add the following information to the makeDir.sh file to run it via sbatch outside the pipeline:

```
#!/bin/bash
#SBATCH --job-name=makedir
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --partition=interactive
#SBATCH --account=default
#SBATCH --time=24:00:00
```

Then I got this message:

python: can't open file 'makeDir.py': [Errno 2] No such file or directory

@d-j-e
Collaborator

d-j-e commented May 26, 2020

Can you send me your config file - if you don't want to post it here, you can send it directly to me via David.Edwards @ monash.edu (remove the spaces)

@d-j-e
Collaborator

d-j-e commented Jun 2, 2020

Apologies for the delay in responding - just got feedback on a paper very close to submission and have had to repeat some of the analysis rather quickly....

There is a second way to run RedDog on a distributed system: as a single job, albeit with lots of processors to get the job done. It means you can't really monitor progress directly, but it also means the job only has to enter the queue once (instead of sending lots of jobs to the queue), so it can be more efficient on a crowded system.

Have a look at the config file we currently use:
https://github.com/katholt/RedDog/blob/master/RedDog_config_massive.py

The changes needed to run the pipeline like this are:

  1. In the essential pipeline variables:
    change `no_check = False` to `no_check = True`

  2. In the pipeline section:

```
pipeline = {
    "logDir": "log",
    "logFile": "pipeline.log",
    "style": "print",
    "procs": 20,
    "paired": True,
    "verbose": 1,
    "end": ["deleteDir"],
    "force": [],
    "rebuild" : "fromstart"
}
```

set "procs" to a sensible number of processors (we are allowed up to 24, but I tend to stick to 20)

  3. Just below pipeline, in stageDefaults:
    change "distributed" from True to False

Then launch the job as a single job, giving the pipeline a few hours for smaller runs - with bigger runs that can take longer than our seven-day limit, I run the pipeline until it goes down, then just restart it - RedDog then starts again from where it was interrupted... You will have to add all the module loading relevant for your environment, but our batch job script is as follows:

```
#SBATCH --job-name=RedDog
#SBATCH --account=jsXX
#SBATCH --time=1-00:00:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4096
#SBATCH --cpus-per-task=20
#SBATCH --partition=comp
#SBATCH --qos=normal

module purge
module load python/2.7.15-gcc5 bwa/0.6.2 samtools/1.9-gcc5 bcftools/1.8
source /usr/local2/bioinformatics/bioansible_env.sh
module load ea-utils/1.1.2-gcc5 fasttree/2.1.10 bowtie2/2.2.9

cd /home/[user]/path/to/RedDog_v1b_11

rubra RedDog --config RedDog_config_massive --style run
```

You may not need all these settings (partition is a local variable, as are qos and account, which you may or may not need...).

@kesiaeds
Author

kesiaeds commented Jun 4, 2020

Hi David,

Thank you so much for your email.

I changed my config file following your suggestions and I could run the pipeline on two samples as a test, and it worked. However, when I tried to run again with more samples, the pipeline started and then failed.

Now I'm trying to launch the job again and I'm getting this message:

```
Traceback (most recent call last):
  File "/scg/apps/software/reddog/V1beta.11/bin/rubra", line 8, in <module>
    sys.exit(main())
  File "/scg/apps/software/reddog/V1beta.11/rubra/rubra.py", line 30, in main
    options = getOptions(args)
  File "/scg/apps/software/reddog/V1beta.11/rubra/utils.py", line 214, in getOptions
    imported = __import__(module)
  File "RedDog_config_run.py", line 156
    _end-to-end mode
    ^
SyntaxError: invalid syntax
```

@kesiaeds
Author

kesiaeds commented Jun 4, 2020

Hi David,

I downloaded all the RedDog files again and used https://github.com/katholt/RedDog/blob/master/RedDog_config_massive.py as the config file. This time the pipeline started but failed at the checkBam stage.

Here is the message in the log folder:
```
2020-06-04 00:21:56,577 - makeDir: python makeDir.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/ 28632_6_102,28632_6_105,28632_6_101,
2020-06-04 00:21:57,427 - copyRef: cp /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/AL513382_1.fasta /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/
2020-06-04 00:21:57,879 - buildBowtieIndex: bowtie2-build /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/AL513382_1.fasta /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/AL513382_1
2020-06-04 00:21:57,980 - indexRef: samtools faidx /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/AL513382_1.fasta
2020-06-04 00:22:00,688 - alignBowtiePE: bowtie2 --sensitive-local -x /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/AL513382_1 -1 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_101_1.fastq.gz -2 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_101_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_101/28632_6_101.bam
2020-06-04 00:22:00,788 - alignBowtiePE: bowtie2 --sensitive-local -x /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/AL513382_1 -1 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_102_1.fastq.gz -2 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_102_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_102/28632_6_102.bam
2020-06-04 00:22:00,789 - alignBowtiePE: bowtie2 --sensitive-local -x /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/AL513382_1 -1 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_105_1.fastq.gz -2 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_105_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_105/28632_6_105.bam
2020-06-04 00:22:02,334 - checkBam: python checkBam.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_101/28632_6_101.bam PE /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_101_1.fastq.gz
2020-06-04 00:22:02,375 - Failed to run 'python checkBam.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_101/28632_6_101.bam PE /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_101_1.fastq.gz' BAM too small: deleted Non-zero exit status 1
2020-06-04 00:22:02,435 - checkBam: python checkBam.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_102/28632_6_102.bam PE /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_102_1.fastq.gz
2020-06-04 00:22:02,435 - checkBam: python checkBam.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_105/28632_6_105.bam PE /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_105_1.fastq.gz
2020-06-04 00:22:02,476 - Failed to run 'python checkBam.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_102/28632_6_102.bam PE /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_102_1.fastq.gz' BAM too small: deleted Non-zero exit status 1
2020-06-04 00:22:02,478 - Failed to run 'python checkBam.py /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Red_out_/temp/28632_6_105/28632_6_105.bam PE /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/red_test/28632_6_105_1.fastq.gz' BAM too small: deleted Non-zero exit status 1
```

@kesiaeds
Author

kesiaeds commented Jun 8, 2020

Hi David,
Hope you are doing well.
I downloaded my dataset again and reran the pipeline. This time it successfully passed the checkBam stage; however, I am getting this message for some reads:

```
2020-06-08 08:38:08,732 - Failed to run 'bowtie2 --sensitive-local -x /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Typhi/Mapping/temp/AL513382_1 -1 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Typhi/Typhi/33253_3#301_1.fastq.gz -2 /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Typhi/Typhi/33253_3#301_2.fastq.gz -X 2000 | samtools view -ubS - | samtools sort - -o /labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Typhi/Mapping/temp/33253_3#301/33253_3#301.bam'
[E::hts_open_format] Failed to open file "/labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Typhi/Mapping/temp/33253_3#301/33253_3#301.bam.tmp.0000.bam" : File exists
samtools sort: failed to create temporary file "/labs/jandr/Kesia/Asia/stdy5352/Nepal/SEAP/Typhi/Mapping/temp/33253_3#301/33253_3#301.bam.tmp.0000.bam": File exists
[E::bgzf_flush] File write failed (wrong size)
samtools view: writing to standard output failed: Broken pipe
[E::bgzf_close] File write failed
samtools view: error closing standard output: -1
Non-zero exit status 1
```

Do you have any idea on what the problem is now?

Best
Kesia

@kesiaeds
Author

kesiaeds commented Jun 11, 2020

Hi David,

I'm trying to run RedDog on a very large dataset (>2000 genomes). Would increasing the walltime and memory to "walltime": "06:00:00" and "memInGB": 8 be enough for the run?

@d-j-e
Collaborator

d-j-e commented Jun 11, 2020

Hi Kesia,

Sorry - paper out of the way (for now), so I can get on to this...

You shouldn't need to up the memory to 8 GB unless any of your read sets are particularly large - if I get a fail at the mapping (BAM) step, I often rerun just that stage with extra memory (see the manual/wiki for how to stop the pipeline after a particular step...). There is one other step that may require more than 4 GB later in the pipeline, but that also depends on the number of SNPs you need to process (which you often don't know until you have run RedDog...). So if it fails later when collating the SNP matrix (or indeed parsing the matrix), you may want to up the memory just for rerunning the failed stage(s) - the pipe finishes not long after that (by default you won't get a tree for that many isolates...), so you could run it to the end with the higher memory, if required (hope that made sense).

As for time, that does depend on the number of cores you can give the pipe - with 20 processors and ~2000 isolates I would expect RedDog to take less than a day, but it does depend on the read set sizes, number of SNPs, etc. So set it for 24 hours (1 day) if using 20 procs (if you are using more [lucky you - I can only get 24 max for a single job], then you will get away with less time...). Remember, if the pipe runs out of time before completing, you can just restart it and it will complete what is left.

good luck!

David
