
3. Running the Pipeline

d-j-e edited this page Jun 7, 2016 · 19 revisions


WARNING 1: ONLY EXECUTE ONE RUN OF THE PIPELINE AT A TIME - ESPECIALLY WITH LARGE DATA SETS.

WARNING 2: Analysis of larger data sets can take many, many hours, so start the pipeline from a computer you can leave running uninterrupted (you can still use the computer for other tasks; I often just turn off the screen and come back later), or use the screen command (see below).

Change to the RedDog directory, then at the command prompt, enter:

module load python

[Note: this manual assumes you have installed the required programs, including Ruffus and Rubra, so that they are available to Python.]

You only need to load the module once per login session. Once the module is loaded (almost immediately), enter:

rubra RedDog --config RedDog_config 

This will print out all the stages as they will be run and is useful for confirming the details of a run before commencing it. To actually run the pipeline, enter:

rubra RedDog --config RedDog_config --style run

This starts the pipeline. After a series of pre-run checks, the details of the run are reported and you are asked whether to start the run. Once you have checked the details, enter ‘y’ to start the pipeline proper. The pipeline then launches a series of job scripts, which are sent to the job queue for processing.
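If you type these commands often, they can be wrapped in a pair of shell functions. This is a minimal sketch, not part of RedDog or Rubra: the function names reddog_preview and reddog_run are hypothetical, and it assumes you call them from the RedDog directory after module load python. The optional argument is the config name, given without the .py extension as in the commands above.

```shell
# Hypothetical helpers (not part of RedDog/Rubra). Call them from the RedDog
# directory after "module load python". The optional argument is the config
# name without ".py"; it defaults to RedDog_config.

# Print the stages that would be run, without starting anything.
reddog_preview() {
    local config="${1:-RedDog_config}"
    rubra RedDog --config "$config"
}

# Start the pipeline. The run details are still reported, and you must
# answer 'y' at the "Start Pipeline? (y/n)" prompt before jobs are queued.
reddog_run() {
    local config="${1:-RedDog_config}"
    rubra RedDog --config "$config" --style run
}
```

For example, add the functions to a file you source at login, then run reddog_preview and, if the stages look right, reddog_run.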

For example:

RedDog V1beta.10 - phylogeny run

Copyright (c) 2015, David Edwards, Bernie Pope, Kat Holt
All rights reserved. (see README.txt for more details)

Mapping: Bowtie2 V2.2.3
Preset Option: --sensitive-local
1 replicon(s) in GenBank reference NC_000962_3
1 replicon(s) to be reported
768 sequence pair(s) to be mapped

Output folder:
/scratch/disk/workspace/mapping/v1b_date_study/

Start Pipeline? (y/n)
 

If your connection to the cluster system is broken during the run, don't panic. Just log back in to a new session, change to the RedDog directory, and rerun the two commands above (module load python, then the rubra command with --style run). The pipeline should resume from the stage it had reached when the connection was broken.

To produce a flowchart of the pipeline stages (currently only as an SVG file, which you can open in any web browser), enter:

rubra RedDog --config RedDog_config --style flowchart

You can also run the pipeline inside a session created with the 'screen' command. Entering 'screen' at the command prompt opens a new screen session in the same terminal, shows the 'welcome' message, and leaves you in the directory you were in when you entered 'screen'.

VLSCI users: if you want to run the pipeline in a screen session, you must enter the following two commands each time you start a new screen session:

module purge
module load vlsci

Then change to the RedDog directory (using 'cd'), load the python module as usual, and then launch the pipeline.

You can detach from the screen session by entering ctrl-a d (hit 'a' while holding the 'control' key, then hit 'd' by itself). This returns you to the main session you logged on with. You can then log out of that session (and turn off your computer and go home), and the pipeline will continue to run in the 'screen' session.

To re-attach to the session, enter screen -r. You can do this from any computer once you have logged back in to the same host, but remember to detach again if you are only checking on progress (from home, for example). Screen sessions keep running even if your connection to the host is broken. For more details on screen, including running more than one screen session and naming sessions, see the screen quick_reference.
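The screen steps above can be collected into a few shell functions. This is a sketch under assumptions: the function names and the session name reddog are hypothetical, while -S (name a new session), -ls (list sessions) and -r (re-attach) are standard screen options. Naming the session helps when more than one session is running.

```shell
# Hypothetical helpers wrapping standard screen options; the session
# name "reddog" is an arbitrary choice and can be overridden.
REDDOG_SESSION="${REDDOG_SESSION:-reddog}"

# Start a new, named screen session. Inside it (VLSCI users: first run
# "module purge" and "module load vlsci"), cd to the RedDog directory,
# load the python module, launch the pipeline, then detach with ctrl-a d.
reddog_screen_start() {
    screen -S "$REDDOG_SESSION"
}

# List screen sessions, e.g. to check that the pipeline session still exists.
reddog_screen_list() {
    screen -ls
}

# Re-attach to the named session after logging back in to the same host.
reddog_screen_resume() {
    screen -r "$REDDOG_SESSION"
}
```

With a named session, screen -r reddog re-attaches to that session specifically, rather than leaving screen to choose among several.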

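There is one important caveat to using screen: your terminal's normal scrollback does not work inside a screen session, so you can't simply scroll back to examine an error, for example. Screen keeps only a limited scrollback buffer of its own (viewable in copy mode with ctrl-a [), so it is worth keeping a log.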

Entering ctrl-a H (hit 'a' while holding the 'control' key, then hit 'H' (shift-h) by itself) during a screen session starts a running log of the session. Screen keeps appending to the log file across multiple sessions. The log is very useful for capturing what happens during the run: if something goes awry, you can look back through it.
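Once logging is on, the log file can be searched for problems. This is a minimal sketch assuming screen's default log file name (screenlog.0 for the first window, usually in the directory where screen was started). The function name and the keywords searched for are hypothetical choices, not messages RedDog is known to print. Starting the session with screen -L -S reddog instead turns logging on from the beginning, much like pressing ctrl-a H.

```shell
# Hypothetical helper: print numbered lines from a screen log that look
# like errors. Defaults to screenlog.0, screen's default log for window 0.
# Returns grep's status: 0 if something matched, 1 if nothing did.
reddog_log_errors() {
    local logfile="${1:-screenlog.0}"
    if [ ! -f "$logfile" ]; then
        echo "No log file found: $logfile" >&2
        return 1
    fi
    # -n: prefix line numbers; -i: ignore case; -E: extended regex.
    # The keywords are a guess at what a failure might look like.
    grep -n -i -E 'error|exception|traceback|fail' "$logfile"
}
```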

For a further, user-friendly guide to screen (and the source of the final ‘hint’ above), see How To Use Linux Screen: http://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/

If you are running the pipeline on our new system (local users only), make sure you use the config file with the ‘sg’ suffix (i.e. ‘RedDog_config_sg.py’). This version ensures that jobs are sent to the correct queue on ‘snowy-sg1’ (the ‘sysgen’ queue). Note: at the time of this update, our new server had not yet been officially named, so this information may change.
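For local users, the only change to the run command is the config name passed to rubra. A minimal sketch (the function name reddog_run_sg is hypothetical) that checks the ‘sg’ config file is present before starting a run; as with RedDog_config in the commands above, the config is given to --config without the .py extension.

```shell
# Hypothetical helper for local users: run RedDog with the 'sg' config so
# jobs go to the 'sysgen' queue. Call from the RedDog directory after
# "module load python".
reddog_run_sg() {
    # Fail early if the sg config is missing from the current directory.
    if [ ! -f RedDog_config_sg.py ]; then
        echo "RedDog_config_sg.py not found in $(pwd)" >&2
        return 1
    fi
    rubra RedDog --config RedDog_config_sg --style run
}
```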
