
4. Pipeline Conventions

d-j-e edited this page Mar 21, 2015 · 16 revisions

Pipeline Conventions:

Before using RedDog, it is useful to understand some of the conventions used in this pipeline.

The pipeline can map reads to a reference with more than one sequence (hereafter, these sequences are referred to as replicons). The replicons in a reference can be a whole bacterial ‘chromosome’ with or without one or more plasmids; a series of plasmids, bacteriophages or genes of interest; or a series of contigs (e.g. a pangenome sequence). If the reference is given in GenBank format, the pipeline will also examine the depth and coverage of reads for each isolate across all the features (genes) in the GenBank file.

Where the replicons in the reference represent sequences with potentially different phylogenetic histories (e.g. a genome and one or more plasmids), the pipeline will report on each replicon in the reference separately, including calling potential outgroups – this is known as a ‘phylogeny’ run.

If the replicon(s) represent the pangenome of the collection of isolates to be tested, then only the replicon(s) carrying the ‘core’ genes (and SNPs) need to be considered in phylogenetic analysis. This is usually the largest contig, but the user can specify any one or more replicons. During such a ‘pangenome’ run, all isolates are assumed to have been used to generate the pangenome, and hence all isolates are classified as ‘ingroup’ (i.e. there is no outgroup calling).
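The difference between the two run types can be summarised in a few lines of Python. This is only an illustrative sketch of the convention described above, not RedDog's own code: the function names, the replicon names and the lengths are all hypothetical, and falling back to the largest replicon when none is specified is an assumption based on the "usually the largest contig" remark.

```python
from typing import Dict, List, Optional


def replicons_for_phylogeny(run_type: str,
                            replicon_lengths: Dict[str, int],
                            core: Optional[List[str]] = None) -> List[str]:
    """Return the replicons considered in phylogenetic analysis (hypothetical helper)."""
    if run_type == "phylogeny":
        # Every replicon is reported on separately.
        return list(replicon_lengths)
    if run_type == "pangenome":
        if core:
            # The user may name any one or more replicons as 'core'.
            unknown = [name for name in core if name not in replicon_lengths]
            if unknown:
                raise ValueError(f"replicons not in reference: {unknown}")
            return list(core)
        # Assumption: default to the largest replicon.
        return [max(replicon_lengths, key=replicon_lengths.get)]
    raise ValueError(f"unknown run type: {run_type!r}")


def calls_outgroups(run_type: str) -> bool:
    """Outgroups are only called in a 'phylogeny' run; a 'pangenome' run treats all isolates as ingroup."""
    return run_type == "phylogeny"


# Made-up replicon names and lengths, for illustration only.
reference = {"chromosome": 4_800_000, "plasmid": 210_000}
print(replicons_for_phylogeny("phylogeny", reference))  # ['chromosome', 'plasmid']
print(replicons_for_phylogeny("pangenome", reference))  # ['chromosome']
```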

A mapping run to a new reference is called a ‘new’ run; this distinguishes it from a run where new read sets are merged with the output from a prior run, a so-called ‘merge’ run. As long as the same reference is being used, any number of merge runs can be added to the original output set. Merge runs can also include mapping reads from different platforms/technologies.

e.g.
run 1: ‘new’ with paired end Illumina data;
run 2: ‘merge’ with single end Illumina data;
run 3: ‘merge’ with more paired end reads; and,
run 4: ‘merge’ with Ion Torrent reads.
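
The rule behind this sequence (one ‘new’ run, then any number of ‘merge’ runs against the same reference) can be modelled as a small check. The sketch below is a hypothetical illustration of the convention for a single output set; the `Run` class, its fields and the file name `reference.gbk` are assumptions made for the example and do not come from RedDog.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    kind: str       # 'new' or 'merge'
    reference: str  # reference the reads are mapped to
    reads: str      # free-text description of the read sets


def check_run_history(runs: List[Run]) -> None:
    """Raise ValueError if the runs of one output set break the new/merge convention."""
    if not runs:
        raise ValueError("an output set needs at least one run")
    if runs[0].kind != "new":
        raise ValueError("the first run of an output set must be a 'new' run")
    for number, run in enumerate(runs[1:], start=2):
        if run.kind != "merge":
            raise ValueError(f"run {number} must be a 'merge' run")
        if run.reference != runs[0].reference:
            raise ValueError(f"run {number} uses a different reference")


# The four runs from the example above, with a hypothetical reference file name.
history = [
    Run("new", "reference.gbk", "paired-end Illumina"),
    Run("merge", "reference.gbk", "single-end Illumina"),
    Run("merge", "reference.gbk", "more paired-end Illumina"),
    Run("merge", "reference.gbk", "Ion Torrent"),
]
check_run_history(history)  # passes silently: same reference throughout
```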

Pipeline Test Sets

The following small set of reads and reference GenBank file is recommended for testing whether the installation is working properly. The read sets are available for download from the [European Nucleotide Archive](http://www.ebi.ac.uk/ena).
Read Sets: ERR019786, ERR019793 and ERR019794

The reference genome and plasmid are available from NCBI.
Reference: [CP000038.1 and CP000039.1](http://www.ncbi.nlm.nih.gov/nuccore/CP000038.1,CP000039.1)
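
For scripted downloads, the accessions above can be turned into request URLs. The following is a minimal sketch that only builds the URLs and does not fetch anything. It assumes the public ENA Portal API `filereport` endpoint and the NCBI E-utilities `efetch` endpoint, neither of which is part of RedDog, and the helper function names are hypothetical.

```python
from typing import List
from urllib.parse import urlencode

READ_SETS = ["ERR019786", "ERR019793", "ERR019794"]
REFERENCE_ACCESSIONS = ["CP000038.1", "CP000039.1"]


def ena_filereport_url(run_accession: str) -> str:
    """URL of a TSV report listing the FASTQ file locations for one ENA run."""
    query = urlencode({
        "accession": run_accession,
        "result": "read_run",
        "fields": "fastq_ftp",
        "format": "tsv",
    })
    return "https://www.ebi.ac.uk/ena/portal/api/filereport?" + query


def ncbi_genbank_url(accessions: List[str]) -> str:
    """URL returning the listed nucleotide records as GenBank flat files with sequence."""
    query = urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "gbwithparts",
        "retmode": "text",
    })
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + query


for accession in READ_SETS:
    print(ena_filereport_url(accession))
print(ncbi_genbank_url(REFERENCE_ACCESSIONS))
```

Each ENA report lists where the FASTQ files for that run can be downloaded, and the NCBI URL returns both reference records in a single response, which can be saved as one GenBank file.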
