
mgalardini/seer_pipeline

seer_pipeline

Run seer with no headaches

SEER is a tool for running genome-wide association studies (GWAS) on bacterial datasets, but running it (especially on older operating systems) requires chaining together many different tools.

This pipeline runs SEER in one go, sparing you unnecessary headaches.

Usage

Place the two required input files (input.txt and phenotypes.txt) in the same directory as the Makefile. The input.txt file is a tab-delimited, two-column file in the format:

SAMPLE /PATH/TO/FASTA

The phenotypes.txt file is a tab-delimited, three-column file in the format:

SAMPLE SAMPLE PHENOTYPE
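As a minimal sketch, both input files could be generated from a folder of assemblies with standard shell tools. The folder name assemblies/, the .fasta extension and the two-column source file pheno_raw.tsv are hypothetical assumptions made for illustration. The snippet also writes two toy FASTA files and a toy phenotype table so that it runs on its own; delete those lines before using real data.

```shell
#!/bin/sh
# Sketch: build input.txt and phenotypes.txt for seer_pipeline.
# ASSUMPTIONS (hypothetical layout, not part of the pipeline):
#   - assemblies are stored as assemblies/SAMPLE.fasta
#   - raw phenotypes are in pheno_raw.tsv as SAMPLE<TAB>PHENOTYPE
set -eu

# Toy data so the sketch runs standalone; delete these lines for real data.
mkdir -p assemblies
printf '>contig1\nACGTACGT\n' > assemblies/strainA.fasta
printf '>contig1\nTTGACCAA\n' > assemblies/strainB.fasta
printf 'strainA\t1\nstrainB\t0\n' > pheno_raw.tsv

# input.txt: SAMPLE<TAB>/absolute/path/to/SAMPLE.fasta
: > input.txt
for fasta in assemblies/*.fasta; do
    sample=$(basename "$fasta" .fasta)
    abspath="$(cd "$(dirname "$fasta")" && pwd)/$(basename "$fasta")"
    printf '%s\t%s\n' "$sample" "$abspath" >> input.txt
done

# phenotypes.txt: SAMPLE<TAB>SAMPLE<TAB>PHENOTYPE (sample name written twice)
awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $1, $2 }' pheno_raw.tsv > phenotypes.txt
```

Run it from the directory that contains the Makefile and inspect the two files (for example with `head input.txt phenotypes.txt`) before typing `make seer`.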

Once you have your files ready, simply type:

make seer

This will generate the k-mers using fsm-lite, estimate the population structure using mash and the R_mds.pl script, and then run seer and filter_seer to produce the list of significant k-mers.

NOTE: the final population-structure script, src/mash2mat, truncates each SAMPLE name at the first _ character. It is therefore advisable to use sample names without underscores (or to modify the script).
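Two samples that differ only after the first underscore (for example strainA_2015 and strainA_2016) would end up with the same truncated name, so it can be worth checking the sample list beforehand. The sketch below mimics the truncation described above with cut; the demo file input_check.txt and its sample names are made up for illustration, so point the commands at your own input.txt instead.

```shell
#!/bin/sh
# Sketch: find sample names that src/mash2mat would truncate or merge.
# input_check.txt is a hypothetical demo file; use your real input.txt.
set -eu

printf 'strainA_2015\t/data/a.fasta\nstrainA_2016\t/data/b.fasta\nstrainC\t/data/c.fasta\n' > input_check.txt

# Sample names containing an underscore (these will be truncated).
cut -f1 input_check.txt | grep '_' > underscored.txt || true

# Truncate each name at the first "_" (as described for src/mash2mat)
# and report any truncated name that occurs more than once.
cut -f1 input_check.txt | cut -d_ -f1 | sort | uniq -d > collisions.txt

echo "Sample names containing '_':"
cat underscored.txt
echo "Truncated names shared by several samples:"
cat collisions.txt
```

An empty second list means truncation is harmless for your data, even if some names contain underscores.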

The significant k-mers can be mapped back to the annotated assemblies (in either GFF or BED format) with the following command:

make map

This will generate two FASTQ files (one for k-mers with positive betas, one for those with negative betas), align them to each assembly using bowtie2 and samtools, and then output the relevant features from each assembly using bedtools. This requires the bowtie2 index files for each assembly (generated by bowtie2-build) in the ../indexes directory, and each annotated assembly in GFF format in the ../gff directory.
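The index files can be built with bowtie2-build, whose basic form is `bowtie2-build <reference_in> <bt2_base>`. The function below is a sketch that loops over input.txt and names each index after its SAMPLE, assuming the Makefile expects index basenames equal to the SAMPLE names. build_indexes and its DRY_RUN switch are hypothetical helpers, not part of the pipeline.

```shell
#!/bin/sh
# Sketch: build one bowtie2 index per sample listed in input.txt.
# ASSUMPTION: the Makefile expects index basenames equal to SAMPLE,
# e.g. ../indexes/strainA.1.bt2 for sample strainA.
# build_indexes and DRY_RUN are hypothetical helpers, not part of seer_pipeline.

# Usage: build_indexes INPUT_FILE INDEX_DIR
# With DRY_RUN=1 the bowtie2-build commands are printed instead of run.
build_indexes() {
    input=$1
    outdir=$2
    tab=$(printf '\t')
    while IFS="$tab" read -r sample fasta; do
        [ -n "$sample" ] || continue    # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "bowtie2-build $fasta $outdir/$sample"
        else
            mkdir -p "$outdir"
            bowtie2-build "$fasta" "$outdir/$sample"
        fi
    done < "$input"
}
```

For example, `DRY_RUN=1 build_indexes input.txt ../indexes` previews the commands, and dropping `DRY_RUN=1` builds the indexes in the default ../indexes directory.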

If your index files are somewhere else, type:

make map INDEXESDIR=/path/to/indexes/directory

If you wish to use assemblies in BED format, type:

make map GFFDIR=/path/to/bed/assemblies/directory GFFEXT=bed

The index and GFF/BED files should be named after the SAMPLE names used in the input files; the assemblies must have either the .gff or the .bed extension, matching the value of the GFFEXT variable in the Makefile.
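Before running `make map`, a quick check that every SAMPLE has both an annotation file and an index can save a failed run. The sketch below assumes the naming rule above (SAMPLE.gff or SAMPLE.bed, and index files starting with SAMPLE) and relies on bowtie2-build writing BASE.1.bt2, or BASE.1.bt2l for large indexes. check_map_inputs is a hypothetical helper, not part of the pipeline.

```shell
#!/bin/sh
# Sketch: verify that each sample has an annotation file and a bowtie2 index.
# check_map_inputs is a hypothetical helper, not part of seer_pipeline.
# Usage: check_map_inputs INPUT_FILE GFFDIR GFFEXT INDEXESDIR
# Prints one line per missing file; returns non-zero if anything is missing.
check_map_inputs() {
    input=$1
    gffdir=$2
    gffext=$3
    indexesdir=$4
    tab=$(printf '\t')
    missing=0
    while IFS="$tab" read -r sample rest; do
        [ -n "$sample" ] || continue    # skip blank lines
        if [ ! -f "$gffdir/$sample.$gffext" ]; then
            echo "missing annotation: $gffdir/$sample.$gffext"
            missing=$((missing + 1))
        fi
        # bowtie2-build writes BASE.1.bt2 (or BASE.1.bt2l for large indexes)
        if [ ! -f "$indexesdir/$sample.1.bt2" ] && [ ! -f "$indexesdir/$sample.1.bt2l" ]; then
            echo "missing index: $indexesdir/$sample"
            missing=$((missing + 1))
        fi
    done < "$input"
    [ "$missing" -eq 0 ]
}
```

With the default layout, the call would be `check_map_inputs input.txt ../gff gff ../indexes`.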

Further configuration

Any other parameter can be overridden from the make command line; all of them are listed at the top of the Makefile. If your fsm-lite or seer binaries are somewhere other than ~/software/bin/ or ~/software/seer/, you can either edit the Makefile or type:

make seer FSMDIR=/path/to/fsm-lite/directory SEERDIR=/path/to/seer/directory
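To confirm that the directories you pass actually contain the binaries, a small check like the one below can help. It assumes the executables are named fsm-lite, seer and filter_seer (the tools named in the usage section); any other binaries the Makefile calls would need to be added to the list. missing_bins is a hypothetical helper, not part of the pipeline.

```shell
#!/bin/sh
# Sketch: list expected executables that are missing from a directory.
# missing_bins is a hypothetical helper, not part of seer_pipeline.
# Usage: missing_bins DIR BINARY [BINARY ...]
missing_bins() {
    dir=$1
    shift
    for bin in "$@"; do
        # -x: the file exists and is executable
        [ -x "$dir/$bin" ] || echo "$dir/$bin"
    done
}
```

For example, `missing_bins /path/to/fsm-lite/directory fsm-lite` and `missing_bins /path/to/seer/directory seer filter_seer` print nothing when the binaries are where you told make to look.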

Prerequisites

  • fsm-lite
  • mash
  • python (2.7+, 3.3+)
  • pandas
  • perl
  • R (3.2+)
  • rhdf5
  • seer
  • bowtie2
  • samtools
  • bedtools
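A rough way to verify the command-line prerequisites is to look each one up on the PATH with `command -v`. This is only a sketch: it checks executables (R is looked up through Rscript), so the pandas and rhdf5 libraries, and fsm-lite or seer if they live outside the PATH, need to be checked separately. check_prereqs is a hypothetical helper, not part of the pipeline.

```shell
#!/bin/sh
# Sketch: report which command-line tools are available on the PATH.
# check_prereqs is a hypothetical helper, not part of seer_pipeline.
# Returns non-zero if at least one tool is missing.
check_prereqs() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example: `check_prereqs mash python perl Rscript bowtie2 samtools bedtools`. The libraries can be checked with `python -c 'import pandas'` and `Rscript -e 'library(rhdf5)'`.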

Copyright

Copyright (C) 2016 EMBL-European Bioinformatics Institute

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Neither the institution name nor the name seer_pipeline can be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact marco@ebi.ac.uk.

Products derived from this software may not be called seer_pipeline nor may seer_pipeline appear in their names without prior written permission of the developers. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.