Run seer with no headaches
SEER is a very interesting tool for running GWAS on bacterial datasets, but running it (especially on older OSes) requires several different tools.
This pipeline runs SEER in one go, reducing unnecessary headaches.
Place the two required input files (input.txt and phenotypes.txt) in the same directory as the Makefile. The input.txt file is a tab-delimited, two-column file in the format:
SAMPLE /PATH/TO/FASTA
The phenotypes.txt file is a tab-delimited, three-column file in the format:
SAMPLE SAMPLE PHENOTYPE
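Before running the pipeline, it can help to check that both files have the expected shape. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, and it assumes that the first two columns of phenotypes.txt repeat the same sample name and that every phenotype sample should also appear in input.txt.

```python
import csv
from pathlib import Path


def read_table(path, n_columns):
    """Read a tab-delimited file and check that every row has n_columns fields."""
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != n_columns:
                raise ValueError(
                    f"{path}:{line_no}: expected {n_columns} columns, got {len(row)}"
                )
            rows.append(row)
    return rows


def check_inputs(input_path="input.txt", phenotypes_path="phenotypes.txt"):
    """Return a list of problems found in the two input files (empty if none)."""
    problems = []
    inputs = read_table(input_path, 2)
    phenotypes = read_table(phenotypes_path, 3)
    input_samples = {sample for sample, _ in inputs}

    # Every FASTA path listed in input.txt should exist.
    for sample, fasta in inputs:
        if not Path(fasta).is_file():
            problems.append(f"{sample}: FASTA not found at {fasta}")

    # Assumption: the first two columns hold the same sample name,
    # and that name is also listed in input.txt.
    for first, second, _ in phenotypes:
        if first != second:
            problems.append(f"{first}: first two columns differ ({second})")
        if first not in input_samples:
            problems.append(f"{first}: listed in phenotypes but not in input")
    return problems
```

Calling check_inputs() from the directory that holds the Makefile returns an empty list when nothing looks wrong.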
Once you have your files ready, simply type:
make seer
This will generate the k-mers using fsm-lite, estimate the population structure using mash and the R_mds.pl script, and then run seer and filter_seer to generate the list of significant k-mers.
NOTE: the final population structure script, src/mash2mat, will cut the SAMPLE name at the first _ character. It is therefore advisable to use sample names with no underscores (or to modify the script).
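To see which sample names would be affected, the sketch below groups names by their shortened form. It is an illustration only: it assumes the script keeps the part of the name before the first underscore, and the helper names are hypothetical rather than taken from src/mash2mat.

```python
from collections import defaultdict


def truncated_name(sample):
    """Keep only the part of the name before the first underscore (assumed behaviour)."""
    return sample.split("_", 1)[0]


def find_collisions(samples):
    """Map each shortened name to the original names that would collapse onto it."""
    groups = defaultdict(list)
    for sample in samples:
        groups[truncated_name(sample)].append(sample)
    # Only report shortened names shared by more than one sample.
    return {short: names for short, names in groups.items() if len(names) > 1}
```

Any entry returned by find_collisions points to samples that would become indistinguishable under that assumption, so renaming them beforehand is the safer option.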
The significant k-mers can be mapped back to the annotated assemblies (in either GFF or BED format) using the following command:
make map
This will generate two fastq files (one for positive betas, one for negative ones), align them to each assembly using bowtie2 and samtools, and then output the relevant features from the assembly using bedtools. This requires the index files for each assembly (generated by bowtie2-build) in the ../indexes directory, and the assembly in GFF format in the ../gff directory.
If you have your index files somewhere else, simply type:
make map INDEXESDIR=/path/to/indexes/directory
If you wish to use assemblies in BED format, just type:
make map GFFDIR=/path/to/bed/assemblies/directory GFFEXT=bed
The indexes and GFF/BED files should have the same naming scheme as the SAMPLE in the input files; assemblies should have either the .gff or .bed extension, depending on the value of the GFFEXT variable in the Makefile.
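A quick check that every sample has an index and an annotation file can save a failed run. The sketch below is an assumption-laden illustration: it supposes that each index was built with the sample name as the bowtie2-build prefix (so a file such as SAMPLE.1.bt2, or SAMPLE.1.bt2l for large indexes, exists) and that annotations are named SAMPLE plus the chosen extension. The function name is hypothetical.

```python
from pathlib import Path


def missing_map_inputs(samples, indexes_dir="../indexes", gff_dir="../gff", gff_ext="gff"):
    """Return the files that appear to be missing for the mapping step."""
    indexes_dir = Path(indexes_dir)
    gff_dir = Path(gff_dir)
    missing = []
    for sample in samples:
        # Assumption: the index prefix equals the sample name.
        candidates = [indexes_dir / f"{sample}.1.{suffix}" for suffix in ("bt2", "bt2l")]
        if not any(path.is_file() for path in candidates):
            missing.append(str(candidates[0]))
        # Assumption: the annotation is named SAMPLE.<gff_ext>.
        annotation = gff_dir / f"{sample}.{gff_ext}"
        if not annotation.is_file():
            missing.append(str(annotation))
    return missing
```

The default arguments mirror the ../indexes and ../gff locations and the gff extension; pass other values if you override INDEXESDIR, GFFDIR or GFFEXT.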
Any other parameter can be changed on the make command line: all parameters are listed in the top lines of the Makefile. If your fsm-lite or seer binaries are somewhere other than ~/software/bin/ or ~/software/seer/, you can either edit the Makefile or type:
make seer FSMDIR=/path/to/fsm-lite/directory SEERDIR=/path/to/seer/directory
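If you drive the pipeline from a script, the same overrides can be passed programmatically. This is only a sketch with hypothetical helper names; it assumes make is on the PATH and that it is run from the directory containing the Makefile.

```python
import subprocess


def make_command(target, **overrides):
    """Build the argument list for make, with NAME=value variable overrides."""
    assignments = [f"{name}={value}" for name, value in sorted(overrides.items())]
    return ["make", target] + assignments


def run_make(target, **overrides):
    """Run make for the given target and raise if it exits with an error."""
    subprocess.run(make_command(target, **overrides), check=True)
```

For example, run_make("seer", FSMDIR="/path/to/fsm-lite/directory", SEERDIR="/path/to/seer/directory") issues the same command as the line above.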
- fsm-lite
- mash
- python (2.7+, 3.3+)
- pandas
- perl
- R (3.2+)
- rhdf5
- seer
- bowtie2
- samtools
- bedtools
Copyright (C) <2016> EMBL-European Bioinformatics Institute
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Neither the institution name nor the name seer_pipeline can be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact marco@ebi.ac.uk.
Products derived from this software may not be called seer_pipeline nor may seer_pipeline appear in their names without prior written permission of the developers. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.