Name		Name	Last commit message	Last commit date
parent directory ..
ADK.fas		ADK.fas
FUMC.fas		FUMC.fas
GYRB.fas		GYRB.fas
ICD.fas		ICD.fas
MDH.fas		MDH.fas
PURA.fas		PURA.fas
README.md		README.md
RECA.fas		RECA.fas
ecoli_mlst.pl		ecoli_mlst.pl
publicSTs.txt		publicSTs.txt

README.md

ecoli_mlst

ecoli_mlst is a script to determine MLST sequence types for E. coli genomes and extract allele sequences.

Synopsis
Description
Usage
Options
- Mandatory options
- Optional options
Output
Run environment
Author - contact
Citation, installation, and license
Changelog

Synopsis

perl ecoli_mlst.pl -a fas -g fasta

Description

The script searches for multilocus sequence type (MLST) alleles in E. coli genomes according to Mark Achtman's scheme with seven house-keeping genes (adk, fumC, gyrB, icd, mdh, purA, and recA) [Wirth et al., 2006]. NUCmer from the MUMmer package is used to compare the given allele sequences to bacterial genomes via nucleotide alignments.

Download the allele files (adk.fas ...) and the sequence type file ('publicSTs.txt') from this website: http://mlst.ucc.ie/mlst/dbs/Ecoli

To run ecoli_mlst.pl include all E. coli genome files (file extension e.g. 'fasta'), all allele sequence files (file extension 'fas') and 'publicSTs.txt' in the current working directory. The allele profiles are parsed from the created *.coord files and written to a result file, plus additional information from the file 'publicSTs.txt'. Also, the corresponding allele sequences (obtained from the allele input files) are concatenated for each E. coli genome into a result multi-fasta file. Option -c can be used to initiate an alignment for this multi-fasta file with ClustalW (standard alignment parameters; has to be in the $PATH or change variable $clustal_call). The alignment fasta output file can be used directly for RAxML. CAREFUL the Phylip alignment format from ClustalW allows only 10 characters per strain ID.

ecoli_mlst.pl works with complete and draft genomes. However, several genomes cannot be included in a single input file!

Obviously, only for those genomes whose allele sequences have been deposited in Achtman's allele database results can be obtained. If an allele is not found in a genome it is marked by a '?' in the result profile file and a place holder 'XXX' in the result fasta file. For these cases a manual NUCmer or BLASTN might be useful to fill the gaps and run_sub_seq.pl to get the corresponding 'new' allele sequences.

Non-NCBI fasta headers for the genome files have to have a unique ID directly following the '>' (e.g. 'Sakai', '55989' ...).

Usage

perl ecoli_mlst.pl -a fas -g fasta -c

Options

Mandatory options

-a, -alleles

File extension of the MLST allele fasta files, e.g. 'fas' (<=> -g).
-g, -genomes

File extension of the E. coli genome fasta files, e.g. 'fasta' (<=> -a).

Optional options

-h, -help

Help (perldoc POD)
-c, -clustalw

Call ClustalW for alignment

Output

ecoli_mlst_profile.txt

Tab-separated allele profiles for the E. coli genomes, plus additional info from 'publicSTs.txt'
ecoli_mlst_seq.fasta

Multi-fasta file of all concatenated allele sequences for each genome
*.coord

Text files that contain the coordinates of the NUCmer hits for each genome and allele
(errors.txt)

Error file, summarizing number of not found alleles or unclear NUCmer hits
(ecoli_mlst_seq_aln.fasta)

Optional, ClustalW alignment in Phylip format
(ecoli_mlst_seq_aln.dnd)

Optional, ClustalW alignment guide tree

Run environment

The Perl script runs only under UNIX flavors.

Author - contact

Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)

Citation, installation, and license

For citation, installation, and license information please see the repository main README.md.

Changelog

v0.3 (30.01.2013)
- additional info in POD
- check if result files already exist and ask user what to do
- changed script name from ecoli_mlst_alleles.pl to ecoli_mlst.pl
v0.2 (20.10.2012)
- included a POD
- options with Getopt::Long
- don't consider input E. coli genome query files, which are too big (set cutoff at 9 MB for a fasta E. coli file)
- draft E. coli genomes can now be used as input query files
- additional info in 'publicSTs.txt' now associated to found ST types in output
- give text to STDOUT which files were created
- new option -c to align the resulting allele sequences via ClustalW
v0.1 (25.10.2011)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ecoli_mlst

ecoli_mlst

ADK.fas

ADK.fas

FUMC.fas

FUMC.fas

GYRB.fas

GYRB.fas

ICD.fas

ICD.fas

MDH.fas

MDH.fas

PURA.fas

PURA.fas

README.md

README.md

RECA.fas

RECA.fas

ecoli_mlst.pl

ecoli_mlst.pl

publicSTs.txt

publicSTs.txt

README.md

ecoli_mlst

Synopsis

Description

Usage

Options

Mandatory options

Optional options

Output

Run environment

Author - contact

Citation, installation, and license

Changelog

Files

ecoli_mlst

Directory actions

More options

Directory actions

More options

Latest commit

History

ecoli_mlst

Folders and files

parent directory

ecoli_mlst

Synopsis

Description

Usage

Options

Mandatory options

Optional options

Output

Run environment

Author - contact

Citation, installation, and license

Changelog