Prephix (Pre-Phrecon Input fiXer) -- This is a script that takes in a list of input files and generates one SNP loci output file (containing multiple strains), one reference base file, and a file containing the excluded indel entries.

codinghedgehog/prephix

Prephix (Pre-Phrecon Input fiXer)

Usage: prephix.pl k28.out_file1 [k28.out_file2 ...] -batchid <id> [-debug] [-quiet] [-exclude <loci_file>] [-ignore_quality]

Takes in a list of input files and generates one SNP loci output file (containing multiple strains), one reference base file,
and a file containing the excluded indel entries.

The indel file is formatted as "STRAIN ID" [TAB] k28|nuc|vcf [TAB] ...excluded VAAL K28, VCF, or NUCMER line...
In other words, prephix writes each excluded line as-is into the indel file, prefixed with the strain ID
and a format tag: k28 for a VAAL-formatted line, nuc for a NUCMER-formatted line, or vcf for a VCF-formatted line.
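
A minimal Python sketch of reading one indel file line back into its three parts, based only on the
layout described above. The names parse_indel_line and IndelEntry are hypothetical (they are not part
of prephix), and the sketch assumes the strain ID is written as a plain field without surrounding quotes.

```python
from typing import NamedTuple

# Format tags named in the description above: k28 (VAAL), nuc (NUCMER), vcf (VCF).
KNOWN_FORMATS = {"k28", "nuc", "vcf"}


class IndelEntry(NamedTuple):
    strain_id: str
    source_format: str
    original_line: str


def parse_indel_line(line: str) -> IndelEntry:
    """Split an indel file line into strain ID, format tag, and the excluded line."""
    # Split on the first two tabs only: the excluded line is stored as-is
    # and may itself contain tabs.
    strain_id, source_format, original = line.rstrip("\n").split("\t", 2)
    if source_format not in KNOWN_FORMATS:
        raise ValueError(f"unrecognized format tag: {source_format!r}")
    return IndelEntry(strain_id, source_format, original)
```

Splitting with a limit of two keeps the excluded line intact, so it can be passed on to a
format-specific parser unchanged.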

The SNP loci and reference base files are suitable as input to the phrecon (Phylo Reconstructor) utility. Both use a
tab-delimited columnar format: STRAIN_ID [TAB] LOCI [TAB] BASE.
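
The three-column layout can be loaded with a few lines of Python. This is a sketch, not code from
prephix: read_loci is a hypothetical helper, and it assumes the file has no header row and that the
LOCI column holds integer positions.

```python
import io
from collections import defaultdict


def read_loci(handle):
    """Return {strain_id: {locus: base}} from STRAIN_ID, LOCI, BASE rows."""
    table = defaultdict(dict)
    for raw in handle:
        line = raw.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        strain_id, locus, base = line.split("\t")
        table[strain_id][int(locus)] = base  # assumes integer loci
    return dict(table)


# Usage with made-up data; a real run would pass an open file handle instead.
sample = io.StringIO("strainA\t100\tG\nstrainA\t250\tT\nstrainB\t100\tC\n")
loci = read_loci(sample)
```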

The SNP loci and indel files are suitable input for the SNP Swapper utility.

Recognized input file formats: k28.out (VAAL), NUCMER, VCF

You may mix input file types within a single run/batch.


=== ADDITIONAL SCRIPTS ===

Additional scripts exist in this repository, mainly utility and helper scripts that work on the output files from prephix.

--- Prephix 2 SNP Effect Converter ---

Usage: prephix2snpeffect.py ref_file snp_file

This script takes the ref and snp output files from a prephix run and generates an SNP Effect compliant input file.
SNP Effect file format is: SNPlocus<tab>sample=snpbase<tab>ref=refbase

SNP Effect doesn't care about strain IDs, so if two strains share a locus but have different bases, two lines will
be generated, with identical locus and ref values but different sample values.
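
The conversion described above can be sketched in Python as follows. This is an illustration, not the
code of prephix2snpeffect.py: to_snp_effect is a hypothetical function, rows are assumed to be already
parsed into (strain_id, locus, base) tuples, and identical locus/base pairs from different strains are
assumed to collapse into a single output line.

```python
def to_snp_effect(ref_rows, snp_rows):
    """Build SNP Effect lines: SNPlocus<tab>sample=snpbase<tab>ref=refbase."""
    # Reference base per locus; the strain column is not needed here.
    ref_base = {locus: base for _strain, locus, base in ref_rows}
    seen = set()
    lines = []
    for _strain, locus, base in snp_rows:
        key = (locus, base)
        if key in seen:
            continue  # same locus and base already emitted for another strain
        seen.add(key)
        lines.append(f"{locus}\tsample={base}\tref={ref_base[locus]}")
    return lines


# Made-up example: three strains share locus 100, two of them with the same base.
ref_rows = [("ref", "100", "A")]
snp_rows = [("s1", "100", "G"), ("s2", "100", "T"), ("s3", "100", "G")]
effect_lines = to_snp_effect(ref_rows, snp_rows)
```

With this input, locus 100 yields two lines, one per distinct sample base, both carrying the same ref value.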


--- Prephix to PhenoLink Generator ---

Usage: pre2phe.py -ref <ref base file> -snp <snp file> -out <PhenoLink output file>

Takes in one SNP loci output file (possibly containing multiple strains) and one reference base file generated by prephix,
and generates a PhenoLink export file.

The PhenoLink export file is a tab-delimited file, with columns of all Loci Names (in this case RefBase_Locus_SnpBase)
and rows of StrainIDs with 1's indicating the presence of that Locus Name in its strain, and 0 otherwise.
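
A rough Python sketch of building that presence/absence matrix. It is not the code of pre2phe.py:
phenolink_matrix is a hypothetical function, rows are assumed to be parsed into (strain_id, locus, base)
tuples, and the alphabetical column order is an assumption made only to keep the output deterministic.

```python
def phenolink_matrix(ref_rows, snp_rows):
    """Return (columns, rows) where rows maps strain ID to a list of 0/1 flags."""
    ref_base = {locus: base for _strain, locus, base in ref_rows}
    present = {}
    for strain, locus, base in snp_rows:
        # Locus name layout from the description: RefBase_Locus_SnpBase
        name = f"{ref_base[locus]}_{locus}_{base}"
        present.setdefault(strain, set()).add(name)
    columns = sorted({name for names in present.values() for name in names})
    rows = {
        strain: [1 if col in names else 0 for col in columns]
        for strain, names in present.items()
    }
    return columns, rows


# Made-up example data.
ref_rows = [("ref", "100", "A"), ("ref", "200", "C")]
snp_rows = [("s1", "100", "G"), ("s2", "100", "G"), ("s2", "200", "T")]
columns, rows = phenolink_matrix(ref_rows, snp_rows)
```

Writing the result out is then a matter of joining the column names and each strain's row with tabs.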


--- SNP Comparator ---

Usage: snp_compare.py --snpfile <snpfile> --compare <compare strain 1> <compare strain 2> ... [--exclude <exclude strain 1> <exclude strain 2> ...] [--outfile <filename>]

Takes in an SNP file (formatted like the snp file output from prephix) and a
list of strain IDs to compare and/or exclude.  SNP Comparator finds all the
common SNP Loci between the compared Strains and then removes the SNP Loci
that are common across the exclude Strains.
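
The comparison reduces to set operations, sketched below in Python. This is not the code of
snp_compare.py: compare_snps is a hypothetical function, loci are matched by position only (bases are
ignored), and "common across the exclude Strains" is read as "present in every excluded strain". Both
of those readings are assumptions.

```python
def compare_snps(loci_by_strain, compare, exclude=()):
    """Loci shared by all compared strains, minus loci shared by all excluded strains.

    loci_by_strain maps strain ID to an iterable of loci.
    compare must name at least one strain.
    """
    common = set.intersection(*(set(loci_by_strain[s]) for s in compare))
    if exclude:
        shared_excluded = set.intersection(
            *(set(loci_by_strain[s]) for s in exclude)
        )
        common -= shared_excluded
    return common


# Made-up example data.
loci_by_strain = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 5}, "d": {3, 6}}
result = compare_snps(loci_by_strain, compare=["a", "b"], exclude=["c", "d"])
```

Here strains a and b share loci 2 and 3, and locus 3 is dropped because both excluded strains carry it.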
