lofreq2sync.py: Converts multiple LoFreq VCF files (listed in a CSV file) to the "synchronized" format used by PoPoolation2 (https://code.google.com/archive/p/popoolation2/wikis/Manual.wiki), so that its Cochran-Mantel-Haenszel and Fisher's exact tests can be run on the variant calls.
$ ./lofreq2sync.py -v vcf_files.csv
chro pos ref S1 S2 S3 S4 S5 S6
NC_020518.1 13774 C 0:0:0:0:0:0 0:0:0:0:0:0 0:100:0:3257:0:0 0:0:0:0:0:0 0:0:0:0:0:0 0:321:0:2718:0:0
NC_020518.1 61439 T 0:0:0:0:0:0 0:0:0:0:0:0 0:0:0:0:0:0 82:4234:0:0:0:0 0:0:0:0:0:0 0:0:0:0:0:0
NC_020518.1 61451 C 0:0:0:0:0:0 0:0:0:0:0:0 0:0:0:0:0:0 0:88:0:4122:0:0 0:0:0:0:0:0 0:0:0:0:0:0
NC_020518.1 61472 A 0:0:0:0:0:0 0:0:0:0:0:0 0:0:0:0:0:0 0:0:0:0:0:0 0:0:0:0:0:0 3519:0:0:98:0:0
NC_020518.1 61557 A 0:0:0:0:0:0 0:0:0:0:0:0 0:0:0:0:0:0 3993:0:0:101:0:0 0:0:0:0:0:0 0:0:0:0:0:0
NC_020518.1 61609 A 4678:87:0:0:0:0 4354:67:0:0:0:0 4109:62:0:0:0:0 4783:86:0:0:0:0 3512:63:0:0:0:0 0:0:0:0:0:0
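Rows like the ones above can be read back for quick checks before they go to PoPoolation2. The following is only a minimal sketch, not part of lofreq2sync.py: the function names `parse_sync_line` and `minor_fraction` are hypothetical, and it assumes whitespace-separated columns with six colon-separated integer counts per sample, as in the output shown. It deliberately does not depend on which base each count position stands for.

```python
def parse_sync_line(line):
    """Split one sync row into (chrom, pos, ref, list of per-sample count tuples)."""
    fields = line.split()
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    # Each sample column is a colon-separated list of integer counts.
    samples = [tuple(int(c) for c in col.split(":")) for col in fields[3:]]
    return chrom, pos, ref, samples


def minor_fraction(counts):
    """Fraction of reads not in the largest count; None when the depth is zero."""
    depth = sum(counts)
    if depth == 0:
        return None
    return (depth - max(counts)) / depth


# Last row of the example output above.
row = ("NC_020518.1 61609 A 4678:87:0:0:0:0 4354:67:0:0:0:0 4109:62:0:0:0:0 "
       "4783:86:0:0:0:0 3512:63:0:0:0:0 0:0:0:0:0:0")
chrom, pos, ref, samples = parse_sync_line(row)
for name, counts in zip(["S1", "S2", "S3", "S4", "S5", "S6"], samples):
    print(name, sum(counts), minor_fraction(counts))
```

Samples with no call at a position are written as 0:0:0:0:0:0 in the output above, so the sketch returns None for them rather than dividing by zero.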
gbExtractFeatures.py: After SNP calling with LoFreq, this program takes multiple VCF files listed in a CSV file and: a) generates a tab-delimited list showing which genes are affected by SNPs; and b) annotates a GenBank file with the SNPs, producing one or more secondary GenBank files (either a single file or separate files) that carry the annotated SNPs.
$ ./gbExtractFeatures.py -g ../Reference/MDS42reference.gb -v vcf_files_MDS42.csv -f 0 | head -20
SAMPLE=JC7, CHROM=NC_020518.1, POS=61609, REF=A, ALT=T, INFO={'SB': 12, 'DP4': [2337, 2323, 34, 53], 'DP': 4765, 'AF': 0.018258}
SAMPLE=JC7, CHROM=NC_020518.1, POS=61621, REF=T, ALT=C, INFO={'SB': 4, 'DP4': [2369, 2375, 36, 45], 'DP': 4834, 'AF': 0.016756}
SAMPLE=JC7, CHROM=NC_020518.1, POS=120642, REF=A, ALT=G, INFO={'SB': 56, 'DP4': [2018, 2272, 84, 38], 'DP': 4414, 'AF': 0.027639} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7, CHROM=NC_020518.1, POS=120645, REF=C, ALT=T, INFO={'SB': 24, 'DP4': [2027, 2248, 64, 39], 'DP': 4392, 'AF': 0.023452000000000001} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7, CHROM=NC_020518.1, POS=120661, REF=T, ALT=G, INFO={'SB': 64, 'DP4': [1858, 2267, 68, 27], 'DP': 4230, 'AF': 0.022459} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7, CHROM=NC_020518.1, POS=120862, REF=A, ALT=G, INFO={'SB': 12, 'DP4': [1907, 2153, 62, 48], 'DP': 4173, 'AF': 0.026360000000000001} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7, CHROM=NC_020518.1, POS=120864, REF=C, ALT=T, INFO={'SB': 44, 'DP4': [1965, 2123, 9, 40], 'DP': 4142, 'AF': 0.01183} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7, CHROM=NC_020518.1, POS=120900, REF=T, ALT=G, INFO={'SB': 146, 'DP4': [2013, 2073, 38, 148], 'DP': 4281, 'AF': 0.043448000000000001} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7, CHROM=NC_020518.1, POS=120906, REF=T, ALT=C, INFO={'SB': 20, 'DP4': [2011, 2144, 57, 95], 'DP': 4321, 'AF': 0.035177} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7, CHROM=NC_020518.1, POS=120918, REF=T, ALT=C, INFO={'SB': 87, 'DP4': [2138, 2136, 36, 109], 'DP': 4430, 'AF': 0.032731000000000003} , TYPE=gene, PRODUCT=None, LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=None, TRANSLATION=None , TYPE=CDS, PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'], LOCUS_TAG=ECMDS42_RS00555, PROTEIN_ID=['WP_000963518.1'], TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
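Because each output line carries the full protein translation, it can help to reduce the listing to a few columns. The sketch below is an assumption-laden example rather than anything shipped with gbExtractFeatures.py: the regular expressions and the `summarize` helper are hypothetical and were written only against the line layout shown above (SAMPLE, CHROM, POS, REF, ALT, an INFO dictionary containing 'AF', and optional LOCUS_TAG fields).

```python
import re

# Patterns written against the example output above; they are assumptions,
# not a documented format.
VARIANT_RE = re.compile(
    r"SAMPLE=(?P<sample>[^,]+), CHROM=(?P<chrom>[^,]+), POS=(?P<pos>\d+), "
    r"REF=(?P<ref>[^,]+), ALT=(?P<alt>[^,]+), INFO=\{(?P<info>[^}]*)\}"
)
AF_RE = re.compile(r"'AF': ([0-9.eE+-]+)")
LOCUS_RE = re.compile(r"LOCUS_TAG=([^,\s]+)")


def summarize(line):
    """Return a small dict for one output line, or None if it does not match."""
    m = VARIANT_RE.match(line)
    if m is None:
        return None
    af = AF_RE.search(m.group("info"))
    return {
        "sample": m.group("sample"),
        "chrom": m.group("chrom"),
        "pos": int(m.group("pos")),
        "ref": m.group("ref"),
        "alt": m.group("alt"),
        "af": float(af.group(1)) if af else None,
        # The gene and CDS entries repeat the same tag, so de-duplicate.
        "locus_tags": sorted(set(LOCUS_RE.findall(line))),
    }


# First line of the example output above (a SNP with no overlapping feature).
line = ("SAMPLE=JC7, CHROM=NC_020518.1, POS=61609, REF=A, ALT=T, "
        "INFO={'SB': 12, 'DP4': [2337, 2323, 34, 53], 'DP': 4765, 'AF': 0.018258}")
print(summarize(line))
```

To summarize a whole run, the same function could be applied to every line read from standard input after piping the program's output into the script.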
gbCompareGenes.py: Reads the GenBank files listed in a CSV file and returns a presence/absence table of the genes found in them.
$ ./gbCompareGenes.py -g gb_files.csv | head -20
GENE MDS42 MG1655
carB 1 1
folA 1 1
ksgA 1 0
thiP 1 1
ddl 1 0
aceF 1 1
lpxD 1 1
metQ 1 1
rrf 1 0
fadE 1 1
frsA 1 1
cynX 1 1
lacY 1 1
lacZ 1 1
lacI 1 1
mhpA 1 1
mhpB 1 1
adrA 1 0
phoR 1 1
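The table above is a gene-by-genome matrix of 1 (present) and 0 (absent). As a rough sketch of that idea, and not the script's actual code, the example below builds such a table from one set of gene names per genome. The `presence_table` function is hypothetical, the input sets are a small subset of the rows shown above, and the rows are sorted alphabetically here, which is a choice of this sketch rather than the order the script prints.

```python
def presence_table(genes_by_genome):
    """Build rows of a presence/absence table from {genome name: set of gene names}."""
    genomes = list(genes_by_genome)
    # Every gene seen in at least one genome gets a row.
    all_genes = sorted(set().union(*genes_by_genome.values()))
    rows = [["GENE"] + genomes]
    for gene in all_genes:
        flags = [str(int(gene in genes_by_genome[g])) for g in genomes]
        rows.append([gene] + flags)
    return rows


# Illustrative input taken from three rows of the table above.
genes = {
    "MDS42": {"carB", "folA", "ksgA"},
    "MG1655": {"carB", "folA"},
}
for row in presence_table(genes):
    print(" ".join(row))
```

In practice the gene sets would come from the gene features of each GenBank file rather than being typed in by hand.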