umerijaz/SNPCalling

Scripts for SNP Calling

lofreq2sync.py: Converts multiple LoFreq VCF files to the "synchronized" (sync) format used by PoPoolation2 (https://code.google.com/archive/p/popoolation2/wikis/Manual.wiki), so that its Cochran-Mantel-Haenszel test and Fisher's exact test can be run on the calls.

$ ./lofreq2sync.py -v vcf_files.csv
chro    pos    ref    S1    S2    S3    S4    S5    S6
NC_020518.1    13774    C    0:0:0:0:0:0    0:0:0:0:0:0    0:100:0:3257:0:0    0:0:0:0:0:0    0:0:0:0:0:0    0:321:0:2718:0:0
NC_020518.1    61439    T    0:0:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0    82:4234:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0
NC_020518.1    61451    C    0:0:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0    0:88:0:4122:0:0    0:0:0:0:0:0    0:0:0:0:0:0
NC_020518.1    61472    A    0:0:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0    3519:0:0:98:0:0
NC_020518.1    61557    A    0:0:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0    3993:0:0:101:0:0    0:0:0:0:0:0    0:0:0:0:0:0
NC_020518.1    61609    A    4678:87:0:0:0:0    4354:67:0:0:0:0    4109:62:0:0:0:0    4783:86:0:0:0:0    3512:63:0:0:0:0    0:0:0:0:0:0
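
To work with this output downstream, each line can be split into per-sample count dictionaries. The following is a minimal sketch, not part of the repository: the function names `parse_sync_line` and `coverage` are hypothetical, and the allele order A:T:C:G:N:del is taken from the PoPoolation2 manual rather than from lofreq2sync.py itself, so it is worth verifying against your own data before relying on the allele labels.

```python
# Allele order assumed from the PoPoolation2 manual (an assumption about
# this script's output, not confirmed from its source).
SYNC_ALLELES = ("A", "T", "C", "G", "N", "del")


def parse_sync_line(line, alleles=SYNC_ALLELES):
    """Split one whitespace-delimited sync line into its parts.

    Returns (chrom, pos, ref, samples), where samples is a list with one
    {allele: count} dictionary per sample column.
    """
    fields = line.split()
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    samples = []
    for column in fields[3:]:
        counts = [int(c) for c in column.split(":")]
        if len(counts) != len(alleles):
            raise ValueError("expected %d counts, got %r" % (len(alleles), column))
        samples.append(dict(zip(alleles, counts)))
    return chrom, pos, ref, samples


def coverage(counts):
    """Total number of reads recorded for one sample at one position."""
    return sum(counts.values())


# One data line copied from the example output above.
line = ("NC_020518.1    61439    T    0:0:0:0:0:0    0:0:0:0:0:0    "
        "0:0:0:0:0:0    82:4234:0:0:0:0    0:0:0:0:0:0    0:0:0:0:0:0")
chrom, pos, ref, samples = parse_sync_line(line)
print(chrom, pos, ref, coverage(samples[3]))
```

Passing a different `alleles` tuple lets the same parser be reused if the count order turns out to differ.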

gbExtractFeatures.py: After SNP calling with LoFreq, this program takes multiple VCF files listed in a CSV file and: a) generates a tab-delimited list showing which genes are affected by SNPs; and b) annotates a GenBank file with the SNPs, producing secondary GenBank file(s) (either a single file or separate files) with the SNPs annotated.

$ ./gbExtractFeatures.py -g ../Reference/MDS42reference.gb -v vcf_files_MDS42.csv -f 0 | head -20
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=61609,  REF=A,  ALT=T,  INFO={'SB': 12, 'DP4': [2337, 2323, 34, 53], 'DP': 4765, 'AF': 0.018258}
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=61621,  REF=T,  ALT=C,  INFO={'SB': 4, 'DP4': [2369, 2375, 36, 45], 'DP': 4834, 'AF': 0.016756}
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120642,  REF=A,  ALT=G,  INFO={'SB': 56, 'DP4': [2018, 2272, 84, 38], 'DP': 4414, 'AF': 0.027639} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120645,  REF=C,  ALT=T,  INFO={'SB': 24, 'DP4': [2027, 2248, 64, 39], 'DP': 4392, 'AF': 0.023452000000000001} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120661,  REF=T,  ALT=G,  INFO={'SB': 64, 'DP4': [1858, 2267, 68, 27], 'DP': 4230, 'AF': 0.022459} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120862,  REF=A,  ALT=G,  INFO={'SB': 12, 'DP4': [1907, 2153, 62, 48], 'DP': 4173, 'AF': 0.026360000000000001} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120864,  REF=C,  ALT=T,  INFO={'SB': 44, 'DP4': [1965, 2123, 9, 40], 'DP': 4142, 'AF': 0.01183} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120900,  REF=T,  ALT=G,  INFO={'SB': 146, 'DP4': [2013, 2073, 38, 148], 'DP': 4281, 'AF': 0.043448000000000001} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120906,  REF=T,  ALT=C,  INFO={'SB': 20, 'DP4': [2011, 2144, 57, 95], 'DP': 4321, 'AF': 0.035177} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
SAMPLE=JC7,  CHROM=NC_020518.1,  POS=120918,  REF=T,  ALT=C,  INFO={'SB': 87, 'DP4': [2138, 2136, 36, 109], 'DP': 4430, 'AF': 0.032731000000000003} ,  TYPE=gene,  PRODUCT=None,  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=None,  TRANSLATION=None ,  TYPE=CDS,  PRODUCT=['dihydrolipoyllysine-residue acetyltransferase component of pyruvate dehydrogenase complex'],  LOCUS_TAG=ECMDS42_RS00555,  PROTEIN_ID=['WP_000963518.1'],  TRANSLATION=['MAIEIKVPDIGADEVEITEILVKVGDKVEAEQSLITVEGDKASMEVPSPQAGIVKEIKVSVGDKTQTGALIMIFDSADGAADAAPAQAEEKKEAAPAAAPAAAAAKDVNVPDIGSDEVEVTEILVKVGDKVEAEQSLITVEGDKASMEVPAPFAGTVKEIKVNVGDKVSTGSLIMVFEVAGEAGAAAPAAKQEAAPAAAPAPAAGVKEVNVPDIGGDEVEVTEVMVKVGDKVAAEQSLITVEGDKASMEVPAPFAGVVKELKVNVGDKVKTGSLIMIFEVEGAAPAAAPAKQEAAAPAPAAKAEAPAAAPAAKAEGKSEFAENDAYVHATPLIRRLAREFGVNLAKVKGTGRKGRILREDVQAYVKEAIKRAEAAPAATGGGIPGMLPWPKVDFSKFGEIEEVELGRIQKISGANLSRNWVMIPHVTHFDKTDITELEAFRKQQNEEAAKRKLDVKITPVVFIMKAVAAALEQMPRFNSSLSEDGQRLTLKKYINIGVAVDTPNGLVVPVFKDVNKKGIIELSRELMTISKKARDGKLTAGEMQGGCFTISSIGGLGTTHFAPIVNAPEVAILGVSKSAMEPVWNGKEFVPRLMLPISLSFDHRVIDGADGARFITIINNTLSDIRRLVM']
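
The INFO dictionaries in the output above look like a parsed VCF INFO column. As a rough sketch of how such a dictionary could be built from a raw INFO string (the helper names `parse_info` and `_to_number` are hypothetical, and this is not necessarily how gbExtractFeatures.py does it):

```python
def _to_number(text):
    """Convert text to int or float when possible, otherwise return it unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_info(info):
    """Parse a VCF INFO string such as 'DP=10;AF=0.5;DP4=1,2,3,4' into a dict.

    Comma-separated values become lists; keys without a value become True.
    """
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            parsed[key] = True  # flag-style entry with no value
            continue
        values = [_to_number(v) for v in value.split(",")]
        parsed[key] = values if len(values) > 1 else values[0]
    return parsed


# Values taken from the first record shown above (POS=61609).
info = parse_info("DP=4765;AF=0.018258;SB=12;DP4=2337,2323,34,53")

# Assuming DP4 holds ref-forward, ref-reverse, alt-forward and alt-reverse
# counts, the alternate-allele fraction can be recomputed from DP4 and DP.
alt_fraction = sum(info["DP4"][2:]) / float(info["DP"])
print(info)
print(round(alt_fraction, 6))
```

For this record the recomputed fraction agrees with the reported AF to within rounding, which is a quick sanity check on a call.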

gbCompareGenes.py: Compares GenBank files and returns a presence/absence table for the genes found in the GenBank files listed in a CSV file.

$ ./gbCompareGenes.py -g gb_files.csv | head -20
GENE    MDS42    MG1655
carB     1     1
folA     1     1
ksgA     1     0
thiP     1     1
ddl     1     0
aceF     1     1
lpxD     1     1
metQ     1     1
rrf     1     0
fadE     1     1
frsA     1     1
cynX     1     1
lacY     1     1
lacZ     1     1
lacI     1     1
mhpA     1     1
mhpB     1     1
adrA     1     0
phoR     1     1 
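
The table above can be thought of as a membership test of every gene name against each genome's gene set. A minimal sketch of that idea follows; the function name `presence_absence` is hypothetical, and the small hard-coded gene sets are illustrative stand-ins for names that would be extracted from the GenBank files, which this sketch does not read.

```python
def presence_absence(genes_by_genome):
    """Build a presence/absence table from {genome name: set of gene names}.

    Returns a list of rows; the first row is the header, and each later row
    is [gene, 1 or 0 per genome] in the order the genomes were given.
    """
    genomes = list(genes_by_genome)
    all_genes = sorted(set().union(*genes_by_genome.values()))
    rows = [["GENE"] + genomes]
    for gene in all_genes:
        flags = [1 if gene in genes_by_genome[name] else 0 for name in genomes]
        rows.append([gene] + flags)
    return rows


# Illustrative gene sets only, chosen to mirror a few rows of the table above.
genes = {
    "MDS42": {"carB", "folA", "ksgA"},
    "MG1655": {"carB", "folA"},
}
table = presence_absence(genes)
for row in table:
    print("\t".join(str(cell) for cell in row))
```

Sorting the gene names is a choice made here for reproducible output; the script's own row order may differ.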
