snp-sites for all ~2 million SARS-COV-2 genomes in GISAID #104

Open
jielab opened this issue Aug 2, 2021 · 1 comment
@jielab

jielab commented Aug 2, 2021

Dear Andrew:

Can I use snp-sites to process the full FASTA alignment I downloaded from GISAID? It has about 1,000,000,000 lines in total, covering ~2 million SARS-CoV-2 genomes.

I ran it on my laptop and the job was killed. I could try running it on a server, but I would like to confirm with you first that this is feasible. I guess I only need to run "snp-sites -v -o output" to output a VCF file. I should NOT specify "-p", because generating a PHYLIP file for ~2 million genomes might take forever.
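For context on the memory question, here is a minimal, hypothetical sketch of a single-pass scan for variable columns. It is not how snp-sites is implemented. It only illustrates that such a scan can keep one small set of observed bases per alignment column, so memory grows with the alignment length rather than with the number of genomes. The function names are made up, and ignoring gaps and ambiguity codes (anything other than A/C/G/T) is an assumption. Pure Python is also far too slow for ~2 million genomes; the point is the memory pattern, not speed.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs one record at a time from a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def variable_columns(handle: TextIO) -> list[int]:
    """Return 0-based alignment columns where more than one of A/C/G/T is seen.

    Hypothetical helper: only one set of bases per column is kept in memory,
    and gaps/ambiguity codes are ignored (an assumption, not snp-sites behaviour).
    """
    observed: list[set[str]] | None = None
    for _name, seq in read_fasta(handle):
        seq = seq.upper()
        if observed is None:
            observed = [set() for _ in seq]
        elif len(seq) != len(observed):
            raise ValueError("sequences are not all the same length")
        for i, base in enumerate(seq):
            if base in "ACGT":
                observed[i].add(base)
    return [i for i, bases in enumerate(observed or []) if len(bases) > 1]


# Toy alignment: only the last column differs (T vs A); the gap in "c" is ignored.
demo = io.StringIO(">a\nACGT\n>b\nACGA\n>c\nAC-T\n")
print(variable_columns(demo))  # [3]
```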

BTW, I did my PhD at the Sanger Institute, from 2012 to 2015.

Best regards,
Jie

@Salvobioinfo

I’d like to use this tool for the same purpose, but I’m worried that if it calls SNPs against an internal pseudo-reference genome (I think the consensus sequence), the results make little sense. For example, in the GISAID MSA file, where the D614G variant is present in almost all sequences, it would be treated as wild type (WT) rather than as a variant.
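To make the concern concrete, here is a toy sketch. It assumes, as the comment above suggests but does not confirm, that the REF allele is the per-column majority (consensus) base. The helper names and the five-sample column are hypothetical. When almost every sample carries the derived base, a consensus-style REF becomes that derived base. Only the few samples still carrying the ancestral base then show up as "ALT". Comparing against a fixed external reference base (for example, the one from a reference genome such as Wuhan-Hu-1) flips which samples look like variants.

```python
from __future__ import annotations

from collections import Counter


def majority_ref(column: str) -> str:
    """Consensus-style REF: the most common A/C/G/T base in one alignment column."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    if not counts:
        raise ValueError("column has no A/C/G/T bases")
    return counts.most_common(1)[0][0]


def alt_samples(column: str, names: list[str], ref: str) -> list[str]:
    """Samples whose base differs from `ref` (gaps and ambiguity codes ignored)."""
    return [n for n, b in zip(names, column.upper()) if b in "ACGT" and b != ref]


# Hypothetical site: four samples carry the derived G, one keeps the ancestral A.
names = ["s1", "s2", "s3", "s4", "s5"]
column = "GGGGA"

print(majority_ref(column))                              # G
print(alt_samples(column, names, majority_ref(column)))  # ['s5']
print(alt_samples(column, names, "A"))                   # ['s1', 's2', 's3', 's4']
```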
