How to manage heterozygosity in SNP conversion? #98

Open
Pryfed opened this issue Sep 29, 2020 · 0 comments

Hello,

Sorry for this (I guess) basic question, but I could not find the answer in either the README.md file or the paper (Page et al. 2016).

I am trying to convert FASTA alignments into SNP-only VCF files for downstream analyses. Some alignments are for nuclear markers, and I work on a polyploid organism, so I sometimes have more than 2 haplotypes for a given individual, but all are properly phased.

My FASTA input is formatted as follows:

Individual1_a
Allele-a-sequence
Individual1_b
Allele-b-sequence
Individual2_a
Allele-a-sequence
Individual2_b
Allele-b-sequence
Individual2_c
Allele-c-sequence
...

I used a basic command:

snp-sites -v -o out.vcf in.fas

I did get a .vcf file, but in it each allele seems to be coded as a separate homozygous individual: I see no 0/0/1 or even 0/1 genotypes as I expected, only 0, 1 and 2 (like haploid calls).

How can I get an output in which phasing information and heterozygosity are taken into account? Is there an option in snp-sites that I missed? Or do I have to adapt my input, and if so, how? (For example, losing the phasing information by merging the alleles into a single sequence per individual with ambiguity codes? Is that mandatory?)
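
In case snp-sites has no built-in option for this, one workaround I can imagine is to post-process out.vcf and merge the haploid columns of each individual into a single phased genotype such as 0|1 or 0|0|2. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes that sample names follow the `<individual>_<haplotype>` pattern shown above, that the FORMAT column contains only GT, and that the sample columns start at the 10th field; the function name `merge_haploid_columns` is purely illustrative and not part of snp-sites.

```python
def merge_haploid_columns(vcf_lines, sep="_"):
    """Collapse haploid sample columns named <individual><sep><haplotype>
    into one phased genotype column per individual (e.g. 0|1|2).

    Assumption: FORMAT is GT only, so each sample field is a single
    allele index.
    """
    out = []
    groups = None  # individual name -> list of column indices
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            out.append(line)
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Group sample columns by the name before the last separator.
            groups = {}
            for idx in range(9, len(fields)):
                individual = fields[idx].rsplit(sep, 1)[0]
                groups.setdefault(individual, []).append(idx)
            out.append("\t".join(fields[:9] + list(groups)))
            continue
        if groups is None:
            raise ValueError("data line found before the #CHROM header")
        # Join the haploid calls of each individual with the phased separator.
        genotypes = ["|".join(fields[i] for i in cols) for cols in groups.values()]
        out.append("\t".join(fields[:9] + genotypes))
    return out


if __name__ == "__main__":
    import sys

    # Usage: python merge_haplotypes.py out.vcf > merged.vcf
    with open(sys.argv[1]) as handle:
        print("\n".join(merge_haploid_columns(handle)))
```

The resulting file mixes ploidies across samples (diploid and triploid genotypes in the same record), which I would expect some downstream tools not to accept, so this may need adapting.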

Thank you for any answer.
