vcfanno doesn't annotate sites that are polymorphic in query vcf but fixed for reference allele in annotation vcf #152

Open
AaronRuben opened this issue Feb 9, 2023 · 2 comments

Comments

@AaronRuben

Hi Brent,

I was trying to annotate 1KGP VCFs with genotype information of archaic hominins (e.g., Altai Neanderthal). These individuals have a lot of sites that are homozygous for the reference allele, for example:

20 60343 . G .

while this site is polymorphic in 1KGP:

20 60343 . G A

These sites match but are currently not annotated unless the --permissive-overlap flag is set, which isn't ideal. I know this is an edge case, and I can't simply merge the VCFs because including the archaic hominins would mess up downstream steps.

Would it be possible to handle such cases in the future?

Thanks,
Aaron

@brentp
Owner

brentp commented Feb 9, 2023

Hi Aaron, the only way to do this is with --permissive-overlap as you note.
I think that's the correct behavior, as "G ." should not match "G A". If they are homozygous reference only, then "G G" would be more correct.

@AaronRuben
Author

AaronRuben commented Feb 9, 2023

Hi Brent,

Thanks for the quick response.

If it were "G G", it would still not match "G A". I also think "G ." makes more sense, as there is no alternative allele.

In either case, it would be great to allow matches of polymorphic and monomorphic sites (whether denoted by "G ." or "G G") when the reference alleles match.
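
To make the request concrete, here is a minimal Python sketch of the matching rule I have in mind. It is not vcfanno's actual code (vcfanno is written in Go), and `Site`, `is_monomorphic`, `strict_match`, and `relaxed_match` are hypothetical names used only for illustration. It assumes the default behavior discussed above, where REF and ALT must both agree, and adds one exception for annotation records that carry no alternate allele.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """Hypothetical, simplified VCF record: only the fields used for matching."""
    chrom: str
    pos: int
    ref: str
    alt: str  # "." means no alternate allele is listed


def is_monomorphic(site: Site) -> bool:
    """True if the record has no alternate allele, written as "G ." or "G G"."""
    return site.alt == "." or site.alt == site.ref


def strict_match(query: Site, anno: Site) -> bool:
    """Chromosome, position, REF and ALT must all agree."""
    return query == anno


def relaxed_match(query: Site, anno: Site) -> bool:
    """Strict match, or same locus and REF with a monomorphic annotation record."""
    if strict_match(query, anno):
        return True
    same_locus = (query.chrom, query.pos, query.ref) == (anno.chrom, anno.pos, anno.ref)
    return same_locus and is_monomorphic(anno)


kgp = Site("20", 60343, "G", "A")    # polymorphic in 1KGP
altai = Site("20", 60343, "G", ".")  # homozygous reference in the archaic genome

print(strict_match(kgp, altai))   # False
print(relaxed_match(kgp, altai))  # True
```

Unlike --permissive-overlap, this rule would still reject an annotation record with a different REF or with a different, real ALT allele at the same position.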
