Extract allele frequency data from 1000G VCFs #5

Open
grosscol opened this issue Apr 28, 2023 · 0 comments
Comments

@grosscol
Collaborator

Create a new workflow for extracting allele frequency (AF) information. The current AF data comes out of the VEP process, via its --af flag.

The AF information in the VEP output appears to be incomplete. For example, 1-55063514-G-A should have AF data, but none appears to be present.

  1. Download VCFs for 1000G on GRCh38 into reference data storage: https://www.internationalgenome.org/data-portal/data-collection/grch38
  2. Extract the AF and *_AF INFO fields. SNV ids take the form chrom-pos-ref-alt, e.g. 1-55063514-G-A.
  3. Convert to a format mongoimport accepts (JSON, CSV, or TSV). BSON dumps are loaded with mongorestore instead.
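
As a rough starting point, steps 2 and 3 could be sketched in plain Python along these lines. This is only a sketch under several assumptions, not a settled design: the function names `af_records` and `vcf_to_ndjson` and the `variant_id` key are hypothetical, the `chr` prefix is stripped so ids match the `1-55063514-G-A` style, per-allele AF values are assumed to be comma-separated in ALT order, and the output is newline-delimited JSON, which mongoimport can read.

```python
import json


def af_records(vcf_line):
    """Yield one dict per ALT allele holding the AF and *_AF INFO values."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts, _qual, _filter, info = fields[:8]

    # Assumption: ids use bare chromosome names ("1"), so drop a "chr" prefix.
    if chrom.startswith("chr"):
        chrom = chrom[3:]

    # Collect AF and *_AF entries from the INFO column; ignore everything else.
    af_info = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep and (key == "AF" or key.endswith("_AF")):
            af_info[key] = value.split(",")

    # Assumption: each AF field carries one value per ALT allele, in ALT order.
    for i, alt in enumerate(alts.split(",")):
        record = {"variant_id": f"{chrom}-{pos}-{ref}-{alt}"}  # hypothetical key
        for key, values in af_info.items():
            if i < len(values) and values[i] != ".":
                record[key] = float(values[i])
        yield record


def vcf_to_ndjson(lines):
    """Yield one JSON document per variant allele, skipping VCF header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        for record in af_records(line):
            yield json.dumps(record)
```

The `lines` argument can be any iterable of text lines, for instance a file opened with `gzip.open(path, "rt")` for a compressed VCF. Writing each yielded string on its own line gives a file that mongoimport should accept without the `--jsonArray` option.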