Different VCF conventions #554

Open
illusional opened this issue Apr 9, 2021 · 1 comment
@illusional

Is there a document or good summary of the different types and conventions of VCF (for example: VCF, sites-only VCF, gVCF, pVCF, spVCF)? My current understanding is:

  • A gVCF (Genomic VCF) contains information for every position in the genome, usually by using block records that group runs of sites in the same GQ band, with NON_REF in the ALT column. (Usually has a .g suffix, e.g. <sample>.g.vcf)
  • A sites-only VCF is a VCF with only site-level annotations (no genotype data), i.e. only the first 8 columns. (May have a .sites-only suffix, e.g. <sample>.sites-only.vcf)
  • A pVCF (Project VCF) stores genotypes for an entire cohort, in a 2-D matrix of variant sites and study participants.
  • An spVCF (Sparse Project VCF) is an optimisation of the pVCF that avoids the rapid size growth seen with larger cohorts. See: document convention for "QC squeezing" in population VCF #527
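
To make the distinctions above concrete, here is a minimal Python sketch (not from any spec or existing tool) that strips a VCF line down to its sites-only form and guesses which convention a file follows. The function names `to_sites_only` and `guess_convention` are hypothetical, and the rules are assumptions based only on the descriptions above: no sample columns means sites-only, a `<NON_REF>` ALT allele means gVCF (`<*>` is also accepted here as an assumed equivalent), and more than one sample means pVCF. It does not attempt to detect spVCF.

```python
def to_sites_only(line: str) -> str:
    """Keep only the 8 fixed columns (CHROM..INFO); drop FORMAT and samples."""
    if line.startswith("##"):
        return line.rstrip("\n")  # meta-information lines are kept unchanged
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[:8])


def guess_convention(lines) -> str:
    """Heuristic label only: 'sites-only', 'gVCF', 'single-sample VCF' or 'pVCF'."""
    n_samples = 0
    has_non_ref = False
    for line in lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Columns after the 8 fixed ones plus FORMAT are sample columns.
            n_samples = max(0, len(fields) - 9)
            continue
        alts = fields[4].split(",")
        # Assumption: a symbolic non-reference ALT allele marks a gVCF.
        if "<NON_REF>" in alts or "<*>" in alts:
            has_non_ref = True
    if n_samples == 0:
        return "sites-only"
    if has_non_ref:
        return "gVCF"
    return "pVCF" if n_samples > 1 else "single-sample VCF"
```

Calling `guess_convention(open(path))` on an uncompressed file would give a rough label; a real classifier would also need to handle bgzipped input and multi-sample gVCFs, which this sketch ignores.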

I'm interested to know whether there are more types, whether there's a good "bible" for this, and what interesting conventions exist for each.

Thanks in advance!

jkbonfield added the vcf label on May 25, 2021
@tcezard
Contributor

tcezard commented Jun 17, 2021

There isn't such a document, at least not on this site, but it might be a good thing to add to the wiki alongside the glossary.

Status: Stalled