New section describing familial PEDIGREE headers #413

Open
wants to merge 5 commits into
base: master

Member

@jmarshall jmarshall commented May 29, 2019

Two separate commits, the first fixing obvious errors and the second an outline of a suggested new “Describing family relationships” section:

  1. Add missing ## and other words; replicate the fixes from 2016's PR #176 (“Add missing genome IDs”) in the other copies of the spec. [Split off as a separate PR, #583 (“Fix ##PEDIGREE example typographical errors”).]

  2. Outline of a new “Describing family relationships” section that would address #381 (“clarifications on vcf pedigree header”) by explaining how the ##PEDIGREE trio metadata line shown in the VCF spec is supposed to work and how it corresponds to an external PED file.

    Previous discussion has suggested that this functionality is thought to be stillborn. If so, the examples should be removed from the specs. Otherwise if the functionality is used out there, it would be good for the spec to describe the (fairly obvious) way in which it would be intended to be used.
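As a rough illustration of the PED correspondence mentioned in item 2, the following is a minimal sketch, not text from the proposed section. It assumes the usual six-column PED layout (family, individual, father, mother, sex, phenotype, with `0` meaning an unknown parent) and the trio form `##PEDIGREE=<ID=...,Father=...,Mother=...>`. The helper name `ped_to_pedigree_lines` is hypothetical, and emitting a line with only one parent is an assumption that the spec may or may not allow.

```python
def ped_to_pedigree_lines(ped_text):
    """Turn PED rows into ##PEDIGREE header lines (hypothetical helper)."""
    lines = []
    for row in ped_text.splitlines():
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comments
        # Only the first four PED columns describe relationships.
        family_id, sample_id, father_id, mother_id = row.split()[:4]
        fields = [f"ID={sample_id}"]
        # PED writes "0" for an unknown parent; such parents are omitted here.
        if father_id != "0":
            fields.append(f"Father={father_id}")
        if mother_id != "0":
            fields.append(f"Mother={mother_id}")
        # Founders (no known parents) produce no ##PEDIGREE line at all.
        if len(fields) > 1:
            lines.append("##PEDIGREE=<" + ",".join(fields) + ">")
    return lines


ped = """\
FAM1 FatherID 0 0 1 1
FAM1 MotherID 0 0 2 1
FAM1 ChildID FatherID MotherID 1 2
"""
print("\n".join(ped_to_pedigree_lines(ped)))
```

Note that the family ID, sex, and phenotype columns have no counterpart in the trio line, so this mapping loses information in the PED-to-VCF direction.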

@jmarshall jmarshall added the vcf label May 29, 2019
@hts-specs-bot

Changed PDFs as of 82ae462: VCFv4.1 (diff), VCFv4.2 (diff), VCFv4.3 (diff).

@pd3 pd3 self-requested a review June 4, 2019 09:10
VCFv4.3.tex Outdated
@@ -1380,6 +1380,19 @@ \subsection{Representing unspecified alleles and REF-only blocks (gVCF)}
\end{flushleft}
\normalsize

\subsection{Describing family relationships}
Member

I would prefer to make this part of the PEDIGREE meta entry description. I can also address the TODO.

Member Author

Pedigree lines are putatively used in two separate scenarios: clonal relationships and trios/families. There is already a separate section covering the clonal scenario, so it makes some sense to cover the other one in its own separate section too.

Similarly the minimal §1.4.8 META and SAMPLE meta entry description is expanded upon by §5.4.10.

Member

@cyenyxe cyenyxe Jun 25, 2019

The only issue I see is that the clonal relationships are described in the context of breakends: prior knowledge of that syntax is needed and, as a result, they are subsections of that section. This one, on the other hand, would be completely separate and would fit perfectly in the PEDIGREE section.

@@ -253,7 +253,7 @@ \subsubsection{Pedigree field format}
##pedigreeDB=URL
Member

This is not the line I'm interested in, but I can't comment on the other one. PED does not support more than 2 ancestors; do we want to support that in VCF? I have never seen this used, so I don't think a lot of people will miss it if we drop it. That would make it trivial to add some examples for trios.
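To make the "more than 2 ancestors" concern concrete, here is a small sketch that parses a `##PEDIGREE` line and checks whether it could be written as a PED row. It assumes simple unquoted `key=value` pairs, and that only the `Father` and `Mother` keys map onto PED columns. Both function names are hypothetical.

```python
import re


def parse_pedigree_line(line):
    """Split a ##PEDIGREE line into (sample ID, {relation: other sample ID})."""
    match = re.fullmatch(r"##PEDIGREE=<(.+)>", line.strip())
    if match is None:
        raise ValueError(f"not a ##PEDIGREE line: {line!r}")
    # Assumes values contain no commas or quotes (a simplification).
    entries = dict(item.split("=", 1) for item in match.group(1).split(","))
    sample_id = entries.pop("ID")
    return sample_id, entries


def fits_ped(relations):
    """True if every relation has a PED column (assumed: Father, Mother)."""
    return set(relations) <= {"Father", "Mother"}


trio = "##PEDIGREE=<ID=ChildID,Father=FatherID,Mother=MotherID>"
generic = "##PEDIGREE=<ID=SampleID,Name_1=Ancestor_1,Name_2=Ancestor_2,Name_3=Ancestor_3>"

for line in (trio, generic):
    sample_id, relations = parse_pedigree_line(line)
    print(sample_id, relations, fits_ped(relations))
```

Under these assumptions the trio line fits PED and the generic three-ancestor line does not.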

Member

@pd3 @lbergelson what should we do about this? Dropping support for more than 2 ancestors would render some files incorrect, but as I said in my previous comment I have never seen that syntax being used.

Member

I don't think there would be a lot of pushback against dropping multiple ancestor lines because they seem pretty ambiguously defined at the moment and I haven't ever seen one used... That said, we should maybe not be making breaking changes to an existing spec?

Are they intended only for asexual ancestry where each is the parent of the next? Or are they just an unsorted bag of ancestors that could represent any tree of parentage?

i.e., does <ID=SampleID,Name_1=Ancestor_1,...,Name_N=Ancestor_N> imply

    SampleID -> Ancestor_1 -> Ancestor_2

or could it also mean

    Ancestor_1 <- SampleID -> Ancestor_2
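The two readings above can be sketched as edge lists, where each edge points from a sample to its ancestor. This is only an illustration of the ambiguity. Neither reading is confirmed by the spec, and both function names are hypothetical.

```python
def chain_edges(sample_id, ancestors):
    """Reading 1: each listed ancestor is the parent of the previous entry."""
    nodes = [sample_id, *ancestors]
    return list(zip(nodes, nodes[1:]))


def star_edges(sample_id, ancestors):
    """Reading 2: every listed ancestor is a direct ancestor of the sample."""
    return [(sample_id, ancestor) for ancestor in ancestors]


ancestors = ["Ancestor_1", "Ancestor_2"]
print(chain_edges("SampleID", ancestors))
print(star_edges("SampleID", ancestors))
```

The same header line yields different graphs under the two readings as soon as there is more than one ancestor, which is why the line cannot be interpreted reliably without further definition.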

I lean towards removing or deprecating it if we don't know exactly what it means and no one seems to be using it.

I'm also not clear on which of these are controlled vocabulary. Is there a specific ontology of relationships that are allowed? Are we allowed to specify something like Sibling in the case where we don't have a parent in the VCF, or is that handled with dummy trios that point to unique parent IDs that are not present in the file?

Are you allowed to include only 1 parent or are trios required?

I assume the example would address some of these questions.

@hts-specs-bot

Changed PDFs as of c1c1ce4: VCFv4.1 (diff), VCFv4.2 (diff), VCFv4.3 (diff).

@cyenyxe
Member

cyenyxe commented Jul 24, 2019

If people are happy with the slight changes to the document structure, I will add an example of how PEDIGREE entries compare with PED files.

@hts-specs-bot

Changed PDFs as of 75c37b5: VCFv4.1 (diff), VCFv4.2 (diff), VCFv4.3 (diff).

@jmmut
Contributor

jmmut commented Dec 2, 2019

I added some commits to add a minimal PED to VCF example. If people think that a complete example is preferable to a minimal one, I can add some corner cases.

Also, note that I referenced other sections instead of including a complete VCF with the phenotypes and genotypes. I think it's better not to repeat details in different sections, because of the risk of them getting outdated, but in this case I can be convinced otherwise.

Finally, the third commit adds syntax to express sibling/twin relationships.

The only thing I'm not sure about is the generic ancestor line <ID=SampleID,Name_1=Ancestor_1,...,Name_N=Ancestor_N>. To me it looks like it means Ancestor_1 <- SampleID -> Ancestor_2, but I agree with deprecating it (or at least explaining it better) because it's not clear enough.

@hts-specs-bot

Changed PDFs as of f1bc982: VCFv4.3 (diff).

@jmarshall
Member Author

Rebased now that the first commit (typographical errors in examples) has been split off as PR #583 and merged to master.

@jmarshall jmarshall changed the title from “Fix PEDIGREE example errors; draft of familial PEDIGREE section” to “New section describing familial PEDIGREE headers” Jul 27, 2021
6 participants