support values pertain to the branching event, and thus the node, not a branch #16

Open
hlapp opened this issue Feb 16, 2013 · 4 comments

Comments

@hlapp
Contributor

hlapp commented Feb 16, 2013

The MIAPA ontology declares the cdao:has_Support_Value dataproperty as a subproperty of miapa:has_edge_length. However, bootstrap values, posterior probabilities and other support values are about the branching event, which is represented by the interior node, and not about the length of a branch.

As an aside, this also means that there needs to be a separate object property from miapa:edgeLength for capturing support values through object properties.

@arlin
Contributor

arlin commented Apr 4, 2013

The solution to this isn't simple, unfortunately. It actually raises some complicated issues.

A branch length might have a support value, but most support values relate to the topology, not the branch length.

So, absolutely, you are right that support should not be a sub-property of the branch length (edge_length).

However, support values computed by most methods are not properly a property of nodes.

Most methods generate unrooted trees, and the resampled bootstrap distribution (or the posterior distribution) is a set of trees that all have the same set of taxa T. Support values are most commonly based on splits: a split is a partition of T into two subsets T1 and T2 whose union is T and whose intersection is empty. Every tree in the bootstrap distribution (or the posterior distribution, for Bayesian methods) either has this split or doesn't. A support value of 95% means that in 95% of the trees, there is a branch such that T1 is on one side and T2 is on the other. Thus, the vast majority of support values in the wild are based on partitions or splits, whether they are bootstrap values or posterior probabilities.
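To make the split-based definition of support concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: trees are encoded as nested tuples of taxon names, the function names `leaves`, `splits`, and `split_support` are hypothetical, and the 20-tree sample is made up to reproduce the 95% figure. It is not how any particular phylogenetics package computes support.

```python
from fractions import Fraction


def leaves(tree):
    """Return the set of taxon labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits (T1 | T2) implied by a tree.

    A split is stored as a frozenset of its two sides, so the order of
    T1 and T2, and hence the rooting of the drawing, does not matter.
    """
    taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = taxa - side
        # Ignore trivial splits (a single taxon, or the whole taxon set).
        if len(side) > 1 and len(other) > 1:
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def split_support(split, trees):
    """Fraction of trees in the sample that contain the given split."""
    hits = sum(1 for tree in trees if split in splits(tree))
    return Fraction(hits, len(trees))


# Made-up sample: 19 trees with AB|CD and 1 tree with AC|BD.
ab_cd = frozenset([frozenset("AB"), frozenset("CD")])
trees = [(("A", "B"), ("C", "D"))] * 19 + [(("A", "C"), ("B", "D"))]
print(split_support(ab_cd, trees))  # 19/20
```

Note that the support value is looked up by the split itself, not by any node or branch of a particular tree.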

The consensus tree may have a branch connecting 2 nodes such that all the T1 taxa are on one side, and all the T2 taxa are on the other. And so we could call one of the 2 nodes T1 and the other T2, although this is a bit confusing given that only one of these will be a clade in a rooted tree (unless the root happens to fall on the branch in question).

But the bootstrap value is not the bootstrap for T1, nor for T2; it is a bootstrap value for the partition of taxa.

The way to distinguish these is to imagine some other methods that aren't commonly used. Imagine a method that infers rooted trees because it uses a time-asymmetric model of character change. Such a method could be used to infer support values for clades as distinct from splits. Both ((A,B),C,D) and (A,B,(C,D)) support the split {A,B} vs. {C,D}, but only the first supports the clade (A,B). In this case, the logical place to record the support value for clade (A,B) would be on the parent node.
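The clade-versus-split distinction for these two example trees can be checked with a short Python sketch. The nested-tuple encoding and the helper names `leaves`, `clades`, and `splits` are assumptions made for illustration only.

```python
def leaves(tree):
    """Return the set of taxon labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Taxon sets of the internal nodes below the root of a rooted tree."""
    found = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.add(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def splits(tree):
    """Non-trivial splits, obtained by ignoring where the root is."""
    taxa = leaves(tree)
    return {
        frozenset([clade, taxa - clade])
        for clade in clades(tree)
        if len(clade) > 1 and len(taxa - clade) > 1
    }


first = (("A", "B"), "C", "D")   # ((A,B),C,D)
second = ("A", "B", ("C", "D"))  # (A,B,(C,D))

print(splits(first) == splits(second))     # True: same split AB|CD
print(frozenset("AB") in clades(first))    # True: clade (A,B) is present
print(frozenset("AB") in clades(second))   # False: only clade (C,D) is present
```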

Are support values a property of the branch? The branch is the place where we write the bootstrap value, because it has a clear conceptual mapping to the T1-T2 split, according to most methods. However, a branch on a tree is not the same thing as a partition of taxa. To understand this, it helps to keep in mind that most of the splits that receive a support value when support is computed from a distribution of trees do not appear in the consensus tree. For those splits, we have no place to write the support value, because the split, while real, does not map to the tree topology.

In the big picture, why do we get all this confusion? It's because we are building a map of knowledge while being guided by a graphical convention, namely how people create a compact visual representation using line drawings. People draw unrooted trees as though they were rooted, and they write bootstrap partition values above the nodes. So we think that bootstrap values are properties of nodes. Upon further thought, it seems that they are properties of branches. But they really aren't properties of branches, either. It's just that the concept of a branch usually maps to the concept of a split used to compute the support value.

Arlin

Arlin Stoltzfus (arlin@umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850
tel: 240 314 6208; web: www.molevol.org

@hlapp
Contributor Author

hlapp commented Apr 5, 2013

@arlin I was understanding a node in a tree as the "information artifact" that represents the split between the descendants of its outgoing branches. That's why I was suggesting that support values that indicate support for a certain split as represented by a node are a property of that node.

Are you saying that's not an accurate understanding of what a node is, or at least not one compatible with the definition of cdao:Node? Unfortunately, cdao:Node has no definition, and the EvoInfo Concept Glossary does not define it either; the glossary defines only Terminal Node, stating as the necessary and sufficient condition that it has no children. As such, if a terminal node is a node that doesn't have descending lineages, this may be incompatible with defining nodes as splits between descendant lineages.

So if it's not a property of a node, of what then? cdao:Annotation currently has 5 subclasses, for annotations of character state matrices, edges, models, OTUs, and trees. Which CDAO class would the object that is the subject of a support_value property be an instance of?

@arlin
Contributor

arlin commented Apr 6, 2013

On Apr 5, 2013, at 6:37 PM, Hilmar Lapp wrote:

@arlin I was understanding a node in a tree as the "information artifact" that represents the split between the descendants of its outgoing branches. That's why I was suggesting that support values that indicate support for a certain split as represented by a node are a property of that node.

The lack of a 1:1 mapping between node and split indicates that nodes don't represent splits.

Consider a tree of A, B, C and D, and label internal nodes with i:

( ( A, B ) iAB, ( C, D ) iCD )

The split of set {A, B} vs. set {C, D} does not uniquely belong to either internal node iAB or internal node iCD. It belongs to both, or to neither, because this is an unrooted tree (most trees in the wild are derivationally unrooted, even though people draw them and mark them up as if they were rooted). Using unrooted methods for computing support values, there cannot conceivably be a separate support value for iAB and for iCD, because they represent the same split. Annotating both nodes with the same datum would therefore record redundant information and violate data normalization.

This is why I say that the support value normally belongs to the branch between iAB and iCD, not to one or both of the nodes.
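The normalization point can be sketched in a few lines of Python. Everything here is a hypothetical encoding for illustration: the dictionary `node_side` records, for each labelled internal node, the taxa on its own side of the connecting branch, and the value 0.95 is a made-up placeholder.

```python
taxa = frozenset("ABCD")

# Hypothetical encoding of ( ( A, B ) iAB, ( C, D ) iCD ):
# each internal node is labelled with the taxa on its own side.
node_side = {
    "iAB": frozenset("AB"),
    "iCD": frozenset("CD"),
}


def split_for(side, all_taxa):
    """The unordered split defined by one side and its complement."""
    return frozenset([side, all_taxa - side])


node_to_split = {label: split_for(side, taxa) for label, side in node_side.items()}

# Both nodes resolve to one and the same split ...
print(node_to_split["iAB"] == node_to_split["iCD"])  # True

# ... so the support value is stored once, keyed by the split,
# rather than copied onto both nodes (placeholder value).
support_by_split = {node_to_split["iAB"]: 0.95}
print(len(support_by_split))  # 1
```

Keying the value by the split keeps a single record even though two nodes point at it.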

If we were using methods for rooted trees, then it would be possible to have a support value for the clade AB, and this value could be different from the support value for clade CD. In that case, we would want to assign the support value for clade AB to node iAB.

Does that make sense?

Arlin


@hlapp
Contributor Author

hlapp commented Apr 7, 2013

This does make sense. If we inserted an artificial node into the edge between iAB and iCD, would each internal node then represent a split, though? I.e., is this the single exception to the 1:1 mapping?

And aren't support values normally printed next to nodes? In the case of unrooted trees, where does one usually print the support for the one split for which there isn't a node?

Should we introduce the concept of a split into CDAO, with internal nodes being a type of split and support values then being annotations of splits?
