
The Search for Hybrids: An Open Challenge in Bioinformatics

Hawken Rives edited this page Jun 27, 2018 · 1 revision

For a long time, species were thought to be immutable. A cat was a cat and that was that. Then it began to dawn on us that organisms change over timescales much greater than our lifetimes. They change enough that their descendants are no longer recognizable as the same species! And thus evolutionary theory emerged.

If species can evolve into different species, how do we delineate species? At what point does the ancestral tiger become a cat? This is an open question, and is further complicated by the discovery of hybrids.

Quacks Like a Duck?

If it quacks like a duck, looks like a duck, smells like a duck, then it must be a duck!

Until you sequence its genome and it turns out this duck is more closely related to a family of geese than to its own species! That is to say, if you found this duck's DNA at the scene of a crime, the experts would conclude that the perpetrator was most likely a nefarious goose.

Since this duck (as an individual) is genetically closer to a different species than to its own, we call it a hybrid.

Are Hybrids Common?

We don't know! These mysterious individuals have been studied in a few isolated cases, but there has never been a large systematic study conducted to find them. With the proliferation of genetic data thanks to sources like GenBank, we have the tools to answer this question now.

What we really hope to answer is whether hybrids are a crucial part of the evolutionary process. Are hybrids the beginning of a new species? Hybrids are currently believed to be rare, but if they turn out to be very common, that might well support this idea and open up even more mysteries to explore.

You are on the edge of human knowledge. The questions you will answer, and what you will learn, are things no one in our history has ever known.

I hope you're excited.

An Open Challenge

So here's the challenge. Given a phylogenetic tree, design an algorithm that will return a list of all hybrid individuals.

The input is given as a JSON tree. Here is a simple example with no hybrids.

{
 "branchset": [{
  "name": "",
  "branchset": [{
     "name": "Canis_lupus__KC346426",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346425",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346424",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346423",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346421",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346419",
     "length": 0.002930837
    }],
  "length": 0.1729114
 }, {
  "name": "",
  "branchset": [{
     "name": "Felis_catus__KU253483",
     "length": 0.0006123714
    }, {
     "name": "Felis_catus__KU253482",
     "length": 0.0006123714
    }, {
     "name": "Felis_catus__AB194817",
     "length": 0.0006123713
    }],
  "length": 0.1730313
 }],
 "name": ""
}

When visualized in hybsearch, it looks like this.

[Figure: the example tree with no hybrids]
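
A tree in this format can be walked with a short recursive function. The sketch below is in Python rather than the project's JavaScript. It assumes that a node without a `branchset` is an individual and that the species is the part of `name` before the double underscore; both assumptions are inferred from the example above, not from a documented schema. The helper names `leaves` and `species_of` are made up for illustration, and the tree literal is a shortened copy of the example.

```python
import json

def leaves(node):
    """Yield the name of every individual (leaf) under this node."""
    children = node.get("branchset")
    if not children:
        # Assumption: a node with no "branchset" is a leaf.
        yield node["name"]
        return
    for child in children:
        yield from leaves(child)

def species_of(name):
    """Assumed naming convention: 'Genus_species__ACCESSION'."""
    return name.split("__")[0]

# Shortened copy of the example tree above.
tree = json.loads("""
{"name": "", "branchset": [
  {"name": "", "length": 0.1729114, "branchset": [
    {"name": "Canis_lupus__KC346426", "length": 0.002930837},
    {"name": "Canis_lupus__KC346425", "length": 0.002930837}]},
  {"name": "", "length": 0.1730313, "branchset": [
    {"name": "Felis_catus__KU253483", "length": 0.0006123714}]}]}
""")

for name in leaves(tree):
    print(species_of(name), name)
```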

Case 1 - Simple Hybrid

Here is a made-up example of a simple hybrid. I modified the above tree by hand, replacing Canis lupus KC346424 with Felis catus KU253483 so that one Felis catus now sits in the Canis lupus subtree.

{
 "branchset": [{
  "name": "",
  "branchset": [{
     "name": "Canis_lupus__KC346426",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346425",
     "length": 0.002930837
    }, {
     "name": "Felis_catus__KU253483",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346423",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346421",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346419",
     "length": 0.002930837
    }],
  "length": 0.1729114
 }, {
  "name": "",
  "branchset": [{
     "name": "Felis_catus__KU253482",
     "length": 0.0006123714
    }, {
     "name": "Felis_catus__AB194817",
     "length": 0.0006123713
    }],
  "length": 0.1730313
 }],
 "name": ""
}

The hybrid is marked in red below.

[Figure: the Case 1 tree with the one hybrid marked in red]
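
One way to see what the hand edit did is to test each species for monophyly: find the smallest subtree that contains every member of the species and check whether anything else is in it. The following is a minimal sketch of that check, not the hybsearch method. The helper names are hypothetical, the species is assumed to be the part of the name before `__`, and branch lengths are left out of the tree literal for brevity.

```python
def leaf_names(node):
    """Return the names of all individuals under this node."""
    kids = node.get("branchset", [])
    if not kids:
        return [node["name"]]
    return [name for kid in kids for name in leaf_names(kid)]

def species_of(name):
    # Assumed naming convention: 'Genus_species__ACCESSION'.
    return name.split("__")[0]

def smallest_clade(node, species):
    """Return the deepest node that still contains every member of the species."""
    total = sum(species_of(n) == species for n in leaf_names(node))
    for kid in node.get("branchset", []):
        under = sum(species_of(n) == species for n in leaf_names(kid))
        if under == total:
            # One child holds all members, so the clade lies below it.
            return smallest_clade(kid, species)
    return node

def is_monophyletic(tree, species):
    """True if the smallest clade holding the species holds nothing else."""
    clade = smallest_clade(tree, species)
    return all(species_of(n) == species for n in leaf_names(clade))

# The Case 1 tree, names only.
case1 = {"name": "", "branchset": [
    {"name": "", "branchset": [
        {"name": "Canis_lupus__KC346426"}, {"name": "Canis_lupus__KC346425"},
        {"name": "Felis_catus__KU253483"}, {"name": "Canis_lupus__KC346423"},
        {"name": "Canis_lupus__KC346421"}, {"name": "Canis_lupus__KC346419"}]},
    {"name": "", "branchset": [
        {"name": "Felis_catus__KU253482"}, {"name": "Felis_catus__AB194817"}]}]}

for sp in ("Canis_lupus", "Felis_catus"):
    print(sp, is_monophyletic(case1, sp))
```

Both species come out non-monophyletic on this tree, so a species-level test like this one shows that something is off but cannot say which individual is the hybrid.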

Case 2 - Reciprocal Hybrids

Often the relationship is reciprocal: each species has an individual sitting in the other's subtree. Here is an example of that. Notice that there is one hybrid lupus (KC346426) and one hybrid catus (KU253483).

{
 "branchset": [{
  "name": "",
  "branchset": [{
     "name": "Canis_lupus__KC346425",
     "length": 0.002930837
    }, {
     "name": "Felis_catus__KU253483",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346423",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346421",
     "length": 0.002930837
    }, {
     "name": "Canis_lupus__KC346419",
     "length": 0.002930837
    }],
  "length": 0.1729114
 }, {
  "name": "",
  "branchset": [{
     "name": "Canis_lupus__KC346426",
     "length": 0.002930837
    },{
     "name": "Felis_catus__KU253482",
     "length": 0.0006123714
    }, {
     "name": "Felis_catus__AB194817",
     "length": 0.0006123713
    }],
  "length": 0.1730313
 }],
 "name": ""
}

Case 3 - Only Two Individuals

In some cases, we only have two individuals in a species. If they are not together in the same subtree, which one do you consider the hybrid?

More General Cases?

These are certainly not the only cases. How do we know an algorithm will always be correct? Randomly generate trees to test against? Figure out a proof of correctness?
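
As one possible take on the random-testing idea, the sketch below builds a tree in which every species forms its own clade, then moves a single individual into another species' clade so that the "right answer" is known in advance. Everything here is an assumption made for illustration: the function names, the made-up `ID` accession labels, the omission of branch lengths, and the choice of strictly binary trees. Each species should have at least three individuals so that the planted hybrid does not run into the ambiguity of Case 3.

```python
import random

def random_join(nodes, rng):
    """Merge nodes pairwise, in random order, into one binary tree."""
    nodes = list(nodes)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append({"name": "", "branchset": [a, b]})
    return nodes[0]

def tree_with_planted_hybrid(species_sizes, rng):
    """Return (tree, name of the individual moved into a foreign clade)."""
    groups = {sp: [{"name": f"{sp}__ID{i}"} for i in range(n)]
              for sp, n in species_sizes.items()}
    # Pick the species that loses an individual and the one that receives it.
    donor, host = rng.sample(sorted(groups), 2)
    hybrid = groups[donor].pop()
    groups[host].append(hybrid)
    # One clade per species, then join the clades into a single tree.
    clades = [random_join(members, rng) for members in groups.values()]
    return random_join(clades, rng), hybrid["name"]

def leaf_names(node):
    """Return the names of all individuals under this node."""
    kids = node.get("branchset", [])
    if not kids:
        return [node["name"]]
    return [name for kid in kids for name in leaf_names(kid)]

rng = random.Random(0)  # fixed seed so a failing tree can be reproduced
tree, planted = tree_with_planted_hybrid(
    {"Canis_lupus": 4, "Felis_catus": 4}, rng)
print(planted, len(leaf_names(tree)))
```

A candidate algorithm could be run over many such trees and its output compared with the planted name. Passing tests of this kind is evidence, not a proof of correctness.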

Our Current Implementation

See the recursiveSearch function in ent.js for our current algorithm.

It works as follows:

For each branching point in the tree, do the following:

  1. Collect all individuals in the 1st subtree into list A, and in the 2nd subtree into list B
    • The same steps are eventually re-applied with A and B switched so it doesn't matter which one you pick first.
    • If this branching point contains more than 2 subbranches, apply these steps to every pair among the subbranches
  2. Take an individual in A. Call this individual X. If the species of X is found in B, and not everything in B is of species X, then declare X a nonmonophyletic individual!
  3. Everything in B that is not species X is now marked as nonmonophyletic, and paired with the individual X.
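
The steps above can be sketched in Python as follows. This is a literal reading of the description, not a port of `recursiveSearch`; the real function in ent.js may differ in details. The function names are hypothetical, the species is assumed to be the part of each name before `__`, and step 2 is read as applying to every individual in A. Branch lengths are left out of the Case 1 tree for brevity.

```python
from itertools import permutations

def species_of(name):
    # Assumed naming convention: 'Genus_species__ACCESSION'.
    return name.split("__")[0]

def leaf_names(node):
    """Return the names of all individuals under this node."""
    kids = node.get("branchset", [])
    if not kids:
        return [node["name"]]
    return [name for kid in kids for name in leaf_names(kid)]

def nonmonophyletic_pairs(node):
    """Collect (X, other) pairs for this branching point and all below it."""
    pairs = []
    kids = node.get("branchset", [])
    # Ordered pairs cover both "A then B" and "B then A", and every
    # pair of subbranches when there are more than two.
    for a_node, b_node in permutations(kids, 2):
        a, b = leaf_names(a_node), leaf_names(b_node)
        b_species = {species_of(n) for n in b}
        for x in a:
            sx = species_of(x)
            # Step 2: X's species occurs in B, but B is not purely that species.
            if sx in b_species and len(b_species) > 1:
                # Step 3: pair X with everything in B of another species.
                pairs.extend((x, y) for y in b if species_of(y) != sx)
    for kid in kids:
        pairs.extend(nonmonophyletic_pairs(kid))
    return pairs

# The Case 1 tree, names only.
case1 = {"name": "", "branchset": [
    {"name": "", "branchset": [
        {"name": "Canis_lupus__KC346426"}, {"name": "Canis_lupus__KC346425"},
        {"name": "Felis_catus__KU253483"}, {"name": "Canis_lupus__KC346423"},
        {"name": "Canis_lupus__KC346421"}, {"name": "Canis_lupus__KC346419"}]},
    {"name": "", "branchset": [
        {"name": "Felis_catus__KU253482"}, {"name": "Felis_catus__AB194817"}]}]}

for x, other in nonmonophyletic_pairs(case1):
    print(x, "<->", other)
```

On the Case 1 tree, this literal reading pairs each of the two Felis catus in the second subtree with the five Canis lupus in the first. It reports nonmonophyletic pairs, not a single hybrid, so any step that narrows the pairs down to one individual lies outside this sketch.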

Submission

If you have an algorithm that you can demonstrate is correct, please get in touch with freedber@stolaf.edu. We'd love to collaborate! Our pipeline is ready and we'd love to run it over thousands of species.