Restrict 6m builds to tips in the last 6m (post-tree-building) #934
base: master
This is cool James! Just to be sure I'm understanding: I'm guessing the idea here is to allow a more focused view of the tips, without a tree that 'drags' further back into the past than people may be interested in looking? In other words, it allows a 'shortening' of the x-axis time span? (Or is there something grander at play than I'm picking up? Very possible!) In the same vein, when you say things like "push the coalescence back", you mean just visually, right? Since you say the tree-building is the same, the actual coalescence stays the same (it seems to, across the views, I think), but the visual 'stretching' depends on how deep the branches go (which is definitely affected by where you set the cutoff). Would the proposal be to make this a toggle-able view of the 6m builds, or the 'only' view? I do think it's very cool, but am wondering if people may get the idea that SARS-CoV-2 has split into different... somethings. I'll let the imagination of the headlines run wild in everyone's own head... 🙃 On the other hand, someone's always willing to misinterpret, and we can't let that stop us trying to get clearer & better visuals!
Exactly! It's to focus on the within-clade relationships of what's circulating recently, rather than to convey the evolutionary history of the different clades. If we showed the entire connected history of those selected tips, then (a) the root node would often be so far back that we wouldn't gain much horizontal space and (b) it wouldn't work for recombinants. So I'm partitioning the data (via
Force-pushed from 5b03419 to 49bc580.
Force-pushed & rerunning (GitHub Action), as the Python in our Docker image doesn't like walruses (the `:=` operator)! Update: this failed at a downstream rule 🤦
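For context, the "walrus" is Python's assignment-expression operator `:=` (PEP 572), which only exists in Python 3.8 and later; an older interpreter rejects the whole module with a `SyntaxError` before any of it runs. The PR's actual code isn't shown in this thread, so the following is a hypothetical illustration (the regex and the `year_of` helper are made up) of the usual rewrite that keeps code compatible with older Pythons:

```python
import re

# Hypothetical pattern for an ISO-style date such as "2022-04-28".
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Python >= 3.8 only (assignment expression, a.k.a. "walrus"):
#
#     if (m := DATE_RE.match(s)):
#         return int(m.group(1))
#
# Equivalent form that also parses on Python < 3.8:
def year_of(s):
    """Return the year of an ISO-style date string, or None if it doesn't match."""
    m = DATE_RE.match(s)  # assign first ...
    if m:                 # ... then test, instead of doing both in one expression
        return int(m.group(1))
    return None
```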
Update again: https://github.com/nextstrain/ncov/actions/runs/2243157880 triggered after 9029cea
Thanks James! That makes sense and I really like this idea. One thought, though it may be beyond the scope of this PR: would it be desirable/possible to introduce this on the Auspice end rather than the build end? It could be like another version of the date slider where, instead of a 'vertical' cutoff (everything cut along the same line), it's a 'tip' cutoff (like the above) that only shows the tree back to the coalescent point of the tips in the specified period. And instead of 'greying out' the branches prior to this, it would do away with them completely and rescale the x-axis. This would let users view or not view the deeper tree, or set their limit to 3m, 6m, etc. (of course, there could be a default, too!), rather than 'losing' the information entirely. It could also then be applied to a tree without additional runs!
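The core computation behind such a 'tip' cutoff is finding the common ancestor (the coalescent point) of all tips sampled on or after the chosen date; the x-axis could then start at that node's date rather than at the root's. Below is a minimal Python sketch, not Auspice code (Auspice itself is JavaScript). It assumes nodes shaped like Auspice v2 JSON (`node_attrs.num_date.value` as a decimal year, tips having no children) and a decimal-year cutoff; `has_recent_tip`, `recent_mrca` and `make_node` are hypothetical names.

```python
def is_tip(node):
    return not node.get("children")


def has_recent_tip(node, cutoff):
    """True if any tip at or below `node` was sampled on/after `cutoff` (decimal year)."""
    if is_tip(node):
        return node["node_attrs"]["num_date"]["value"] >= cutoff
    return any(has_recent_tip(child, cutoff) for child in node["children"])


def recent_mrca(root, cutoff):
    """Return the most recent common ancestor of all tips sampled on/after `cutoff`,
    or None if there are no such tips.

    Walks down from the root while exactly one child subtree contains recent tips;
    the first node where recent tips sit in two or more child subtrees (or a lone
    recent tip) is the MRCA. Worst case is quadratic, which is fine for a sketch.
    """
    if not has_recent_tip(root, cutoff):
        return None
    node = root
    while True:
        recent_children = [c for c in node.get("children", []) if has_recent_tip(c, cutoff)]
        if len(recent_children) != 1:
            return node
        node = recent_children[0]


def make_node(name, date, *children):
    """Build a minimal Auspice-v2-style node (hypothetical helper for the example)."""
    return {"name": name, "node_attrs": {"num_date": {"value": date}}, "children": list(children)}


tree = make_node("root", 2019.9,
                 make_node("old_tip", 2020.1),
                 make_node("X", 2021.2,
                           make_node("A", 2021.5, make_node("a1", 2022.2), make_node("a2", 2021.0)),
                           make_node("b1", 2022.3)))

mrca = recent_mrca(tree, cutoff=2022.0)
print(mrca["name"], mrca["node_attrs"]["num_date"]["value"])  # X 2021.2
```

With a cutoff of 2022.0 the old tip no longer counts, so the displayed tree could start at `X` (2021.2) instead of at the root (2019.9).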
Force-pushed from 4723ab1 to 7bbc46e.
Force-pushed from 9029cea to d981566.
See the script for comments/methods, as well as the added configuration documentation.
Time-restricted builds (as added in the previous commit) may store trees as an array and thus the `fix-colorings.py` script needed to be adjusted accordingly.
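In an Auspice v2 dataset JSON the `tree` key normally holds a single root node, but the time-restricted builds may store a list of subtree roots there instead, so any script that walks the tree has to accept both shapes. A minimal sketch of that normalisation (the function names and the `clade_membership` example data are hypothetical; this is not the actual `fix-colorings.py` code):

```python
def iter_nodes(tree):
    """Yield every node of an Auspice v2 `tree` value, which may be a single
    root node (dict) or a list of subtree roots."""
    stack = list(tree) if isinstance(tree, list) else [tree]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.get("children", []))


def observed_values(dataset, key):
    """Collect the distinct values of node_attrs[key] across all (sub)trees,
    for example the values a colouring has to cover."""
    values = set()
    for node in iter_nodes(dataset["tree"]):
        attr = node.get("node_attrs", {}).get(key)
        if isinstance(attr, dict) and "value" in attr:
            values.add(attr["value"])
    return values


# Hypothetical example: the same data as a single tree and as a list of subtrees.
tip = {"name": "t1", "node_attrs": {"clade_membership": {"value": "21L"}}}
root = {"name": "n1", "node_attrs": {"clade_membership": {"value": "21K"}}, "children": [tip]}
other = {"name": "n2", "node_attrs": {"clade_membership": {"value": "20A"}}}

print(sorted(observed_values({"tree": root}, "clade_membership")))           # ['21K', '21L']
print(sorted(observed_values({"tree": [root, other]}, "clade_membership")))  # ['20A', '21K', '21L']
```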
Force-pushed from d981566 to 0b38fa5.
For sure, we want something like this (for multiple reasons), and I've sketched out a few ideas, but nothing seems quite right at the moment in terms of UI. Note that the coalescent point is often further back than one might think (e.g. the coalescent point for tips in the last 6 months of nCoV often goes back to early 2020, depending on the subsampling). I'm interested in pushing this subtree idea because as we start to sample
Draft commit to prune the outgroup as a (near-)final step in the workflow, borrowing from @jameshadfield's earlier work on #934. DO NOT MERGE
This introduces a script which restricts the 6m builds to include tips from the previous 6 months only. This is done after tree building, so that tree reconstruction / inference can use data from the entire pandemic. After tips are restricted we then partition the tips based on clades and visualise them as subtrees. If a clade splits after the cutoff (i.e. in the last 6 months) then it's not drawn as a separate subtree.
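The two steps described above (drop old tips after tree building, then split the remaining tree into per-clade subtrees unless the split happens after the cutoff) could be sketched roughly as follows. This is not the PR's script: it uses a hypothetical `Node` class holding a clade label and a decimal-year date per node, and it assumes a split counts as "after the cutoff" when the branching point (the parent node) is dated on or after the cutoff. The real implementation may make different judgement calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    clade: str
    num_date: float  # decimal year: sampling date for tips, inferred date for internal nodes
    children: List["Node"] = field(default_factory=list)


def prune_old_tips(node: Node, cutoff: float) -> Optional[Node]:
    """Return a pruned copy keeping only tips dated on/after `cutoff` (tip objects
    are reused), or None if nothing below `node` survives. Childless nodes count as tips."""
    if not node.children:
        return node if node.num_date >= cutoff else None
    kept = [c for c in (prune_old_tips(ch, cutoff) for ch in node.children) if c is not None]
    return Node(node.name, node.clade, node.num_date, kept) if kept else None


def partition_by_clade(root: Node, cutoff: float) -> List[Node]:
    """Detach a child as the root of a new subtree when its clade differs from its
    parent's and the branching point (the parent) predates `cutoff`. Clades that
    split off after the cutoff stay inside their parent's subtree.
    Mutates the tree in place and returns the subtree roots."""
    subtrees = [root]

    def visit(node: Node) -> None:
        kept = []
        for child in node.children:
            if child.clade != node.clade and node.num_date < cutoff:
                subtrees.append(child)  # child becomes the root of its own subtree
            else:
                kept.append(child)
            visit(child)
        node.children = kept

    visit(root)
    return subtrees


def restrict_and_partition(root: Node, cutoff: float) -> List[Node]:
    pruned = prune_old_tips(root, cutoff)
    if pruned is None:
        return []
    parts = partition_by_clade(pruned, cutoff)
    # Detaching clades can leave an old internal node with no children;
    # pruning again drops such tip-less remnants.
    return [p for p in (prune_old_tips(s, cutoff) for s in parts) if p is not None]


def tip_names(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in tip_names(child)]


tree = Node("root", "19A", 2020.0, [
    Node("n1", "20A", 2020.3, [Node("a1", "20A", 2022.1), Node("a2", "20A", 2021.0)]),
    Node("n2", "21K", 2021.8, [
        Node("k0", "21K", 2021.9),
        Node("n4", "21K", 2022.02, [
            Node("n3", "21L", 2022.05, [Node("l1", "21L", 2022.2)]),
            Node("k1", "21K", 2022.3),
        ]),
    ]),
])

for sub in restrict_and_partition(tree, cutoff=2022.0):
    print(sub.name, sub.clade, tip_names(sub))
# n1 20A ['a1']
# n2 21K ['l1', 'k1']
```

Here 21L stays inside the 21K subtree because it branches off (at `n4`, 2022.02) after the cutoff; moving the cutoff to 2022.03 would split it out as a third subtree. Unary nodes left behind by pruning are not collapsed in this sketch.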
I went through a number of iterations of this approach, and there are lots of judgement calls to be made. Very open to modifications / changes / comments 😄
Specifically, most of my test datasets had some recent tips of an old clade (e.g. 20A, 20C), and the coalescence of those tips was a long time ago. This pushed the root of at least one subtree quite far back. Even just a few Delta strains will typically push the coalescence back to around the start of 2021.
For instance, here's the global 6m build unrestricted (left), restricted to 6M (middle) and 3M (right). Notice the difference in minimum date due to tips from 20B not appearing in the last 3 months:
Here are the same restrictions, but for the build from #933. Note how BA4/5 aren't split into separate subtrees, as they branch off after the cutoff date. (I played around with different cutoffs here; the further back you push the cutoff, the more structure comes through.)
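Windows such as 6M and 3M have to be turned into a concrete cutoff date before tips can be compared against them. A small standard-library sketch, assuming a window string like `"6M"` (this format and the function names are hypothetical, not the build's actual configuration keys) and decimal-year dates; other tools may use a slightly different decimal-year convention:

```python
import calendar
import datetime


def months_ago(today: datetime.date, months: int) -> datetime.date:
    """Date `months` calendar months before `today`, clamping the day to the
    length of the target month (e.g. 31 Aug minus 6 months -> 28 Feb)."""
    total = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(today.day, calendar.monthrange(year, month)[1])
    return datetime.date(year, month, day)


def decimal_year(d: datetime.date) -> float:
    """Convert a date to a decimal year, with 1 January mapping to YEAR.0."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return d.year + (d - datetime.date(d.year, 1, 1)).days / days_in_year


def cutoff_for(window: str, today: datetime.date) -> float:
    """Parse a hypothetical window string such as '6M' or '3m' into a decimal-year cutoff."""
    if len(window) < 2 or window[-1].upper() != "M":
        raise ValueError(f"unsupported window: {window!r}")
    return decimal_year(months_ago(today, int(window[:-1])))


today = datetime.date(2022, 4, 28)
print(months_ago(today, 6), months_ago(today, 3))  # 2021-10-28 2022-01-28
```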
And for good measure, here's the h1n1pdm tree focused on the past 6 months (left) and restricted to the last 6 months (right) (the colours differ because they aren't defined in the dataset and are instead generated by Auspice):
Trial builds running now: https://github.com/nextstrain/ncov/actions/runs/2237163006