The Nextclade change to clade naming doesn't directly impact CoVariants, as we don't use the Nextclade file directly but rather the metadata.tsv produced by the ncov-ingest workflow. Currently this file hasn't changed, but it may change either by just replacing Nextstrain_clade (which we use) with the shortened name, or by doing this and also adding a "legacy" column.
For clarity, we currently compare values in Nextstrain_clade with display_name from clusters.py (containing things like 22F (Omicron)).
If a legacy column is added, switching is as simple as just using this new column, with the rest of the code remaining the same. If there isn't one, or we want to be more future-proof, we should ensure we can just use a different entry in clusters.py which has the year-letter name.
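As a rough illustration of the "future-proof" option, the comparison could accept either form of the name. This is only a sketch: the function name `clade_matches` is hypothetical, and it assumes each clusters.py entry is a dict carrying both `display_name` (full name) and `nextstrain_name` (year-letter name), which may not match the real code in cluster_analysis.py.

```python
def clade_matches(metadata_value, cluster):
    """Return True if a Nextstrain_clade value refers to this cluster entry.

    Assumption: `cluster` is a dict with a full-name "display_name"
    (e.g. "22F (Omicron)") and a year-letter "nextstrain_name" (e.g. "22F").
    """
    # Accept either naming scheme, so the check keeps working whether
    # metadata.tsv carries the full name or the shortened one.
    return metadata_value in (cluster["display_name"], cluster["nextstrain_name"])


# Illustrative entry, not copied from clusters.py
example_cluster = {"display_name": "22F (Omicron)", "nextstrain_name": "22F"}
```

With a check like this, the same code path would handle the file before and after the rename, provided every `nextstrain_name` is consistently year-letter.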
We currently have an entry nextstrain_name, but this has been used inconsistently - sometimes with the 'full' name (21L (Omicron)) and sometimes just year-letter (22A).
To help us switch to that option more easily in future, I propose switching now so that all nextstrain_name entries are year-letter.
This should mean that in future we would only need to switch cluster_analysis.py from using display_name to using nextstrain_name. This shouldn't be too bad, but it will need checking, as it's a little more complex than I thought.
If this is the path we take, here's a small checklist:
- Change all nextstrain_name entries to use year-letter
- Adjust cluster_analysis.py to use nextstrain_name instead of display_name, and check it works
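The first checklist item could be supported by a small audit that lists which nextstrain_name entries are not yet year-letter. The sketch below is an assumption-laden example: the helper names are hypothetical, and it assumes the cluster definitions can be viewed as a dict of dicts, which may differ from how clusters.py is actually organised.

```python
import re

# Two digits plus one capital letter at the start of the name, e.g. "21L"
YEAR_LETTER = re.compile(r"\d{2}[A-Z]\b")


def to_year_letter(name):
    """Return the leading year-letter part of a clade name, or None."""
    match = YEAR_LETTER.match(name)
    return match.group(0) if match else None


def audit_nextstrain_names(clusters):
    """Map cluster key -> suggested year-letter name for entries needing a change."""
    fixes = {}
    for key, entry in clusters.items():
        name = entry.get("nextstrain_name")
        if name is None:
            continue  # no Nextstrain name: tracked via Pango or SNPs instead
        short = to_year_letter(name)
        if short is not None and short != name:
            fixes[key] = short
    return fixes
```

Entries already in year-letter form, and entries without a nextstrain_name, are left out of the result, so an empty dict would mean the first checklist item is done.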
Clearly, all of the above is only relevant to clades we track that are official Nextstrain clades. For those that aren't official (mostly older ones), we use Pango or SNPs, so this is unchanged.
@emmahodcroft Let's see if this also affects web. Theoretically, web *should* use only build_name in significant places, but there might be some funny effects in case I deviated from that. So please also watch out for strange things in web as you migrate.
I hope you don't need to change build names. If you do, then it will be a journey, because that's how the md files, URLs and other stuff are linked together.
I don't plan to change the build names, as they're used all over.
RE the nextstrain_name: I'll keep an eye out, as I had the same thought. The main reason I am fairly confident is that a while ago I accidentally got inconsistent about the naming (started using just year-letter) and, as far as I can tell, I've never noticed any impact of this. This is the main thing that made me confident that we must not be using it anywhere, or I'd have noticed whenever I first started messing it up (probably about a year ago now) or sometime in between.
But agree - can't be too careful!
I do not expect to change the build_name - totally agree.
The only other thing that might be worth exploring changing is display_name as perhaps we'd like to move to something a bit more flexible (perhaps including the pango in some cases, as Nextstrain is somewhat moving to do?). But I'd want to do a separate scope to check how much this is used.
Nextclade now breaks down Nextstrain clades into year-letter and WHO, and only gives the "old" 'full' name in a new column, clade_legacy.
Example:
Old:
clade_nextstrain == 22F (Omicron)
New:
clade_nextstrain == 22F
clade_who == Omicron
clade_legacy == 22F (Omicron)
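To make the relationship between the old and new columns concrete, here is a minimal sketch that splits a legacy-style name into its year-letter and WHO parts. The function name and the regular expression are illustrative assumptions about the name format, not Nextclade's own logic.

```python
import re

# Year-letter clade, optionally followed by a parenthesised WHO label
LEGACY_NAME = re.compile(r"^(?P<clade>\d{2}[A-Z])(?: \((?P<who>[^)]+)\))?$")


def split_legacy_clade(legacy):
    """Split e.g. "22F (Omicron)" into ("22F", "Omicron").

    The WHO part is None when the clade has no parenthesised label.
    """
    match = LEGACY_NAME.match(legacy.strip())
    if match is None:
        raise ValueError(f"Not a recognised legacy clade name: {legacy!r}")
    return match.group("clade"), match.group("who")
```

For the example above, `split_legacy_clade("22F (Omicron)")` yields the values shown in the new clade_nextstrain and clade_who columns.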