feat: mutations relative to arbitrary node #1454
base: master
This extends the concept of private mutations (relative to the parent node on the ref tree) to mutations relative to an arbitrary node of interest. The ref nodes of interest are described by the user in `.meta.extensions.nextclade.reference_nodes` of the input Auspice JSON. The description can also contain constraints: a node can be matched only to query samples belonging to a certain clade or lineage. Private mutations functionality is unchanged; this is only an addition, though the implementation algorithm is largely reused. In this commit only nuc mutations are added.
Similarly to b537132, add relative amino acid mutations
This just passes through from js to wasm the data that is now required to output relative mutations
ref_nodes
  .iter()
  .map(|&ref_node| -> Result<_, Report> {
    let node = graph
      .iter_nodes()
      .find(|node| node.payload().name == ref_node.name)
      .ok_or_else(|| eyre!("Unable to find reference node on the tree: '{}'", &ref_node.name))?;

    let muts = find_private_nuc_mutations(
      node.payload(),
      substitutions,
      deletions,
      missing,
      alignment_range,
      ref_seq,
      non_acgtns,
      virus_properties,
    );

    Ok(RelativeNucMutations {
      ref_node: ref_node.to_owned(),
      muts,
    })
  })
There is very little new logic, mostly bookkeeping. The `find_private_*_mutations()` functions for nucs and aa are reused as is. The only difference compared to private mutations is that the code now runs multiple times, once for each requested node.
This code fragment is for nucs. The sibling function for aa is just below it.
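To illustrate the bookkeeping described in this comment, here is a minimal, self-contained sketch of the per-node loop. It is not the Nextclade implementation: `Sub`, `TreeNode`, `diff_against_node` and `relative_mutations` are hypothetical stand-ins, and the plain set difference only approximates what `find_private_nuc_mutations()` does (the real function also takes deletions, missing ranges, the alignment range and more into account).

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified substitution: a position and the new nucleotide.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Sub {
    pos: usize,
    nuc: char,
}

/// Hypothetical, simplified tree node carrying its accumulated substitutions.
struct TreeNode {
    name: String,
    subs: BTreeSet<Sub>,
}

/// Stand-in for `find_private_nuc_mutations()`: here simply the query
/// substitutions that the node does not have (an assumption for illustration).
fn diff_against_node(node: &TreeNode, query: &BTreeSet<Sub>) -> Vec<Sub> {
    query.difference(&node.subs).cloned().collect()
}

/// Runs the same diff once per requested reference node, failing if a
/// requested node name cannot be found on the tree.
fn relative_mutations(
    tree: &[TreeNode],
    ref_node_names: &[&str],
    query: &BTreeSet<Sub>,
) -> Result<Vec<(String, Vec<Sub>)>, String> {
    ref_node_names
        .iter()
        .map(|&name| {
            let node = tree
                .iter()
                .find(|node| node.name == name)
                .ok_or_else(|| format!("Unable to find reference node on the tree: '{name}'"))?;
            Ok((name.to_owned(), diff_against_node(node, query)))
        })
        .collect()
}
```

Collecting into a `Result` stops at the first missing node, which mirrors the `?` in the fragment above.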
.reference_nodes
.iter()
.filter(|node| {
  // For each attribute key in includes, check that the attribute value of this sample match
  // at least one item in the include list
  node.include.iter().all(|(key, includes)| {
    let curr_value = if key == "clade" { clade } else { &clade_node_attrs[key] };
    includes.iter().any(|include_value| include_value == curr_value) // TODO: consider regex match rather than equality
  })
})
This is the logic for constraining the mutations calculation by clades and clade-like attributes. If the `include` field is present, we look up the constrained attribute on the query sample and only consider this node if the query attribute's value matches any of the values in the include list.
For example, if the config has a node of interest which is only relevant for clades 23A and 23B:
{
"...": "...",
"include": { "clade": ["23A", "23B"] }
}
then mutations relative to this node will be calculated only for query samples of clade 23A and 23B.
Same for pango lineages:
{
"...": "...",
"include": { "Nextclade_pango": ["A.1.2.3", "A.1.2.3.4"] }
}
It is up for discussion how multiple filters (multiple keys in the `include` object) should be combined: using boolean OR or boolean AND.
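For reference while discussing this, the diff fragment above uses `.all()` over keys and `.any()` over values, that is, AND across keys and OR within each key's list. Below is a minimal standalone sketch of that behaviour. `ReferenceNode`, `node_applies` and `applicable_node_names` are hypothetical names for illustration, and, unlike the indexed lookup in the diff, this sketch assumes that an attribute missing on the query sample counts as a non-match.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one entry of `reference_nodes`.
struct ReferenceNode {
    name: String,
    /// Attribute key ("clade" or a clade-like attribute name) -> accepted values.
    include: BTreeMap<String, Vec<String>>,
}

/// Returns true if the node should be considered for a query sample.
/// Keys are combined with AND, values within one key with OR.
/// An empty `include` map applies to every sample.
fn node_applies(
    node: &ReferenceNode,
    clade: &str,
    clade_node_attrs: &BTreeMap<String, String>,
) -> bool {
    node.include.iter().all(|(key, accepted)| {
        // "clade" refers to the built-in clade; other keys are clade-like attributes.
        let current = if key == "clade" {
            Some(clade)
        } else {
            clade_node_attrs.get(key).map(String::as_str)
        };
        match current {
            Some(value) => accepted.iter().any(|a| a.as_str() == value),
            // Assumption: a sample lacking the attribute does not match.
            None => false,
        }
    })
}

/// Names of all reference nodes that apply to the given query sample.
fn applicable_node_names<'a>(
    nodes: &'a [ReferenceNode],
    clade: &str,
    clade_node_attrs: &BTreeMap<String, String>,
) -> Vec<&'a str> {
    nodes
        .iter()
        .filter(|node| node_applies(node, clade, clade_node_attrs))
        .map(|node| node.name.as_str())
        .collect()
}
```

Switching the outer `.all()` to `.any()` would give OR across keys, but then an empty `include` map would match nothing, so the unconstrained case would need separate handling.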
This extends the concept of private mutations (private mutations are mutations relative to the parent node on the ref tree) to a more general concept of mutations relative to an arbitrary node of interest.
The ref nodes of interest are described by the user in `.meta.extensions.nextclade.reference_nodes` of the input Auspice JSON. The description can also contain constraints: a node can be matched only to query samples belonging to a certain clade or lineage.
Private mutations functionality is unchanged. New functionality, inputs and outputs are added on top, though the implementation algorithm is largely reused.
Test
PR in data for testing: nextstrain/nextclade_data#198 (branch with the same name). Dataset `nextstrain/sars-cov-2/wuhan-hu-1/proteins` there has `reference_nodes` config added to `tree.json`. Can be used like this: https://nextclade-git-feat-mutations-relative-to-node-nextstrain.vercel.app/?dataset-server=gh&dataset-name=nextstrain/sars-cov-2/wuhan-hu-1/proteins
Work items
For consideration:
Inputs
Example configuration object. Put it into `.meta` of Auspice JSON (such that it becomes `.meta.extensions.nextclade.reference_nodes`).
The `name` field should match the `name` field of one of the nodes on the tree.
The `displayName` and `description` are optional arbitrary strings used for display purposes.
The `include` field should be an object, which contains: `name`s from the `.meta.extensions.nextclade.clade_node_attrs` (for clade-like attributes) or string `"clade"` (for the built-in clades).
If the `include` field is not present, then no constraints are applied (all query sequences are considered).
Outputs
Output JSON and NDJSON
Example fragment of an output JSON entry (an entry in the `.results[]` array); mutation lists are truncated for demonstration purposes.
Output TSV and CSV
TODO
Visualization in Nextclade Web
TODO