BUG: Nucleotide and corresponding amino acid mutation occasionally on different branch #876

Open
corneliusroemer opened this issue Mar 1, 2022 · 3 comments
Labels
bug (Something isn't working), hard problem (Requires more work than most issues)

Comments

@corneliusroemer
Member

Current Behavior

A nucleotide (nt) mutation and its corresponding amino acid (aa) mutation are sometimes placed on different branches.

Expected behavior

A nucleotide mutation and its corresponding amino acid mutation (if the change is non-synonymous) should always be on the same branch.

How to reproduce

I don't have reproducible input files yet. If anyone finds a build where this happens again, please add the input files (if shareable).

Possible solution

No idea. This seems non-trivial, as the root of the problem is that amino acids and nucleotides are reconstructed independently by TreeTime.

If the problem is not solvable, it would be good to document in this issue the preconditions under which it appears.

Evidence

Note that the ORF3a:78 mutation and the corresponding nt mutation at position 25624 are not on the same branch:
image (13)
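A mismatch like this can be looked for programmatically. The following is a minimal sketch, not part of augur or ncov: it maps an amino acid position to its codon's genome coordinates and flags amino acid mutations that have no nucleotide mutation in that codon on the same branch. The `BRANCHES` layout, the node names, and the function names are hypothetical. The ORF3a start coordinate (25393) is an assumption chosen to be consistent with codon 78 starting at position 25624, and the residue and base letters in the example are placeholders, since the issue only gives positions.

```python
import re

# Matches mutation strings such as "C25624T" or "H78Y" (letters are placeholders).
MUTATION = re.compile(r"^([A-Z*-])(\d+)([A-Z*-])$")


def codon_nt_range(gene_start, codon):
    """Return the 1-based, inclusive genome coordinates of a codon.

    Assumes a forward-strand gene with no frameshift or splicing.
    """
    first = gene_start + 3 * (codon - 1)
    return first, first + 2


def unlinked_aa_mutations(branches, gene_starts):
    """List (branch, gene, aa_mutation) with no nt mutation in the codon on that branch."""
    unlinked = []
    for name, muts in branches.items():
        nt_positions = {int(MUTATION.match(m).group(2)) for m in muts.get("nuc", [])}
        for gene, aa_muts in muts.get("aa", {}).items():
            for aa_mut in aa_muts:
                codon = int(MUTATION.match(aa_mut).group(2))
                first, last = codon_nt_range(gene_starts[gene], codon)
                if not any(first <= pos <= last for pos in nt_positions):
                    unlinked.append((name, gene, aa_mut))
    return unlinked


# Assumed gene start; consistent with ORF3a codon 78 beginning at 25624.
GENE_STARTS = {"ORF3a": 25393}

# Hypothetical per-branch mutations mimicking the situation in the screenshot.
BRANCHES = {
    "NODE_A": {"nuc": ["C25624T"], "aa": {}},
    "NODE_B": {"nuc": [], "aa": {"ORF3a": ["H78Y"]}},
}

unlinked = unlinked_aa_mutations(BRANCHES, GENE_STARTS)
print(codon_nt_range(25393, 78))
print(unlinked)
```

Running a check of this kind over a build's per-node mutations might help collect the reproducible cases asked for above.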

@corneliusroemer added the bug (Something isn't working) label on Mar 1, 2022
@jameshadfield
Member

jameshadfield commented Mar 1, 2022

I think this will be an ncov-specific bug, right? For most pipelines we infer ancestral nucleotide mutations via augur ancestral and then translate those node by node via augur translate. For ncov, however, the second step is swapped out for scripts/explicit_translation.py, which uses the translations from nextalign/nextclade and doesn't consider the output of augur ancestral.

@huddlej
Contributor

huddlej commented Mar 1, 2022

I think @jameshadfield is right about this being an ncov-specific problem. Amino acid mutations come from ancestral state reconstruction on the translated alignments produced by nextalign, while nucleotide mutations come from augur ancestral's inference of ancestral sequences from the nucleotide alignment. It's easy to imagine how one could get different ancestral state reconstructions from these different inputs to TreeTime.
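To make that last point concrete, here is a toy illustration. It is explicitly not TreeTime's algorithm: it uses plain Fitch parsimony with an arbitrary alphabetical tie-break, and the two-tip tree, tip names, and states are invented for the example. It only shows that when an ancestral state is ambiguous, reconstructing the nucleotide and the amino acid independently can resolve the ambiguity in opposite directions, which puts the two changes on different branches.

```python
def fitch(tree, root, tip_states):
    """Toy Fitch parsimony; ties are broken alphabetically (an arbitrary choice)."""
    sets = {}

    def up(node):
        # Bottom-up pass: intersect child state sets, or take the union if disjoint.
        children = tree.get(node, [])
        if not children:
            sets[node] = {tip_states[node]}
            return
        for child in children:
            up(child)
        child_sets = [sets[child] for child in children]
        common = set.intersection(*child_sets)
        sets[node] = common if common else set.union(*child_sets)

    states = {}

    def down(node, parent_state):
        # Top-down pass: keep the parent's state if allowed, else pick alphabetically.
        if parent_state in sets[node]:
            states[node] = parent_state
        else:
            states[node] = min(sets[node])
        for child in tree.get(node, []):
            down(child, states[node])

    up(root)
    down(root, None)
    return states


def branches_with_change(tree, states):
    """Names of child nodes whose state differs from their parent's state."""
    return sorted(
        child
        for parent, children in tree.items()
        for child in children
        if states[child] != states[parent]
    )


# Invented example: one codon differing at its first position (GAT vs AAT).
TREE = {"root": ["tip_1", "tip_2"]}
NT_TIPS = {"tip_1": "G", "tip_2": "A"}  # first codon position
AA_TIPS = {"tip_1": "D", "tip_2": "N"}  # GAT -> D, AAT -> N

nt_branches = branches_with_change(TREE, fitch(TREE, "root", NT_TIPS))
aa_branches = branches_with_change(TREE, fitch(TREE, "root", AA_TIPS))
print(nt_branches)  # nucleotide change is placed on tip_1
print(aa_branches)  # amino acid change is placed on tip_2
```

Here the nucleotide run picks A at the root while the amino acid run picks D (the translation of the G-containing codon), so the same underlying change is assigned to opposite branches. A probabilistic reconstruction would not tie-break this way, but it can still resolve near-ties differently for the two alphabets.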

I would transfer this issue to the ncov repo, where we could consider how to fix it in that context.

@corneliusroemer
Member Author

I see! So the actual augur translate keeps the link intact by translating the reconstructed nucleotide sequences, which means reconstruction happens only once. ncov currently reconstructs twice, and that's where the link is broken.
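The "reconstruct once, then translate" idea can be sketched as follows. This is a simplified illustration and not augur's implementation: the function names and the example sequences are made up, and the sketch assumes an ungapped coding sequence in frame. Because the amino acid mutations on a branch are derived by translating the parent's and child's reconstructed nucleotide sequences, every amino acid change is necessarily accompanied by a nucleotide change in the same codon on the same branch.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    usable = len(nt_seq) - len(nt_seq) % 3
    return "".join(CODON_TABLE.get(nt_seq[i:i + 3], "X") for i in range(0, usable, 3))


def diff(parent, child):
    """Mutations between two equal-length sequences, as '<from><1-based pos><to>'."""
    return [f"{p}{i + 1}{c}" for i, (p, c) in enumerate(zip(parent, child)) if p != c]


def branch_mutations(parent_nt, child_nt):
    """Nucleotide and amino acid mutations on one branch, from a single reconstruction."""
    nt_muts = diff(parent_nt, child_nt)
    aa_muts = diff(translate(parent_nt), translate(child_nt))
    return nt_muts, aa_muts


# Made-up reconstructed sequences for a parent node and its child.
nt_muts, aa_muts = branch_mutations("ATGCAGTTT", "ATGCATTTT")
print(nt_muts)  # ['G6T']
print(aa_muts)  # ['Q2H']
```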

It's worth noting that I encountered this in ncov-simple builds. But since ncov uses (almost) the same script, the bug should be present there too, just unnoticed so far.

I agree that transfer makes sense then.

@corneliusroemer transferred this issue from nextstrain/augur on Mar 1, 2022
@corneliusroemer changed the title from "BUG: Nucelotide and corresponding amino acid mutation occasionally on different branch" to "BUG: Nucleotide and corresponding amino acid mutation occasionally on different branch" on Mar 2, 2022
@victorlin added the hard problem (Requires more work than most issues) label on Mar 2, 2022
Status: Backlog