[BUG] Potential computation in anvi-get-pn-ps-ratio is incorrect
#2195
Comments
Hi @apcamargo,
Thanks for the bug report. I agree that this might be an improvement relative to the current method.
Ramblings
I'm embarrassed to say that I'm having trouble understanding the code that I wrote so many years ago, but let's start from Lines 2647 to 2722 in 4eb5767
If we implemented the Nei-Gojobori method, it seems like we would need to change the calculation of synonymous and non-synonymous fractions. In the current implementation, every codon allele contributes either to the number of synonymous differences, or the number of non-synonymous differences. The amount it contributes is defined by the allele's frequency. In contrast, the Nei-Gojobori method says that double- or triple-nucleotide differences contribute twice or three times as much to the number of synonymous and non-synonymous differences, and they have the capacity to contribute to both simultaneously. From the paper:
Scope of changes
From what I can tell, implementing the Nei-Gojobori method requires changing how we calculate the number of s- and ns-differences, not how we calculate s- and ns-potentials. I found this worth mentioning because in your post you refer to potentials in a way that is different to how I define potentials in the codebase: Lines 1646 to 1657 in 4eb5767
Do you agree with me on this point? If so, I believe that Lines 2558 to 2606 in 4eb5767
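To make the difference-counting idea concrete, here is a minimal Python sketch of how synonymous and non-synonymous differences between two codons could be averaged over all mutation paths, as I understand the Nei-Gojobori scheme. It is not anvi'o code: `CODON_TABLE` and `count_differences` are hypothetical names, the standard genetic code is assumed, and skipping paths whose intermediate codon is a stop codon is an assumption about how such paths should be handled.

```python
from itertools import permutations, product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_differences(codon_a, codon_b):
    """Return (synonymous, non-synonymous) differences between two codons,
    averaged over every order in which the differing positions can mutate.
    Hypothetical helper for illustration only."""
    diff_positions = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if not diff_positions:
        return 0.0, 0.0

    syn_total = nonsyn_total = 0
    n_paths = 0
    for order in permutations(diff_positions):
        current = codon_a
        syn = nonsyn = 0
        valid = True
        for pos in order:
            nxt = current[:pos] + codon_b[pos] + current[pos + 1:]
            # Assumption: a path is discarded if an intermediate codon is a stop.
            if nxt != codon_b and CODON_TABLE[nxt] == "*":
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            n_paths += 1
            syn_total += syn
            nonsyn_total += nonsyn

    if n_paths == 0:
        raise ValueError("every path passes through a stop codon")
    return syn_total / n_paths, nonsyn_total / n_paths


print(count_differences("TTT", "GTA"))  # (0.5, 1.5)
```

For TTT vs GTA, one path goes through GTT (one non-synonymous change, then one synonymous change) and the other through TTA (two non-synonymous changes), so this single pair contributes to both counts at once, which is the behaviour the current either/or classification cannot express.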
Based on what I understood from reading the paper, we need to implement the following:
Unknowns
I don't understand this, but I think it may play an important role in their method (which is definitely missing from anvi'o's current implementation)
If I understand it correctly, that formula is to account for silent substitutions. It takes the observed distance between two sequences and computes the estimated distance assuming a uniform substitution rate.
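The formula itself is not reproduced in this thread. Assuming it is the Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p), which converts an observed proportion of differences p into an estimated number of substitutions per site (correcting for repeated substitutions at the same site that leave no visible trace), a minimal sketch could look like this. The function name `jukes_cantor` is hypothetical.

```python
import math


def jukes_cantor(p):
    """Estimate substitutions per site from an observed proportion of
    differences p, assuming all substitutions are equally likely.
    Hypothetical helper; assumes the formula under discussion is the
    Jukes-Cantor correction."""
    if not 0 <= p < 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation).
        raise ValueError("p must be in the range [0, 0.75)")
    if p == 0:
        return 0.0
    return -0.75 * math.log(1 - 4 * p / 3)


# The corrected distance is always at least as large as the observed one.
observed = 0.1
print(observed, jukes_cantor(observed))
```

Under that assumption, the correction would be applied separately to the synonymous proportion (differences divided by synonymous sites) and to the non-synonymous proportion.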
Short description of the problem
When anvi'o computes the potential of a given codon, it does so by evaluating whether mutations in one position (that is, with a Hamming distance of 1) generate synonymous or non-synonymous codons. However, the Nei-Gojobori method takes into account the distance between pairs of codons when computing potentials. That is, if you have the ACC codon in the reference and an ATT variant, the potential of ACC is computed by evaluating all possible mutation paths between those two codons.
You can find an implementation of the Nei-Gojobori method here.
anvi'o version
System info
anvi'o executed via Docker.