Different results in command line and web #1684
Comments
Dear @LalicJ,
This is quite unusual, since the log likelihood is different for all models, including the baseline.
Using different trees can most decidedly affect the result of a selection analysis; since you are only using a single foreground branch, the analysis could be quite unstable (small sample size).
I pulled your alignment file from Datamonkey and ran BUSTED locally, using a NJ tree, and using an ML tree built by
The branch you are testing is rather short (~0.001, so effectively zero). For such a short branch there is no power to detect anything at all, so the expected result is no selection.
ML tree
NJ tree (quite different!)
Because the alignment is small, you can also use
OK, so what happened with the Datamonkey result? Looking at the result page, all of the signal comes from a single codon, and a single substitution (V → I).
However, if all you do is change the tree a little bit (like this NJ tree which HyPhy builds if you provide
This highlights the inherent fragility of these "single-branch" analyses.
HTH,
Thank you very much for your answer! The data set I used above is just an attempt to replicate the results of someone else's HyPhy BUSTED analysis. If I actually have a very large data set (1000+ sequences, but at very close evolutionary distance), what is the recommended method to build the tree, and do the --rate and --starting-points parameters need to be changed? (The goal is again to detect selection on a single branch.)
Dear @LalicJ, Just to confirm: are you looking to detect selection on a single (predefined) branch using a large alignment of closely related species? Best,
Actually, I want to detect selection on a single (predefined) branch using a large alignment of closely related virus sequences. As a rule of thumb, a virus sequence with a positive selection signal is likely to perform better in some ways. |
Dear @LalicJ, Great. If you have a single predefined branch, then I would suggest the following.
For example, if you wish to study the evolution of the highlighted blue branch in this IAV tree (466 sequences; this is an example file, https://github.com/veg/hyphy/blob/master/tests/data/IAV-human-H1N1-HA.nex), then the relatively distant sequences from a different clade (red box) are unlikely to have any noticeable impact on the ω estimate for the blue branch. I would suggest trimming the analysis to just the neighborhood of the node, and the path from the node to the root. Best,
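One way to read the trimming suggestion above is sketched below in plain Python. It is a minimal illustration under stated assumptions, not part of HyPhy: the tree is given as a hypothetical child-to-parent dictionary, the "neighborhood" is taken to be everything under an ancestor a few levels above the branch of interest, and one representative leaf is kept from each side clade on the way to the root so that the path to the root survives pruning. The function names and the toy tree are made up for the example.

```python
from collections import defaultdict


def build_children(parent):
    """Invert a child -> parent map into a parent -> [children] map."""
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)
    return children


def leaves_under(node, children):
    """Return the sorted leaf names in the subtree rooted at `node`."""
    stack, leaves = [node], []
    while stack:
        current = stack.pop()
        kids = children.get(current, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(current)
    return sorted(leaves)


def taxa_to_keep(parent, target, levels_up=1):
    """Pick the taxa to retain around the branch leading to `target`.

    Assumption: the neighborhood is the clade rooted `levels_up`
    ancestors above `target`; this is an illustrative choice.
    """
    children = build_children(parent)
    # Climb a few ancestors to define the local neighborhood.
    anchor = target
    for _ in range(levels_up):
        if parent[anchor] is None:
            break
        anchor = parent[anchor]
    keep = set(leaves_under(anchor, children))
    # Walk up to the root, keeping one leaf from every side clade so the
    # internal nodes on the path to the root are not pruned away.
    node = anchor
    while parent[node] is not None:
        par = parent[node]
        for sibling in children[par]:
            if sibling != node:
                keep.add(leaves_under(sibling, children)[0])
        node = par
    return sorted(keep)


# Hypothetical toy tree: the branch of interest leads to node "n2";
# "far" stands in for a distant clade.
example_parent = {
    "root": None,
    "n1": "root", "far": "root",
    "n2": "n1", "C": "n1",
    "A": "n2", "B": "n2",
    "F1": "far", "F2": "far", "F3": "far",
}

print(taxa_to_keep(example_parent, "n2", levels_up=1))
```

The retained names would then be used to subset the alignment and rebuild or prune the tree with whatever tool is preferred; how many levels to climb and how many representatives to keep from distant clades are judgment calls.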
Thank you! I got it! I had also been struggling with the long run times on large data sets. Following your suggestion, I will split the data set and rebuild the tree for the analysis. Thanks again for your help :)
Dear @LalicJ, Please let me know how it goes. I have found Best, |
Thanks for your advice, I will use |
Hi, I got different results when running a BUSTED analysis from the command line and on the web (datamonkey.org).
My command line uses the same parameters as the web version:
hyphy busted --alignment pal2nal_gapout_NHP_AAV.fasta --tree AAVrh85.treefile --branches Foreground --kill-zero-lengths No --error-sink No CPU=100 > hyphy_busted_85_new.out
The analysis logs of the command-line and web versions are as follows:
hyphy_busted_85_new.txt
log.txt
The p-value is less than 0.05 in the web version, but not in the command-line version. The only difference I can think of is that in the command-line version I supplied a tree that I built myself. Do you have any suggestions? I have been racking my brain trying to work out why this happens. Any help would be greatly appreciated. :) @spond