Convergence Issues In FEL and MEME #1687
Dear @gykoh, Parametric bootstrapping is inherently stochastic, so if you run multiple analyses on the same data, some variation is expected. This variation should be fairly minor: something like
site X giving a p-value of 0.09 in run 1 and 0.10 in run 2 is OK, but 0.09 in one run and 0.002 in another is not. Which one are you seeing? Best,
Dear @spond, I am seeing the second case for the two sites below. Below is a list of the p-values for the first 5 runs at site 86 (positive selection) at 1000 bootstrap replicates/site in FEL at p = 0.1:
With 0 bootstrap replicates/site in FEL, site 86 (positive selection) had a p-value of 0.0717 in all of my runs. Below is a list of the p-values for the first 5 runs at site 252 (positive selection) at 1000 bootstrap replicates/site in FEL at p = 0.1:
With 0 bootstrap replicates/site in FEL, site 252 (positive selection) had a p-value of 0.0785 in all of my runs. I see the first case for this site. Below is a list of the p-values for the first 5 runs at site 64 (positive selection) at 1000 bootstrap replicates/site in FEL:
With 0 bootstrap replicates/site in FEL, site 64 (positive selection) had a p-value of 0.0237 in all of my runs. The p-values do not vary significantly in some runs. However, how do I know which sites to focus on after running FEL and MEME? Some sites showed up in a few runs but not in all of my runs. For instance, site 173 (positive selection) showed up in only two of my twenty-five runs, both times with a p-value of 0.0999 in FEL. How does one decide which sites to focus on, especially if the p-values vary to a certain extent? What would you recommend for finding which sites have strong evidence for positive selection? Thank you!
Dear @gykoh, This level of resampling variation (with the bootstrap) is perfectly normal. Even in cases where some runs are >0.1, these values are probably close to 0.1. All of these cases are "marginal", meaning that they may or may not be considered significant, depending on your desired sensitivity/specificity trade-offs. With no bootstrap enabled, you will get the same p-value every time because there's no resampling (no randomness involved). Can you give me some background on what you are using FEL for? Best,
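For intuition on how much a 1000-replicate bootstrap p-value should wobble between runs: a Monte Carlo p-value is a binomial proportion, so its run-to-run standard deviation is roughly sqrt(p(1-p)/N). A minimal Python sketch (the 0.07 value is taken from site 86 in this thread; everything else is illustrative, not HyPhy code):

```python
import math
import random

def bootstrap_pvalue_sd(p_true, n_replicates):
    """Standard error of a Monte Carlo p-value estimated from
    n_replicates parametric-bootstrap samples (binomial proportion)."""
    return math.sqrt(p_true * (1 - p_true) / n_replicates)

# A site whose underlying p-value sits near 0.07, as for site 86 above:
sd = bootstrap_pvalue_sd(0.07, 1000)
print(f"expected run-to-run SD ~ {sd:.4f}")

# Simulate five independent runs, each estimating the p-value as the
# fraction of 1000 null replicates that exceed the observed statistic:
random.seed(1)
runs = [sum(random.random() < 0.07 for _ in range(1000)) / 1000
        for _ in range(5)]
print(runs)
```

With an SD near 0.008, estimates bouncing between roughly 0.05 and 0.09 across runs are exactly what the binomial arithmetic predicts, which is why the variation above is "marginal" rather than alarming.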
Dear @spond, Thank you for the explanation of marginal sites! We are using FEL to look for high dN/dS in genes that code for some cell membrane proteins, to help us understand the selective pressures on their functions in cells. Ideally, of course, we are looking for a trade-off between low false positives and high true positives. But being confident about the sites we "trust" is more important than capturing all the true positives. Thank you for your help!
Dear @gykoh, Two further questions
Best,
Dear @spond, We are running 29 species for a gene that is about 318 codons long. We have run both MEME and FEL multiple times. Like FEL, at 0 bootstrap replicates/site, all of our runs agreed. However, at 1000 bootstrap replicates/site, not all the runs output the same sites. Question 1: Given that some of these p-value differences between runs are marginal, why are there sites that show up in all of my runs at ONLY one of the two settings (0 or 1000 bootstrap replicates/site) for both MEME and FEL? Below, I provide some cases I encountered in MEME and FEL with sites detected to be under positive selection at p = 0.1: MEME
Settings we used to run MEME:
FEL
Settings we used to run FEL:
Question 2: Our other question is whether the intersection of sites found by both FEL and MEME is better (in terms of avoiding false positives) than running just MEME or just FEL. For instance, in all of my runs, at both 0 and 1000 bootstrap replicates/site at p = 0.1 under the settings listed above, FEL and MEME both report sites 64 and 70 to be under positive selection. Thank you!
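The consensus idea in Question 2 amounts to a set intersection. A rough Python sketch, with hypothetical site sets standing in for sites parsed from the FEL and MEME output at a chosen p-value threshold (sites 64 and 70 are the overlap mentioned in this thread; the other entries are invented for illustration):

```python
# Hypothetical site sets pulled from FEL and MEME results at p = 0.1;
# in practice these would be parsed from each method's JSON output.
fel_sites = {64, 70, 86, 252}
meme_sites = {64, 70, 173}

# Sites flagged by both methods form a more conservative call set,
# trading some sensitivity for higher specificity:
consensus = fel_sites & meme_sites
print(sorted(consensus))  # [64, 70]
```

Whether this intersection is "better" depends on the sensitivity/specificity trade-off discussed above: it lowers the false-positive rate at the cost of missing sites that only one method detects.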
Dear @gykoh, Sorry for the delay in responding. There's actually a bug in the code (several recent versions), which affects the performance of Best,
Hello!
I wanted to provide an update on the convergence issues I encountered in FEL. I discussed this last year here: #1618
I used HYPHY 2.5.51(MP) for Darwin on arm64.
What I did was run FEL 25 times at p = 0.1 under the conditions below (with some changes compared to the conditions I ran last year):
I also ran FEL 25 times at p = 0.1 under the same conditions, except with parametric bootstrap resampling set to 0.
I found that there was a convergence issue in my 25 FEL runs at 1000 replicates/site (each run had slightly different sites).
For my 25 runs at 0 replicates/site, I had no convergence issue. Each run outputted the same codon sites.
What I did was find the codon sites that showed up in all 25 of my FEL runs at 1000 replicates/site and then compare them to the codon sites that showed up in my FEL runs at 0 replicates/site.
I noticed the same issue with MEME, so I followed a similar procedure, comparing the codon sites that showed up in all 25 of my MEME runs at 1000 replicates/site to the codon sites from my MEME runs at 0 replicates/site. Just like FEL, all 25 MEME runs at 0 replicates/site had no convergence issue: each run output the same codon sites.
MEME at p = 0.1 under these conditions:
Questions
a. What is the recommended approach for obtaining convergence in results for both FEL and MEME?
b. Can it work to consider the sites that show up in all of the runs at both 1000 and 0 replicates/site?
c. Or is there something else to try for choosing a reliable set of sites with strong evidence for dN/dS > 1?
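The filter in question (b), keeping only the sites that appear in every run, can be sketched like this (hypothetical per-run site sets standing in for the 25 parsed FEL or MEME outputs; site numbers are invented for illustration):

```python
from collections import Counter

# Hypothetical per-run site sets; a real analysis would parse the
# JSON files produced by each of the 25 bootstrap runs.
runs = [
    {64, 70, 86},
    {64, 70, 86, 252},
    {64, 70, 86},
]

# Count how many runs flagged each site:
counts = Counter(site for run in runs for site in run)

# Sites detected in every run are the most stable calls; sites like
# 252 that appear only occasionally are the marginal ones.
stable = sorted(site for site, n in counts.items() if n == len(runs))
print(stable)            # [64, 70, 86]
print(counts[252], "of", len(runs), "runs")  # 1 of 3 runs
```

Tracking the detection frequency (rather than only the all-runs intersection) also makes the marginal sites visible, which helps when deciding how conservative to be.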
Thank you!