
Number of ASVs changes on different computers/R versions? #1939

Open

vmsur opened this issue Apr 28, 2024 · 4 comments
vmsur commented Apr 28, 2024

Hi there,
I’m double-checking/proofreading a DADA2 workflow script that was originally run in January 2022 to make sure everything still works as expected.

I am running the script on the same Illumina metabarcoding result files as in 2022, and I’ve made sure to use the same version of DADA2 (version 1.20) that we used at the time. The only differences are that we are running it on a different computer and on a different version of R.

The results for each step (errors, derep) are exactly the same between 2022 and 2024, until I reach the Sample Inference step. There, the number of “real” sequence variants detected in a given sample differs from the 2022 run. By the time we merge pairs and remove chimeras, the final table has a slightly different number of ASVs than before (2,025 ASVs were found in 2022, while this run yields 2,007). Is it to be expected that the core sequence-variant inference algorithm would produce a slightly different result because we’re using a different computer or a different version of R? (Those are the only two differences I can see between the two runs.)
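For context, here is a minimal sketch of the steps in question, following the standard DADA2 tutorial; the file names below are placeholders, not our actual data or script:

```r
library(dada2)

# Placeholder inputs; the real script uses the 2022 Illumina files.
filtFs <- "sample1_F_filt.fastq.gz"
filtRs <- "sample1_R_filt.fastq.gz"

# Error learning and dereplication: identical between the 2022 and 2024 runs.
errF <- learnErrors(filtFs, multithread = TRUE)
errR <- learnErrors(filtRs, multithread = TRUE)
derepFs <- derepFastq(filtFs)
derepRs <- derepFastq(filtRs)

# Sample inference: this is where the number of "real" sequence variants
# first diverges between the two runs.
dadaFs <- dada(derepFs, err = errF, multithread = TRUE)
dadaRs <- dada(derepRs, err = errR, multithread = TRUE)

# Merging and chimera removal: the final ASV count differs slightly
# (2,025 in 2022 vs 2,007 now).
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs)
seqtab <- makeSequenceTable(mergers)
seqtab.nochim <- removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)
```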

Thank you!

@benjjneb (Owner)

Is it to be expected that the core sequence-variant inference algorithm would produce a slightly different result either because we’re using a different computer or different version of R?

No; if the version of DADA2 is the same, it should produce identical denoising results.

The only thing I can think of: did you use randomize=TRUE in the learnErrors step? If so, that could introduce small differences based on which samples are chosen to learn the error model.
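For example (a minimal sketch; the file paths are placeholders): with the default randomize=FALSE, samples are read in the provided order until enough reads are obtained, so the same inputs give the same error model, whereas randomize=TRUE picks samples at random and needs a fixed seed to be reproducible:

```r
library(dada2)

# Placeholder paths to the filtered forward reads.
filtFs <- list.files("filtered", pattern = "_F_filt.fastq.gz", full.names = TRUE)

# Default: samples are read in the provided order, so repeated runs on the
# same inputs learn the same error model.
errF <- learnErrors(filtFs, multithread = TRUE)

# With randomize = TRUE, samples are picked at random; set a seed if you
# need the randomized selection to be reproducible across runs.
set.seed(100)
errF.rand <- learnErrors(filtFs, randomize = TRUE, multithread = TRUE)
```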


vmsur commented Apr 30, 2024 via email


benjjneb commented Apr 30, 2024

It is possible that changes within R itself could cause the difference. In particular, what comes to mind is that learnErrors relies on the stats::loess function inside the dada2::loessErrfun call. A minor change in loess between R versions could produce a small numerical change in the error model (and the minor downstream difference in a handful of detected ASVs).
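One way to check (a sketch, assuming you still have the error-rate object from the 2022 run saved somewhere, e.g. as an .rds file; the file names here are hypothetical):

```r
library(dada2)

# Hypothetical file saved from the 2022 run, e.g. with saveRDS(errF, "errF_2022.rds").
errF.2022 <- readRDS("errF_2022.rds")

# Re-learn the error model on the same filtered files under the current R version
# (placeholder paths).
filtFs <- list.files("filtered", pattern = "_F_filt.fastq.gz", full.names = TRUE)
errF.2024 <- learnErrors(filtFs, multithread = TRUE)

# Compare the fitted error-rate matrices; a numerical difference here points at
# the error model (and hence the stats::loess fit) rather than dada() itself.
all.equal(getErrors(errF.2022), getErrors(errF.2024))

# To isolate the loess step, re-fit dada2's default error function on the stored
# observed transition counts under each R version and compare the results.
refit <- loessErrfun(errF.2022$trans)
```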

@vmsur
Copy link
Author

vmsur commented May 3, 2024 via email
