
Number of ASVs changes on different computers/R versions? #1939

Open

vmsur opened this issue Apr 28, 2024 · 4 comments
vmsur commented Apr 28, 2024

Hi there,
I’m double-checking/proofreading a DADA2 workflow script that was originally run in January 2022 to make sure everything still works as expected.

I am running the script on the same Illumina metabarcoding result files as in 2022, and I’ve made sure to use the same version of DADA2 (version 1.20) that we used at the time. The only differences are that we are running it on a different computer and on a different version of R.

The results for each step (errors, derep) are exactly the same between 2022 and 2024, until I reach the Sample Inference step. There, the number of “real” sequence variants detected in a given sample differs from the 2022 run. By the time we merge pairs and remove chimeras, the final table has a slightly different number of ASVs than before (2,025 ASVs were found in 2022, while this run yields 2,007). Is it to be expected that the core sequence-variant inference algorithm would produce a slightly different result because we’re using a different computer or a different version of R? (Those are the only two differences I can see between the two runs.)
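For context, here is a minimal sketch of the steps in question, following the standard DADA2 tutorial; the file names below are placeholders, not our actual data or script:

```r
library(dada2)

# Placeholder inputs; the real script uses the 2022 Illumina files.
filtFs <- "sample1_F_filt.fastq.gz"
filtRs <- "sample1_R_filt.fastq.gz"

# Error learning and dereplication: identical between the 2022 and 2024 runs.
errF <- learnErrors(filtFs, multithread = TRUE)
errR <- learnErrors(filtRs, multithread = TRUE)
derepFs <- derepFastq(filtFs)
derepRs <- derepFastq(filtRs)

# Sample inference: this is where the number of "real" sequence variants
# first diverges between the two runs.
dadaFs <- dada(derepFs, err = errF, multithread = TRUE)
dadaRs <- dada(derepRs, err = errR, multithread = TRUE)

# Merging and chimera removal: the final ASV count differs slightly
# (2,025 in 2022 vs 2,007 now).
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs)
seqtab <- makeSequenceTable(mergers)
seqtab.nochim <- removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)
```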

Thank you!

@benjjneb (Owner)

Is it to be expected that the core sequence-variant inference algorithm would produce a slightly different result either because we’re using a different computer or different version of R?

No; if the version of DADA2 is the same, it should produce identical denoising results.

The only thing I can think of: did you use randomize=TRUE in the learnErrors step? If so, that could introduce small differences based on which samples are chosen to learn the error model.
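For example (a minimal sketch; the file paths are placeholders): with the default randomize=FALSE, samples are read in the provided order until enough reads are obtained, so the same inputs give the same error model, whereas randomize=TRUE picks samples at random and needs a fixed seed to be reproducible:

```r
library(dada2)

# Placeholder paths to the filtered forward reads.
filtFs <- list.files("filtered", pattern = "_F_filt.fastq.gz", full.names = TRUE)

# Default: samples are read in the provided order, so repeated runs on the
# same inputs learn the same error model.
errF <- learnErrors(filtFs, multithread = TRUE)

# With randomize = TRUE, samples are picked at random; set a seed if you
# need the randomized selection to be reproducible across runs.
set.seed(100)
errF.rand <- learnErrors(filtFs, randomize = TRUE, multithread = TRUE)
```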


vmsur commented Apr 30, 2024 via email


benjjneb commented Apr 30, 2024

It is possible that changes within R itself could cause the difference. In particular, what comes to mind is that learnErrors relies on the stats::loess function inside the dada2::loessErrfun call. A minor change in loess between R versions could produce a small numerical change in the error model (and the minor downstream difference in a handful of detected ASVs).
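One way to check (a sketch, assuming you still have the error-rate object from the 2022 run saved somewhere, e.g. as an .rds file; the file names here are hypothetical):

```r
library(dada2)

# Hypothetical file saved from the 2022 run, e.g. with saveRDS(errF, "errF_2022.rds").
errF.2022 <- readRDS("errF_2022.rds")

# Re-learn the error model on the same filtered files under the current R version
# (placeholder paths).
filtFs <- list.files("filtered", pattern = "_F_filt.fastq.gz", full.names = TRUE)
errF.2024 <- learnErrors(filtFs, multithread = TRUE)

# Compare the fitted error-rate matrices; a numerical difference here points at
# the error model (and hence the stats::loess fit) rather than dada() itself.
all.equal(getErrors(errF.2022), getErrors(errF.2024))

# To isolate the loess step, re-fit dada2's default error function on the stored
# observed transition counts under each R version and compare the results.
refit <- loessErrfun(errF.2022$trans)
```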

@vmsur
Copy link
Author

vmsur commented May 3, 2024 via email
