Differences in chimera detection based on dataset structure #1942

Open
seoldh opened this issue May 1, 2024 · 2 comments

seoldh commented May 1, 2024

I have samples sequenced targeting the same 16S partial region from two different institutions, with about 120 and 300 samples respectively. I'm unsure how to correct for batch effects, so the first thing I did was try the following three commands to see what difference pooling makes:

  1. qiime dada2 denoise-paired --i-demultiplexed-seqs A_institution.qza (120 samples) --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
  2. qiime dada2 denoise-paired --i-demultiplexed-seqs B_institution.qza (300 samples) --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
  3. qiime dada2 denoise-paired --i-demultiplexed-seqs A+B_institution.qza (420 samples) --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled

In (1), chimeras were detected and filtered out, but in (2) and (3) no chimeras were detected in any sample.
That is, for every sample from institution A (120 samples):
count after merging in (1) ≈ count after merging in (3) = count after chimera removal in (3) > count after chimera removal in (1).

Are there any parameters I need to adjust for chimera detection when using a large dataset? Or could there be other causes?
The sequencing quality plots for the raw data from the two institutions are similar, but institution A has an average read count of 40,000 while institution B has an average read count of 170,000, a difference of about 4x.

benjjneb (Owner) commented May 1, 2024

The pooled chimera detection method should only be used if using pooled denoising. It should not be used with the default denoising (independent) or with pseudo-pooling. So I would recommend as a first step adjusting the pooling modality.

(It would be good to have clearer documentation on that, or maybe a warning message when the chimera method and the pooling method are misaligned.)
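
As a rough illustration of the kind of warning suggested above, here is a minimal shell sketch of a pre-flight check that could be run before calling `qiime dada2 denoise-paired`. The function name `check_dada2_modes` and the label `pooled` for full pooling (`pool=TRUE` in R) are assumptions made for this example; neither is part of QIIME 2 or DADA2.

```shell
# Hypothetical helper, not provided by QIIME 2 or DADA2.
# Usage: check_dada2_modes <pooling-method> <chimera-method>
#   pooling-method: independent | pseudo | pooled
#                   ("pooled" is this sketch's own label for full pooling)
#   chimera-method: consensus | none | pooled
# Returns 0 if the combination is consistent with the advice in this thread,
# 1 (with a warning on stderr) if pooled chimera detection is requested
# without fully pooled denoising.
check_dada2_modes() {
  pooling="$1"
  chimera="$2"
  if [ "$chimera" = "pooled" ] && [ "$pooling" != "pooled" ]; then
    echo "warning: chimera method 'pooled' requested with pooling method '$pooling'" >&2
    return 1
  fi
  return 0
}
```

For example, `check_dada2_modes pseudo pooled` prints the warning and returns a non-zero status, while `check_dada2_modes pseudo consensus` passes silently.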

seoldh (Author) commented May 2, 2024

In all three cases, I used the parameters --p-pooling-method pseudo --p-chimera-method pooled.
Since QIIME only provides two pooling methods, independent and pseudo, I'm going to try running DADA2 in R with pool=TRUE instead of pseudo-pooling.
