Differences in chimera detection based on dataset structure #1942

seoldh · 2024-05-01T13:02:36Z

I have samples from two different institutions, both sequenced on the same partial 16S region, with about 120 and 300 samples respectively. I'm unsure how to correct for batch effects, so as a first step I ran the following three commands to see what difference pooling makes:

(1) A_institution.qza, 120 samples:
qiime dada2 denoise-paired --i-demultiplexed-seqs A_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
(2) B_institution.qza, 300 samples:
qiime dada2 denoise-paired --i-demultiplexed-seqs B_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
(3) A+B_institution.qza, 420 samples:
qiime dada2 denoise-paired --i-demultiplexed-seqs A+B_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled

In (1), chimeras were detected and filtered out, but in (2) and (3) no chimeras were detected in any sample. For all 120 samples from A_institution this gives:
count after merging in (1) ≈ count after merging in (3) = count after chimera removal in (3) > count after chimera removal in (1).
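The comparison above can be checked sample by sample with a short sketch. It assumes the per-sample merged and post-chimera-removal counts have already been read out of each run's denoising statistics (assumed here to be the "merged" and "non-chimeric" columns) into a dict keyed by sample ID; `chimera_removal_summary` is a hypothetical helper, not part of QIIME 2.

```python
# Hypothetical helper, not part of QIIME 2. The counts are assumed to come from
# the "merged" and "non-chimeric" columns of the denoising stats of one run.
from typing import Dict, List, Tuple


def chimera_removal_summary(
    counts: Dict[str, Tuple[int, int]],
) -> Tuple[Dict[str, float], List[str]]:
    """Return the fraction of merged reads removed as chimeric per sample,
    plus the list of samples in which no reads were removed at all."""
    fractions: Dict[str, float] = {}
    untouched: List[str] = []
    for sample, (merged, non_chimeric) in counts.items():
        if non_chimeric > merged:
            raise ValueError(f"{sample}: non-chimeric count exceeds merged count")
        removed = merged - non_chimeric
        # Guard against empty samples to avoid dividing by zero.
        fractions[sample] = removed / merged if merged else 0.0
        if removed == 0:
            untouched.append(sample)
    return fractions, untouched
```

Running it on the stats from (1) and from (3) for the same A_institution samples would show whether the "untouched" list is empty in one run and covers every sample in the other, which is the pattern described above.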

Are there any parameters I need to adjust for chimera detection when using a large dataset, or could there be other causes?
The quality plots of the raw data from the two institutions are similar, but institution A has an average read count of about 40,000 while institution B averages about 170,000, roughly a 4x difference.


benjjneb · 2024-05-01T14:19:29Z

The pooled chimera detection method should only be used if using pooled denoising. It should not be used with the default denoising (independent) or with pseudo-pooling. So I would recommend as a first step adjusting the pooling modality.

(it would be good to have clearer documentation on that, or maybe a warning message when pooling method and denoising method are misaligned)
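The rule stated above amounts to a compatibility check between the two settings, roughly what such a warning would need to test. The sketch below is a hypothetical function (`chimera_pooling_warning`), not the implementation in q2-dada2; it assumes dada2's naming for pooling modes, with "pooled" standing for pool=TRUE, which q2-dada2 does not expose.

```python
# Hypothetical check sketching the rule stated above; it is not q2-dada2 code.
# Pooling modes follow dada2's vocabulary: "independent" (pool=FALSE),
# "pseudo" (pool="pseudo") and "pooled" (pool=TRUE, not exposed by q2-dada2).
from typing import Optional

POOLING_METHODS = {"independent", "pseudo", "pooled"}
# Chimera methods as offered by q2-dada2's --p-chimera-method.
CHIMERA_METHODS = {"consensus", "pooled", "none"}


def chimera_pooling_warning(pooling_method: str, chimera_method: str) -> Optional[str]:
    """Return a warning message if the chimera method does not match the
    denoising mode, or None if the combination is consistent."""
    if pooling_method not in POOLING_METHODS:
        raise ValueError(f"unknown pooling method: {pooling_method!r}")
    if chimera_method not in CHIMERA_METHODS:
        raise ValueError(f"unknown chimera method: {chimera_method!r}")
    # Pooled chimera detection is only intended for fully pooled denoising.
    if chimera_method == "pooled" and pooling_method != "pooled":
        return (
            "chimera method 'pooled' should only be used with pooled denoising "
            f"(got pooling method '{pooling_method}')"
        )
    return None
```

With the settings from the original post, chimera_pooling_warning("pseudo", "pooled") returns a warning string, while chimera_pooling_warning("pseudo", "consensus") returns None.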

seoldh · 2024-05-02T02:13:15Z

In all three cases I used the parameters --p-pooling-method pseudo --p-chimera-method pooled.
Since QIIME 2 only provides two pooling methods, independent and pseudo, I'm going to try using R to apply pool=TRUE instead of pseudo-pooling.

benjjneb mentioned this issue May 1, 2024

Add warning when pooling method and chimera detection method are misaligned qiime2/q2-dada2#165 (Open)
