
Quality plot profile reference for filterAndTrim truncLen values #1936

Closed
luigallucci opened this issue Apr 25, 2024 · 5 comments

Comments

@luigallucci

Hi!

I'm working on 16S sequences from Illumina 2x300, targeting the V3-V4 region (341F-785R)...
Forward:
[screenshot: forward read quality profile]
Reverse:
[screenshot: reverse read quality profile]

These are the quality profiles after cutadapt.

We were trying to target the length 443+25 as a reference for the truncLen values, with the following settings:

out <- filterAndTrim(cutFs, filtpathF, cutRs, filtpathR,
    truncLen = c(240, 230),   # forward / reverse truncation lengths
    maxEE = c(2, 3),          # max expected errors per read
    maxN = 0,
    #truncQ = 2,
    rm.phix = TRUE,
    compress = TRUE,
    verbose = TRUE,
    multithread = 20
  )

but the best percentage retained after this step is 62%... do you have any suggestions on how to choose the best values?

@luigallucci (Author)

I forgot to add the primers:

[image: primer sequences]

@benjjneb (Owner)

Maximally optimizing the number of reads passing filtering is not necessary. What you want is something "good enough" to keep a reasonable fraction of the reads and preserve overlap for later merging. You are probably fine with what you have.

If you want to try to bump up the numbers a bit: after primer removal the expected length of this amplicon is only ~400-420 nts, so you could drop the total truncation length by 20 and still be fine for merging (although double-check that this holds up in the read tracking at the end of the dada2 workflow).
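The overlap arithmetic behind that advice can be sketched as a quick check (illustrative only: the `merge_overlap` helper is invented here, 420 nt is the upper end of the post-primer amplicon length mentioned above, and dada2's `mergePairs` requires a 12 nt overlap by default):

```python
# Back-of-the-envelope overlap check for truncLen choices.
# Assumed: post-primer amplicon of ~400-420 nt, and dada2's default
# minOverlap = 12 in mergePairs.

def merge_overlap(trunc_f, trunc_r, amplicon_len):
    """Expected forward/reverse overlap (nt) after truncation."""
    return trunc_f + trunc_r - amplicon_len

# Current setting truncLen = c(240, 230) against the ~420 nt worst case:
print(merge_overlap(240, 230, 420))  # 50 nt of overlap
# Dropping total truncation by 20 nt still clears the 12 nt minimum:
print(merge_overlap(230, 220, 420))  # 30 nt of overlap
```

Either setting leaves comfortable headroom over the 12 nt default, which is why shortening the truncation lengths mainly buys back filtered reads rather than risking the merge step.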

@luigallucci (Author)

fine, I got that, thank you!

Anyway, I have another question...

I was dealing with chimera removal, and my read counts stay more or less at the same level, but the number of ASVs drops significantly after this step, e.g.

> seqtab.nochim1 <- removeBimeraDenovo(seqtab1, method="consensus", multithread= 20, verbose=TRUE)
Identified 2223 bimeras out of 3201 input sequences.
> seqtab.nochim1 <- removeBimeraDenovo(seqtab1, minFoldParentOverAbundance = 8, method="consensus", multithread= 20, verbose=TRUE)
Identified 1952 bimeras out of 3201 input sequences.

same samples, same process, different value for the minFoldParentOverAbundance argument.

Do you have some guess why this is happening?
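For intuition, the direction of that change can be sketched with a toy model (this is not dada2's actual bimera algorithm; the `eligible_parents` helper and the abundance numbers are invented for illustration). A sequence can only be called a bimera if candidate parents are at least `minFoldParentOverAbundance` times more abundant than it, so raising the threshold shrinks the pool of eligible parents and fewer sequences get flagged:

```python
# Toy illustration of the fold-parent threshold in bimera detection.
# Invented numbers; dada2's real check also compares sequence content.

def eligible_parents(abundances, query_abundance, min_fold):
    """Sequences abundant enough to serve as parents of the query."""
    return [a for a in abundances if a >= min_fold * query_abundance]

parents = [1000, 600, 150, 90]  # abundances of other ASVs
query = 80                      # abundance of a putative chimera

print(len(eligible_parents(parents, query, 1)))  # 4 eligible parents
print(len(eligible_parents(parents, query, 8)))  # 1 (only 1000 >= 8*80)
```

With fewer eligible parents at `min_fold = 8`, fewer queries can be explained as a combination of two parents, consistent with the drop from 2223 to 1952 flagged bimeras above.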

I'm using cutadapt to remove the primers:

Command line parameters:
  -g CCTACGGGNGGCWGCAG -a GGATTAGATACCCBDGTAGTC --discard-untrimmed -m 50 \
  -G GACTACHVGGGTATCTAATCC -A CTGCWGCCNCCCGTAGG --discard-untrimmed -m 50 \
  --revcomp -j 20 -n 2 \
  -o /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//cutadapt_path/9_R1.fastq.gz \
  -p /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//cutadapt_path/9_R2.fastq.gz \
  /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//filtN/9_R1.fastq.gz \
  /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//filtN/9_R2.fastq.gz

                 Forward Complement Reverse RevComp
FWD.ForwardReads       2          0       0       0
FWD.ReverseReads       0          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       0          0       0       0

FWD.ForwardReads       0          0       0       8
FWD.ReverseReads      15          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       0          0       0       0

FWD.ForwardReads       1          0       0       2
FWD.ReverseReads       6          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       1          0       0       0

FWD.ForwardReads       2          0       0       7
FWD.ReverseReads      15          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       1          0       0       0

FWD.ForwardReads       0          0       0       0
FWD.ReverseReads       0          0       0       0
REV.ForwardReads       0          0       0       0

Even so, it seems that some primers are not removed. I found that increasing the minimum length makes these non-zero values disappear, probably because there is variability in the length of the sequences produced by the sequencing center (n.b. these are new NextSeq data, unlike the first example I provided).

@benjjneb (Owner) commented May 3, 2024

> I was dealing with the chimera removal, and my reads stay more or less on the same level, but...ASV instead drop significantly after this step

This is normal (as mentioned in the dada2 tutorial "Remove chimeras" section).

Chimeras tend to be very diverse, as you can imagine since they could come from combinations of any pair of real ASVs, and with different breakpoints between them, and tend to appear at low abundances since they are most often formed after the first few PCR cycles. So in typical data they can make up a substantial fraction of ASVs, but should be a low fraction of reads.
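That asymmetry can be checked with quick arithmetic. Only the 2223/3201 ASV counts below come from the run above; the read totals are hypothetical, invented to illustrate the "many ASVs, few reads" pattern:

```python
# ASV fraction flagged as bimeras, from the console output above:
bimera_asvs, total_asvs = 2223, 3201
print(round(bimera_asvs / total_asvs, 2))  # 0.69 of ASVs flagged

# Hypothetical read totals: low-abundance chimeras dominate ASV counts
# while contributing only a small share of the reads.
total_reads = 1_000_000   # assumed
chimeric_reads = 40_000   # assumed
print(chimeric_reads / total_reads)  # 0.04 of reads
```

So a ~69% drop in ASVs alongside a nearly flat read count is exactly the signature of abundant-but-diverse low-frequency chimeras being removed.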

@luigallucci (Author)

@benjjneb thank you for your reply!
