
Quality plot profile reference for filterAndTrim truncLen values #1936

Closed
luigallucci opened this issue Apr 25, 2024 · 5 comments

Comments

@luigallucci

Hi!

I'm working on 16S sequences from Illumina 2x300, targeting the V3-V4 region (341F-785R)...
Forward:
[screenshot: forward read quality profile]
Reverse:
[screenshot: reverse read quality profile]

These are the quality profiles after cutadapt.

We were trying to target the length 443+25 as a reference for the truncLen values, with the following settings:

out <- filterAndTrim(cutFs, filtpathF, cutRs, filtpathR,
    truncLen = c(240, 230),   # forward / reverse truncation lengths
    maxEE = c(2, 3),          # max expected errors per read
    maxN = 0,
    #truncQ = 2,
    rm.phix = TRUE,
    compress = TRUE,
    verbose = TRUE,
    multithread = 20
  )

but the best percentage retained after this step is 62%... do you have any suggestions on how to choose the best values?

@luigallucci (Author)

I forgot to add the primers:

[image: primer sequences]

@benjjneb (Owner)

Maximally optimizing the number of reads passing filtering is not necessary. What you want is something "good enough" to keep a reasonable fraction of the reads and preserve overlap for later merging. You are probably fine with what you have.

If you want to try to bump up the numbers a bit: after primer removal the expected length of this amplicon is only ~400-420 nts, so you could drop the total truncation length by 20 and still be fine for merging (although double-check that this holds up in the read tracking at the end of the dada2 workflow).
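The overlap arithmetic behind that advice can be sketched as a quick check (illustrative only: the `merge_overlap` helper is invented here, 420 nt is the upper end of the post-primer amplicon length mentioned above, and dada2's `mergePairs` requires a 12 nt overlap by default):

```python
# Back-of-the-envelope overlap check for truncLen choices.
# Assumed: post-primer amplicon of ~400-420 nt, and dada2's default
# minOverlap = 12 in mergePairs.

def merge_overlap(trunc_f, trunc_r, amplicon_len):
    """Expected forward/reverse overlap (nt) after truncation."""
    return trunc_f + trunc_r - amplicon_len

# Current setting truncLen = c(240, 230) against the ~420 nt worst case:
print(merge_overlap(240, 230, 420))  # 50 nt of overlap
# Dropping total truncation by 20 nt still clears the 12 nt minimum:
print(merge_overlap(230, 220, 420))  # 30 nt of overlap
```

Either setting leaves comfortable headroom over the 12 nt default, which is why shortening the truncation lengths mainly buys back filtered reads rather than risking the merge step.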

@luigallucci (Author)

fine, I got that, thank you!

Anyway, I have another question...

I was dealing with chimera removal, and my read counts stay more or less at the same level, but the number of ASVs drops significantly after this step, e.g.

> seqtab.nochim1 <- removeBimeraDenovo(seqtab1, method="consensus", multithread= 20, verbose=TRUE)
Identified 2223 bimeras out of 3201 input sequences.
> seqtab.nochim1 <- removeBimeraDenovo(seqtab1, minFoldParentOverAbundance = 8, method="consensus", multithread= 20, verbose=TRUE)
Identified 1952 bimeras out of 3201 input sequences.

same samples, same process, different value for the minFoldParentOverAbundance argument.

Do you have some guess why this is happening?
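For intuition, the direction of that change can be sketched with a toy model (this is not dada2's actual bimera algorithm; the `eligible_parents` helper and the abundance numbers are invented for illustration). A sequence can only be called a bimera if candidate parents are at least `minFoldParentOverAbundance` times more abundant than it, so raising the threshold shrinks the pool of eligible parents and fewer sequences get flagged:

```python
# Toy illustration of the fold-parent threshold in bimera detection.
# Invented numbers; dada2's real check also compares sequence content.

def eligible_parents(abundances, query_abundance, min_fold):
    """Sequences abundant enough to serve as parents of the query."""
    return [a for a in abundances if a >= min_fold * query_abundance]

parents = [1000, 600, 150, 90]  # abundances of other ASVs
query = 80                      # abundance of a putative chimera

print(len(eligible_parents(parents, query, 1)))  # 4 eligible parents
print(len(eligible_parents(parents, query, 8)))  # 1 (only 1000 >= 8*80)
```

With fewer eligible parents at `min_fold = 8`, fewer queries can be explained as a combination of two parents, consistent with the drop from 2223 to 1952 flagged bimeras above.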

I'm using cutadapt to remove the primers:

Command line parameters:
  -g CCTACGGGNGGCWGCAG -a GGATTAGATACCCBDGTAGTC --discard-untrimmed -m 50 \
  -G GACTACHVGGGTATCTAATCC -A CTGCWGCCNCCCGTAGG --discard-untrimmed -m 50 \
  --revcomp -j 20 -n 2 \
  -o /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//cutadapt_path/9_R1.fastq.gz \
  -p /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//cutadapt_path/9_R2.fastq.gz \
  /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//filtN/9_R1.fastq.gz \
  /bioinf/home/lgallucc/M190_analysis/A_pool1/Renamed//filtN/9_R2.fastq.gz

                 Forward Complement Reverse RevComp
FWD.ForwardReads       2          0       0       0
FWD.ReverseReads       0          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       0          0       0       0

FWD.ForwardReads       0          0       0       8
FWD.ReverseReads      15          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       0          0       0       0

FWD.ForwardReads       1          0       0       2
FWD.ReverseReads       6          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       1          0       0       0

FWD.ForwardReads       2          0       0       7
FWD.ReverseReads      15          0       0       0
REV.ForwardReads       0          0       0       0
REV.ReverseReads       1          0       0       0

FWD.ForwardReads       0          0       0       0
FWD.ReverseReads       0          0       0       0
REV.ForwardReads       0          0       0       0

Even so, it seems that some primers are not removed. I found that increasing the minimum length makes these non-zero values disappear, probably because there is variability in the length of the sequences produced by the sequencing center (n.b. these are new NextSeq data, unlike the first example I provided).

@benjjneb (Owner) commented May 3, 2024

> I was dealing with the chimera removal, and my reads stay more or less on the same level, but...ASV instead drop significantly after this step

This is normal (as mentioned in the dada2 tutorial "Remove chimeras" section).

Chimeras tend to be very diverse, as you can imagine since they could come from combinations of any pair of real ASVs, and with different breakpoints between them, and tend to appear at low abundances since they are most often formed after the first few PCR cycles. So in typical data they can make up a substantial fraction of ASVs, but should be a low fraction of reads.
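That asymmetry can be checked with quick arithmetic. Only the 2223/3201 ASV counts below come from the run above; the read totals are hypothetical, invented to illustrate the "many ASVs, few reads" pattern:

```python
# ASV fraction flagged as bimeras, from the console output above:
bimera_asvs, total_asvs = 2223, 3201
print(round(bimera_asvs / total_asvs, 2))  # 0.69 of ASVs flagged

# Hypothetical read totals: low-abundance chimeras dominate ASV counts
# while contributing only a small share of the reads.
total_reads = 1_000_000   # assumed
chimeric_reads = 40_000   # assumed
print(chimeric_reads / total_reads)  # 0.04 of reads
```

So a ~69% drop in ASVs alongside a nearly flat read count is exactly the signature of abundant-but-diverse low-frequency chimeras being removed.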

@luigallucci (Author)

@benjjneb thank you for your reply!
