-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected results after cutadapt #1955
Comments
Your primer-checking results suggest that you are working with a mixed orientation library. That is, the sequenced reads are coming from an even mixture of those in the expected orientation (FWD primer on R1 and rc(REV) primer on R2), and the opposite orientation (FWD primer on R2 and rc(REV) on R2). Your cutadapt command is only trimming primers when the reads are in the expected orientation. The easiest solution is to discard untrimmed reads at the cutadapt step. This will throw away half the reads, but also avoid a number of complications downstream. For other approaches, see this previous thread #1357 and there are probably other examples on the forum as well. |
Thank you for your reply. I have continued to edit the cutadapt parameters and settings, but still cannot get all 0s at the "sanity check" after primer removal. I have added your suggestion of adding --discard-untrimmed, but also have added other suggested parameters. Questions:
FWD.ForwardReads 0 0 0 0FWD.ReverseReads 1 0 0 0REV.ForwardReads 3 0 0 0REV.ReverseReads 0 0 0 0
Best script and summary of results: Summary of testing different parametersno minimal read lengthForward Complement Reverse RevCompFWD.ForwardReads 0 0 0 7FWD.ReverseReads 10 0 0 0REV.ForwardReads 4 0 0 0REV.ReverseReads 0 0 0 0"--minimum-length", 214 (250 - 18 (forward primer) - 18 (reverse primer) vs 230) = same results and best resultsForward Complement Reverse RevCompFWD.ForwardReads 0 0 0 0FWD.ReverseReads 1 0 0 0REV.ForwardReads 3 0 0 0REV.ReverseReads 0 0 0 0increasing -n from 2 to 16 did not change results#Forward Complement Reverse RevComp -e from 0 to default changed results#Forward Complement Reverse RevComp |
Don't worry about getting perfect zeros. The goal is to get rid of almost all the primers, and you're achieving that. A few stray ones show up because there are minor differences in how primers are matched by cutadapt and the R code, and possibly some malformed amplicons where the primers might appear in the middle of the sequences. But these are at very low numbers, so not a concern. |
Thank you! |
I am following the pipeline posted on "https://github.com/ErnakovichLab/dada2_ernakovichlab".
I am working with NovaSeq 2*250bp data.
Primer amplicon length: 347bp
Issue: rbind results (see below)
Solution: although I have coding experience, I am far from an expert, so please keep this in mind when reply. Thanks in advance for all help!
I have defined the primers as 1) FWD <- "18 bp with wobbles" and 2) REV <- "18 bp no wobbles" (sorry I can't share the actual bps).
Next I ran the following:
Function that creates a list of all orientations of the primers
Pre-filter to remove sequences with ambiguous bases (Ns)
fnFs.filtN <- file.path(preprocess, "filtN", basename(fnFs))
fnRs.filtN <- file.path(preprocess, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = FALSE)
Count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations.
primerHits <- function(primer, fn) {
# Counts number of reads in which the primer is found
nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
return(sum(nhits > 0))
}
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), FWD.ReverseReads = sapply(FWD.orients,
primerHits, fn = fnRs.filtN[[1]]), REV.ForwardReads = sapply(REV.orients, primerHits,
fn = fnFs.filtN[[1]]), REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
Not the expect results:
Forward Complement Reverse RevComp
FWD.ForwardReads 314219 0 0 8724
FWD.ReverseReads 305418 0 0 8770
REV.ForwardReads 204238 0 0 424
REV.ReverseReads 210007 0 0 458
But...just to humor myself I keep going:
Create directory to hold the output from cutadapt
if (!dir.exists(trimmed)) dir.create(trimmed)
fnFs.cut <- file.path(trimmed, basename(fnFs))
fnRs.cut <- file.path(trimmed, basename(fnRs))
Save the reverse complements of the primers to variables
FWD.RC <- dada2:::rc(FWD)
REV.RC <- dada2:::rc(REV)
Trim FWD and the reverse-complement of REV off of R1 (forward reads)
R1.flags <- paste("-g", FWD, "-a", REV.RC)
Trim REV and the reverse-complement of FWD off of R2 (reverse reads)
R2.flags <- paste("-G", REV, "-A", FWD.RC)
Run Cutadapt
for (i in seq_along(fnFs)) {
system2(cutadapt, args = c(R1.flags, R2.flags, "-m", 20, #based on a suggestion by benjjneb "-n", 2, # -n 2 required to remove FWD and REV from reads
"-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files
fnFs.filtN[i], fnRs.filtN[i])) # input files
}
As expected: not the results I want:
FWD.ForwardReads 0 0 0 8723
FWD.ReverseReads 305352 0 0 0
REV.ForwardReads 204238 0 0 0
REV.ReverseReads 0 0 0 458
The text was updated successfully, but these errors were encountered: