Unexpected results after cutadapt #1955

jesika0123 · 2024-05-13T19:56:55Z

I am following the pipeline posted on "https://github.com/ErnakovichLab/dada2_ernakovichlab".

I am working with NovaSeq 2*250bp data.
Primer amplicon length: 347bp
Issue: rbind results (see below)
Solution: although I have coding experience, I am far from an expert, so please keep this in mind when reply. Thanks in advance for all help!

I have defined the primers as 1) FWD <- "18 bp with wobbles" and 2) REV <- "18 bp no wobbles" (sorry I can't share the actual bps).

Next I ran the following:

Function that creates a list of all orientations of the primers

allOrients <- function(primer) {
  # Create all orientations of the input sequence
  require(Biostrings)
  dna <- DNAString(primer)  #The Biostrings works w/ DNAString objects rather than character vectors
  orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), 
               RevComp = reverseComplement(dna))
  return(sapply(orients, toString))  #Convert back to character vector
}

# Save the primer orientations to pass to cutadapt
FWD.orients<- allOrients(FWD)
FWD.orients

REV.orients <- allOrients(REV)
REV.orients

Pre-filter to remove sequences with ambiguous bases (Ns)

fnFs.filtN <- file.path(preprocess, "filtN", basename(fnFs))
fnRs.filtN <- file.path(preprocess, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = FALSE)

Count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations.

primerHits <- function(primer, fn) {
# Counts number of reads in which the primer is found
nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
return(sum(nhits > 0))
}

rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), FWD.ReverseReads = sapply(FWD.orients,
primerHits, fn = fnRs.filtN[[1]]), REV.ForwardReads = sapply(REV.orients, primerHits,
fn = fnFs.filtN[[1]]), REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))

Not the expect results:
Forward Complement Reverse RevComp
FWD.ForwardReads 314219 0 0 8724
FWD.ReverseReads 305418 0 0 8770
REV.ForwardReads 204238 0 0 424
REV.ReverseReads 210007 0 0 458

But...just to humor myself I keep going:

Create directory to hold the output from cutadapt

if (!dir.exists(trimmed)) dir.create(trimmed)
fnFs.cut <- file.path(trimmed, basename(fnFs))
fnRs.cut <- file.path(trimmed, basename(fnRs))

Save the reverse complements of the primers to variables

FWD.RC <- dada2:::rc(FWD)
REV.RC <- dada2:::rc(REV)

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

R1.flags <- paste("-g", FWD, "-a", REV.RC)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

R2.flags <- paste("-G", REV, "-A", FWD.RC)

Run Cutadapt

for (i in seq_along(fnFs)) {
system2(cutadapt, args = c(R1.flags, R2.flags, "-m", 20, #based on a suggestion by benjjneb "-n", 2, # -n 2 required to remove FWD and REV from reads
"-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files
fnFs.filtN[i], fnRs.filtN[i])) # input files
}

rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.cut[[1]]),

  FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.cut[[1]]),

  REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.cut[[1]]),

  REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.cut[[1]]))

As expected: not the results I want:

                                Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 8723
FWD.ReverseReads 305352 0 0 0
REV.ForwardReads 204238 0 0 0
REV.ReverseReads 0 0 0 458

The text was updated successfully, but these errors were encountered:

benjjneb · 2024-05-14T15:17:50Z

Your primer-checking results suggest that you are working with a mixed orientation library. That is, the sequenced reads are coming from an even mixture of those in the expected orientation (FWD primer on R1 and rc(REV) primer on R2), and the opposite orientation (FWD primer on R2 and rc(REV) on R2). Your cutadapt command is only trimming primers when the reads are in the expected orientation.

The easiest solution is to discard untrimmed reads at the cutadapt step. This will throw away half the reads, but also avoid a number of complications downstream. For other approaches, see this previous thread #1357 and there are probably other examples on the forum as well.

jesika0123 · 2024-05-15T19:09:48Z

Thank you for your reply. I have continued to edit the cutadapt parameters and settings, but still cannot get all 0s at the "sanity check" after primer removal.

I have added your suggestion of adding --discard-untrimmed, but also have added other suggested parameters.

Questions:

For the following script (below) primers are still present, but these are the best results so far (a full summary is below). What happens if it is not possible to get to all 0s at this step?

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

Is there a way not to lose a large percentage of reads at this step? Or is just the beast of this type of sequence data?

Best script and summary of results:
for (i in seq_along(fnFs)) {
system2(cutadapt, args = c(R1.flags, R2.flags, "-j", 0, "--nextseq-trim=20", "--match-read-wildcards", "--discard-untrimmed", "-n", 2, "--minimum-length", 214, "-e", 0,
"-o", fnFs.cut[i], "-p", fnRs.cut[i], # output files
fnFs.filtN[i], fnRs.filtN[i])) # input files
}

Summary of testing different parameters

no minimal read length

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 10 0 0 0

REV.ForwardReads 4 0 0 0

REV.ReverseReads 0 0 0 0

"--minimum-length", 214 (250 - 18 (forward primer) - 18 (reverse primer) vs 230) = same results and best results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

increasing -n from 2 to 16 did not change results

#Forward Complement Reverse RevComp
#FWD.ForwardReads 0 0 0 7
#FWD.ReverseReads 10 0 0 0
#REV.ForwardReads 4 0 0 0
#REV.ReverseReads 0 0 0 0

-e from 0 to default changed results

#Forward Complement Reverse RevComp
#FWD.ForwardReads 0 0 0 7
#FWD.ReverseReads 12 0 0 0
#REV.ForwardReads 5 0 0 0
#REV.ReverseReads 0 0 0 1

benjjneb · 2024-05-16T15:34:44Z

Don't worry about getting perfect zeros. The goal is to get rid of almost all the primers, and you're achieving that.

A few stray ones show up because there are minor differences in how primers are matched by cutadapt and the R code, and possibly some malformed amplicons where the primers might appear in the middle of the sequences. But these are at very low numbers, so not a concern.

jesika0123 · 2024-05-16T17:50:51Z

Thank you!

jesika0123 closed this as completed May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected results after cutadapt #1955

Unexpected results after cutadapt #1955

jesika0123 commented May 13, 2024

benjjneb commented May 14, 2024

jesika0123 commented May 15, 2024

benjjneb commented May 16, 2024

jesika0123 commented May 16, 2024

Unexpected results after cutadapt #1955

Unexpected results after cutadapt #1955

Comments

jesika0123 commented May 13, 2024

Function that creates a list of all orientations of the primers

Pre-filter to remove sequences with ambiguous bases (Ns)

Count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations.

Create directory to hold the output from cutadapt

Save the reverse complements of the primers to variables

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

Run Cutadapt

benjjneb commented May 14, 2024

jesika0123 commented May 15, 2024

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

Summary of testing different parameters

no minimal read length

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 7

FWD.ReverseReads 10 0 0 0

REV.ForwardReads 4 0 0 0

REV.ReverseReads 0 0 0 0

"--minimum-length", 214 (250 - 18 (forward primer) - 18 (reverse primer) vs 230) = same results and best results

Forward Complement Reverse RevComp

FWD.ForwardReads 0 0 0 0

FWD.ReverseReads 1 0 0 0

REV.ForwardReads 3 0 0 0

REV.ReverseReads 0 0 0 0

increasing -n from 2 to 16 did not change results

-e from 0 to default changed results

benjjneb commented May 16, 2024

jesika0123 commented May 16, 2024