
Losing reads after merging #1928

Open
hellewiedunkle opened this issue Apr 12, 2024 · 5 comments

Comments


hellewiedunkle commented Apr 12, 2024

Hello!

I want to use dada2 to create an ASV table from my 16S V3-V4 sequencing data. Sequencing was performed on an Illumina PE250 platform.
The quality profiles of my sequences look (suspiciously) good so I truncated my reads only by a few nts at the ends.

However, during merging I always lose about half of my reads. And by always I mean: I tried truncating shorter and longer, I adjusted maxEE, I adjusted maxMismatch and minOverlap, and I also tried not truncating at all. Still, about half of my reads are always lost at merging.

          input filtered denoisedF denoisedR merged nonchim
G12CB3.1 202420   193452    191447    191555 119718  114827
G12CP2.2 204188   194160    193195    193019  80558   77627
G12CP4.3 215010   203014    202915    202909  80327   80325
G12NP4.1 206177   196313    195784    195741 104359  101796
G12NP4.3 202944   193534    193214    193114  93879   91266
G15CB3.1 204004   195434    195178    195167 157006  156426
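For context, a read-tracking table like the one above is typically assembled with the dada2 tutorial's bookkeeping snippet; `out`, `dadaFs`, `dadaRs`, `mergers`, and `seqtab.nochim` are the standard tutorial object names and are assumed here:

```r
# Sketch, assuming the standard dada2 tutorial objects exist.
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN),
               sapply(mergers, getN), rowSums(seqtab.nochim))
colnames(track) <- c("input", "filtered", "denoisedF",
                     "denoisedR", "merged", "nonchim")
head(track)
```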

I have the suspicion that in general my reads don't have much overlap, because I looked at some as an example and the overlap was between 8 and 40 nts ...
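In case it helps with diagnosis: the overlap of each read pair can be inspected directly with `mergePairs(..., returnRejects=TRUE)`. A sketch, assuming the `dadaFs`/`derepFs` (and reverse-read) objects from the standard workflow:

```r
# Keep read pairs that failed to merge so their overlap statistics
# can be inspected alongside the successful merges.
mm <- mergePairs(dadaFs[[1]], derepFs[[1]], dadaRs[[1]], derepRs[[1]],
                 returnRejects=TRUE, verbose=TRUE)
# nmatch = matching bases in the overlap region;
# accept = whether the pair passed the merging criteria.
summary(mm$nmatch)
table(mm$accept)
```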

Can someone maybe offer some assistance? What can I do?

QualityScoreForward.pdf

QualityScoreReverse.pdf

@benjjneb (Owner)

What are the read lengths you are generating? 2x250? And how were the reads pre-processed prior to dada2?


hellewiedunkle commented Apr 16, 2024

Hey there! Thanks for the quick response. Yes, sequencing generated 250 bp paired-end raw reads, and they were pre-processed by removing barcodes and primers.

@benjjneb (Owner)

Assuming that you are using the "Illumina" V3V4 protocol, the sequenced amplicons are ~440-460 nts long (there is a bimodal length distribution in the V3V4 region), and include the primers at the start of the reads, but do not have barcodes on the R1/R2 reads. Since you are using 2x250, that is only 500 nts of total sequencing for each read pair, and thus a very short overlap region. What you are seeing in your data is probably the loss of the longer V3V4 mode from a failure to merge due to insufficient overlap.
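The arithmetic behind that diagnosis, as a quick sketch (440 and 460 are the approximate amplicon lengths mentioned above):

```r
# Expected overlap = total sequenced bases - amplicon length (primers included).
read_len <- 250
amplicon <- c(short_mode=440, long_mode=460)
overlap  <- 2*read_len - amplicon
overlap  # short_mode: 60, long_mode: 40
# Any truncation shortens the reads further, so the long mode quickly
# drops below the minimum overlap and fails to merge.
```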

To simplify things, I would suggest dropping your pre-processing step and relying on filterAndTrim(..., trimLeft=c(FWD_PRIM_LEN, REV_PRIM_LEN)) to remove the primers. Then choose truncLen=c(FWD_TRUNC_LEN, REV_TRUNC_LEN) such that the sum of the two truncation lengths is at least 480 nts, to get ~20 nts of overlap even for the longer amplicons.

@hellewiedunkle (Author)

Unfortunately, this did not work...

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, trimLeft=c(17, 20), truncLen=c(245, 245), maxN=0, maxEE=c(2, 2), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE)

          input filtered denoisedF denoisedR merged nonchim
G12CB2.1 203435   195575    179002    183177  99666   75499
G12CB3.3 202318   193009    191095    192119  95683   89147
G12CP1.2 206563   196586    194908    195628  89656   76863
G12CP3.1 203161   192876    191362    192537  94567   81344
G12NB1.1 203714   193078    191882    192818  56216   54017
G12NB2.1 206179   195774    194355    195100  64047   58094

@benjjneb (Owner)

I would check two things then:

(1) What exactly is your amplicon design? I.e., what primer set, are the primers at the start of the reads, and are there any additional technical bases (e.g. heterogeneity spacers, barcodes) at the start of the reads?

(2) Could there be substantial off-target amplification? You can check this straightforwardly by processing just forward reads for a sample or two. What is showing up there that is not in the merged reads?
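A sketch of that forward-only check, assuming the filtered forward reads (`filtFs`) and forward error model (`errF`) from the standard workflow:

```r
# Denoise the forward reads of one sample on their own, then look at
# which abundant ASVs appear here but are missing from the merged table.
drpF <- derepFastq(filtFs[[1]])
ddF  <- dada(drpF, err=errF, multithread=TRUE)
seqtabF <- makeSequenceTable(list(sample1=ddF))
# The most abundant forward-only sequences; checking (e.g. BLASTing) a
# few that are absent from the merged output can reveal off-target
# amplicons or the unmerged long-mode V3V4 amplicons.
head(sort(colSums(seqtabF), decreasing=TRUE))
```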
