Drop in quality score in the middle of the reads #1935
Comments
I don't have a satisfying answer, but yes, the abnormal quality drop in the middle of the reads could be connected to the large number of reads lost at the merging and chimera-removal steps. But I don't know what the underlying mechanism would be, as I've never seen this type of quality profile before. One thing to check is whether the reads before and after the quality drop both look like bacterial 16S sequences, and whether there is any low-complexity sequence in the data (see the dada2 plotComplexity function).
Thanks for the feedback. The output of plotComplexity for the first two samples looks like this: [complexity plots attached]. When I try seqComplexity I get this error: [error output attached]
I have attached the first two forward and reverse samples: S217_R1_001.fastq.gz, S218_R1_001.fastq.gz. I might try to get the raw data and do the demultiplexing myself.
Those first two plots in particular look troubling. Bimodal distributions of complexity scores suggest there is a mixture of normal (high-complexity) sequences and sequences that partially contain low-complexity regions. This should not be observed in V4 data. To use …
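In case it helps anyone hitting the same error: a minimal sketch of how the two complexity functions are called, assuming `fnFs` is the vector of forward fastq paths. plotComplexity() takes file paths, while seqComplexity() takes sequences, which is a likely source of the error mentioned above:

```r
library(dada2)

# plotComplexity() accepts fastq file paths directly
plotComplexity(fnFs[1:2])

# seqComplexity() expects sequences, not file names, so extract them first
sqs <- getSequences(derepFastq(fnFs[[1]]))
hist(seqComplexity(sqs), breaks = 100)
```

A unimodal distribution concentrated at high complexity is what well-behaved 16S amplicon data should show.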
Ahh, thanks for that; I did as you suggested. I furthermore did a BLAST of some of the sequences from one of the samples, and they came back as bacteria. Thanks!
Yes, you would set the …
Thanks for the input so far! However, by doing so, will I then run into trouble when assigning taxonomy, as I am working with V4 2x150 bp? Finally, when I use truncLen I seem to lose more reads at merging compared to when I leave it out; however, my seqtab.nochim/seqtab percentage is much better when using truncLen.

Without truncLen I get:
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, …)
S217 28311 12809 12289 12379 10879 7446

whereas when I use truncLen I get:
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(117,147), …)
S217 28311 21459 20811 21053 5206 4391
sum(seqtab.nochim)/sum(seqtab) = 0.7294229

Sorry for the long post and thanks again!
I should have caught this earlier, but the …
This is a high rate of chimeric reads. But I would check that again after fixing the …
Thanks for that; I had set minOverlap lower, all the way down to 3, in the examples I sent you, to see if that would make any difference; sorry for not mentioning that. I will run it again with an appropriate truncLen that gives a minimum total of 270 and see how that goes. The data is something I received from someone else to combine with other data for a paper, so I don't know how the laboratory protocols were done.
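A sketch of the rerun described above; the truncation lengths are placeholder values chosen so that the two reads still overlap (mergePairs() defaults to minOverlap = 12):

```r
# V4 amplicons are ~250-253 nt, so the two truncLen values should sum to
# roughly 265 nt or more to leave the default 12-nt overlap for merging.
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     truncLen = c(140, 135),   # placeholders summing to 275
                     maxN = 0, maxEE = c(3, 2), truncQ = 2,
                     rm.phix = TRUE, compress = TRUE, multithread = TRUE)

# Merge with the default minOverlap = 12 rather than lowering it to 3
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose = TRUE)
```

Lowering minOverlap to 3 makes spurious merges much more likely, which can inflate the apparent chimera rate downstream.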
Hi, I am looking at some data from the 2 x 150 bp V4 region, and when I look at the quality plots, we see a drop in the middle of the read for both forward and reverse. Furthermore, many of our reads don't merge, and we have a relatively high percentage of chimeras (between 23% and 35%). I have tried different settings for the filtering:
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(135,125), maxN=0, maxEE=c(3,2), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE)
     input filtered denoisedF denoisedR merged nonchim
S217 28311 27556 26812 27263 8361 7182
S218 71293 69817 68873 69245 5743 3848
S219 54009 52129 51461 51417 9244 18130
S220 42999 41111 40682 40562 18860 16787
S221 46393 44781 44175 44475 11427 10013
S222 44044 42420 42022 41865 18219 17392
sum(seqtab.nochim)/sum(seqtab)
0.770275
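(For anyone reproducing these tables: the six columns of numbers are the standard dada2 tutorial read-tracking table. A sketch, assuming the tutorial object names dadaFs, dadaRs, mergers, seqtab.nochim, and sample.names:)

```r
# Track reads through each step of the pipeline
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN),
               sapply(mergers, getN), rowSums(seqtab.nochim))
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR",
                     "merged", "nonchim")
rownames(track) <- sample.names
head(track)
```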
Here, we left out the truncLen part:
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
maxN=0, maxEE=c(3,2), truncQ=2, rm.phix=TRUE,
compress=TRUE, multithread=TRUE)
     input filtered denoisedF denoisedR merged nonchim
S217 28311 26963 26240 26518 18448 13343
S218 71293 68615 67456 67761 54402 26459
S219 54009 51042 50224 50386 31696 25513
S220 42999 40072 39625 39559 26085 22037
S221 46393 43918 42427 43474 31217 22158
S222 44044 41441 41008 40993 27124 22091
sum(seqtab.nochim)/sum(seqtab)
0.6565934
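(The ratio above comes from the usual chimera-removal step; a sketch, assuming the tutorial object name mergers:)

```r
seqtab <- makeSequenceTable(mergers)
seqtab.nochim <- removeBimeraDenovo(seqtab, method = "consensus",
                                    multithread = TRUE, verbose = TRUE)
# Fraction of total reads kept after chimera removal
sum(seqtab.nochim) / sum(seqtab)
```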
Can the sudden drop in read quality in the middle be caused by a sequencing error, and can this explain the high number of chimeras found in the data?
Thanks for the feedback.