Fecal samples not the same results and error msg when learning errors #1927

Open
FlorianRocher opened this issue Apr 12, 2024 · 2 comments

@FlorianRocher

Hi,

I'm trying to apply the pipeline described at https://benjjneb.github.io/LRASManuscript/LRASms_Zymo.html using dada2 version 1.31, but I'm not getting the same number of filtered reads. I also get the following error when I call the learnErrors function:

The max qual score of 93 was not detected. Using standard error fitting.
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.

Here is the output I get from the preceding steps:

prim <- removePrimers(MockZymo, nop, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE, verbose=TRUE)
Multiple matches to the primer(s) in some sequences. Using the longest possible match.
39439 sequences out of 77453 are being reverse-complemented.
Overwriting file:C:\Users\fner0001\Documents-Local\PostDoc_Florian\Lab_Project_Data\Metabarcoding_Tutorials\PacbioFecal\Mock\noprimers\Zymo.fastq.gz
Read in 77453, output 73057 (94.3%) filtered sequences.

filt <- file.path(MockDir, "noprimers", "filt", basename(MockZymo))
track <- filterAndTrim(nop, filt, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2, verbose=TRUE)
Overwriting file:C:\Users\fner0001\Documents-Local\PostDoc_Florian\Lab_Project_Data\Metabarcoding_Tutorials\PacbioFecal\Mock\noprimers\filt\Zymo.fastq.gz
Read in 73057, output 72940 (99.8%) filtered sequences.
track <- fastqFilter(nop, filt, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2, verbose=TRUE)
Overwriting file:C:/Users/fner0001/Documents-Local/PostDoc_Florian/Lab_Project_Data/Metabarcoding_Tutorials/PacbioFecal/Mock/noprimers/filt/Zymo.fastq.gz
Read in 73057, output 72940 (99.8%) filtered sequences.

dereplicate

drp <- derepFastq(filt, verbose=TRUE)
Dereplicating sequence entries in Fastq file: C:/Users/fner0001/Documents-Local/PostDoc_Florian/Lab_Project_Data/Metabarcoding_Tutorials/PacbioFecal/Mock/noprimers/filt/Zymo.fastq.gz
Encountered 22309 unique sequences from 72940 total sequences read.

Best

Florian Rocher

@FlorianRocher FlorianRocher changed the title Fecal samples not the ame results and error msg when learning errors Fecal samples not the same results and error msg when learning errors Apr 12, 2024
@benjjneb
Owner

Did you download the data using sra toolkit? Or via web links?

You need to use the SRA Toolkit to get the valid data. The web links serve the "SRAlite" data format, which is not the original data: it replaces all the real quality scores with Q30.

See here for more info: benjjneb/LRASManuscript#7
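
One way to check whether a downloaded file is affected is to tabulate the quality scores it actually contains. The following is a minimal base-R sketch, not part of dada2: the helper name `quality_score_table` is hypothetical, and it assumes standard four-line FASTQ records with Phred+33 encoding. A file in which every base reports the same score (for example, only Q30) has probably lost its original quality values.

```r
# Hypothetical helper (not a dada2 function): tabulate the Phred scores found
# in the first `n_reads` records of a FASTQ file.
# Assumptions: four lines per record, Phred+33 quality encoding.
quality_score_table <- function(fastq_path, n_reads = 1000) {
  # gzfile() reads gzip-compressed files and also plain uncompressed files
  con <- gzfile(fastq_path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, n = 4 * n_reads)
  if (length(lines) < 4) {
    stop("No complete FASTQ record found in ", fastq_path)
  }
  # The quality string is the fourth line of each record
  qual_lines <- lines[seq(4, length(lines), by = 4)]
  # Convert each quality character to its Phred score
  scores <- unlist(lapply(qual_lines, utf8ToInt)) - 33L
  table(scores)
}

# Example usage (the path is a placeholder, adjust to your own file):
# tab <- quality_score_table("path/to/Zymo.fastq.gz")
# print(tab)
# length(tab) == 1   # TRUE means every base has the same quality score
```

If the table contains a single score, re-downloading the reads with the SRA Toolkit, as suggested above, is the likely fix.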

@FlorianRocher
Author

Thanks a lot for your answer. Indeed, I downloaded the samples directly from the NCBI web page. I'll use the SRA Toolkit instead.

Best

Florian
