Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why FASTQ quality value above qmax is treated as Fatal error? #522

Open
billzt opened this issue May 21, 2023 · 5 comments
Open

Why FASTQ quality value above qmax is treated as Fatal error? #522

billzt opened this issue May 21, 2023 · 5 comments

Comments

@billzt
Copy link

billzt commented May 21, 2023

When I used --fastx_filter to trim FASTQ files (seeming like a very easy task),

$ vsearch --fastx_filter in.fastq --fastq_stripleft 20 --fastq_stripright 20  --fastaout out.fa

I encountered Fatal error as

Fatal error: FASTQ quality value (75) above qmax (41)
By default, quality values range from 0 to 41.
To allow higher quality values, please use the option --fastq_qmax 75

So I added --fastq_qmax 75. However another Fatal error occurred

Fatal error: FASTQ quality value (76) above qmax (75)
By default, quality values range from 0 to 41.
To allow higher quality values, please use the option --fastq_qmax 76

An example is https://www.ncbi.nlm.nih.gov/sra/SRR19317902, where its reads look like

@SRR19317902.1 M05585:4:000000000-BPG86:1:1101:8340:1726_CONS_SUB_SUB_CMP length=313
TTTATCTTCATCTATTGGAGCTCCAGGTCCTGCAGTTGACTTGGCTATTTTTTCACTTCACATAGCTGGTGCATCTTCAATTTTAGGTGCTATTAACTTTATCACTACTATATATAATATGCGTGCCCCTGGAATGACACTTCACAAAATGCCTTTGTTTGTTTGGTCCATTTTAGTAACGGCTTTTTTATTACTACTATCCTTACCAGTATTAGCTGGTGCAATCACTATGCTTTTAACCGATAGAAACTTTGGTACAGCTTTTTTTGCAGCAGAAGCTGGAGGTGATCCAGTACTTTTCCAACATCTATTT
+SRR19317902.1 M05585:4:000000000-BPG86:1:1101:8340:1726_CONS_SUB_SUB_CMP length=313
HGIHIHHIIIIIIIHIIHIIIFHIHI:GHFFGHIIHE.EEHEHHBHHHIHHIII>BHHIEEEGIHIHIIIIIIIHHI;HIIIIIHIIIHHIH;GHCGHIlef\abmllj_hQ`ddgX\h]ghigddkkalkab]lfgkcg`lmllmlhi`bYi]imlljblilg^Yll^cklkki^lkmliifhhjmmmmmlmli^^_hmd]hlmlllmlllgIIIHHFIIGHHDCHHGIIIEGCIEHEIIIIIIIIIIIIHIHEIIIIIIIIIHIHHIIIIIHIIIHIIIIIIIIIHIHEIIHHHB>HHHIIHIIHHHIIIH
@SRR19317902.2 M05585:4:000000000-BPG86:1:1101:19814:1776_CONS_SUB_SUB_CMP length=313
ATTATCATCTAAAACCGGACAACCTGATCCAGCAATGCATTTAGCTATTCTGGCGCTACACGTAGCTGGAGCTTCTTCAATTTTAGGACCGATAAATTTTATAACTACAATTCTAAACATGAGAACTCCTGGTATGACACTACATAAGATGCCACTATTTTCATGGTCAATTTTAGTAACTGCGTTTTTACTTCTATTATCTTTACCAGTTTTAGCGGGAGCAATTACAATGCTGCTCACAGACAGAAATTTTGGAACAGCTTTTTTTGAAGCTCAAACAGGAGGAGACCCAGTTTTGTTTCAACATTTATTT
+SRR19317902.2 M05585:4:000000000-BPG86:1:1101:19814:1776_CONS_SUB_SUB_CMP length=313
:G;HEH.=8E=8E>8.9BHEB8..E...EB.8.GEH..E.8BH.8.;E>G..;.-:?-9G><.C6HH>.;D>.D.C;H...E>H...E.;6-B-6.7.9b]Naha^RGfRPb^j]GhgllDKCQUIe\kmcZlll]EKj_`QDedli[E[KHXN,UQ^ajklQbVPcdeGfQXQMUQc:[@4S`EH^k\OPcg[OUjgEcFagjFXEVRQ?MS-@9HHAH;IIIHHGHAHE.HH?E.EE.E.;.>>H?.H.AH>A>IHIIHEH>HCH;E<8..IHEG8EIIIGGE8E..:IEHG8EIIIIIGHIHE>HHEGEH
@SRR19317902.3 M05585:4:000000000-BPG86:1:1101:10609:1789_CONS_SUB_SUB_CMP length=313
TCTAAGTCATATTACGAGCCACTCTGGCGGTGAAGATGATTTAGCTATTTTCAGTTTACATTTATCTGGAGCGAGCAGTATTTTAGGGGCGATTAACTTTATTACCACTATCTTTAACATGCGTGGGCCTGGTCTGGGCTTCCATCGCTTACCTTTATTTGTGTGGTCTGTTCTGATTACGGCCTTCTTACTTCTTCTCTCTTTACCGGTTTTAGCGGGAGCGATTACGATGCTGTTAACGGATCGTAACTTCAATACCTCGTTCTTTGATCCGGCGGGCGGAGGTGACCCTATTTTGTGCCAGCACCTTTTT
+SRR19317902.3 M05585:4:000000000-BPG86:1:1101:10609:1789_CONS_SUB_SUB_CMP length=313
-BGHECE>;;GH;.GB.B--..8>.E>.----.EH.E....E8>E88;.EG;E>EE.G>HG;8EAACA..;.:BH9-D..7>>7EH:FE<<-B>8.<G.f[kkillh[lRR`NN]g[PgfljmbjijZbQcMQZ;>K`PLLaP\I_jcNOS]<]G7OORRZ[]VROMQbkgia]eik]\IaZhVjcdaQ`jhiY^ciaklibdkRehkXYXL^IEEH9IHIF>E>HIGHH:HIHHFGIH.-HEH<HHGHCHHIH;D6??EIIEBH.IHBH<E9<EGGH<E>GH8H>HHCGCEGB8>B>E.HE8:>EB9EGIHH

Above all, what I wondered is why FASTQ quality value above qmax is treated as Fatal error? I had thought that any quality values larger than 41 would be cut off to 41.

@billzt
Copy link
Author

billzt commented May 21, 2023

In addition, I tried to use --fastq_chars and it gave the following results

Reading FASTQ file 100%  
Read 54687 sequences.
Qmin 42, QMax 109, Range 68
Guess: -fastq_qmin 9 -fastq_qmax 76 -fastq_ascii 33
Guess: Illumina 1.8+ format (phred+33)

Therefore, is it the best practice to use --fastq_chars at the first step to decide -fastq_qmin, -fastq_qmax and -fastq_ascii, and then continue with the downstream analysis?

@frederic-mahe
Copy link
Collaborator

Above all, what I wondered is why FASTQ quality value above qmax is treated as Fatal error?

Reads with quality values encoded on more than 41 levels are relatively recent. Observing values outside the 0-41 range used to be the sign that something went wrong. Nowadays, maybe 0-93 should be the new default range for quality values?

Therefore, is it the best practice to use --fastq_chars at the first step to decide -fastq_qmin, -fastq_qmax and -fastq_ascii, and then continue with the downstream analysis?

Yes, finding the qmax value beforehand could avoid a fatal error downstream. One possibility is to capture the value in a variable:

QMAX=$(printf "@s1\nA\n+\nI\n" | \
       vsearch --fastq_chars - 2>&1 | \
       grep -o -P "fastq_qmax *\K[[:digit:]]+")

Here QMAX would contain the value 40.

@billzt billzt closed this as completed May 22, 2023
@frederic-mahe
Copy link
Collaborator

maybe 0-93 should be the new default range for quality values?

@torognes what do you think of that?

@torognes
Copy link
Owner

torognes commented Jul 5, 2023

I agree that the world has moved forward and that some FASTQ files now contain q values above 41.

I also agree that the intention of this option was to help users detect old FASTQ files (phred 64), which is hardly a problem anymore.

Setting default qmax to 93 would essentially remove its effect, as it is the highest possible quality value obtainable with any printable character.

I am a bit sceptical to change default values and behaviour like this without a major version change, since it could potentially change results. But the change would only be to allow some files to be analysed which would previously fail.

Compatibility with usearch is another issue. I haven't check what it does.

Another possibility is to issue a warning instead of a fatal error. Perhaps a warning at the first encounter and then another one after reading the whole file with the maximum q value.

@billzt
Copy link
Author

billzt commented Jul 6, 2023

@torognes I agree with issuing a warning instead of a fatal error. Fatal error is too nervous

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants