fqtools type incorrectly typing fastq #14

Open
nick-youngblut opened this issue Jun 18, 2018 · 5 comments

Comments

@nick-youngblut

It appears that the current algorithm for fqtools type can get the fastq quality format wrong. Here's a reproducible example:

fastq-dump --split-files ERR719681
# Read 300355 spots for ERR719681
# Written 300355 spots for ERR719681
fqtools type ERR719681_1.fastq
# fastq-illumina
fqtools type ERR719681_2.fastq
# fastq-sanger

If Bio.SeqIO is then used to read these fastq files with the "type" specified by fqtools type, then the following error occurs:

  File "/ebio/abt3_projects/software/dev/llmgqc/.snakemake/conda/a289c738/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 611, in parse
    for r in i:
  File "/ebio/abt3_projects/software/dev/llmgqc/.snakemake/conda/a289c738/lib/python3.6/site-packages/Bio/SeqIO/QualityIO.py", line 1255, in FastqIlluminaIterator
    raise ValueError("Invalid character in quality string")

Maybe using the min & max of qual values (the full range) for all sequences in the fastq file would help prevent these mis-calls?
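
A minimal sketch of that full-range idea, not how fqtools works internally: scan every quality line, record the lowest and highest ASCII codes, and classify from the minimum. The function names `quality_range` and `guess_fastq_variant` are hypothetical, the cut-offs (33, 59, 64) are the conventional offsets for Sanger, Solexa and Illumina 1.3+ encodings, and the code assumes strict 4-line records such as those written by fastq-dump.

```python
def quality_range(lines):
    """Return (min, max) ASCII codes over all quality lines.

    Assumes 4-line FASTQ records (no wrapped sequence or quality lines).
    """
    lo, hi = None, None
    for i, line in enumerate(lines):
        if i % 4 != 3:          # only the 4th line of each record holds qualities
            continue
        qual = line.rstrip("\r\n")
        if not qual:
            continue
        codes = [ord(c) for c in qual]
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    if lo is None:
        raise ValueError("no quality lines found")
    return lo, hi


def guess_fastq_variant(lo, hi):
    """Heuristic guess of the Bio.SeqIO format name from the observed range."""
    if lo < 33 or hi > 126:
        raise ValueError("quality characters outside printable ASCII range")
    if lo < 59:
        return "fastq-sanger"    # only Phred+33 can go below ASCII 59
    if lo < 64:
        return "fastq-solexa"    # Solexa+64 scores can be as low as -5 (ASCII 59)
    # Ambiguous: could also be high-quality Sanger or Solexa data.
    return "fastq-illumina"


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        lo, hi = quality_range(handle)
    print(guess_fastq_variant(lo, hi), lo, hi)
```

A file whose low-quality characters only appear late in the file would be typed differently by a scan of the first reads than by a scan of the whole file, which may explain the read1 vs read2 disagreement. The last branch remains a guess even with the full range.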

@nick-youngblut
Author

Development on fqtools seems to have stopped. Anyone know of a good alternative tool for typing fastq files?

@alastair-droop
Owner

alastair-droop commented Jul 19, 2018 via email

@nick-youngblut
Author

Yeah, main jobs do tend to get in the way :)

fqtools type doesn't always type correctly, at least not as needed for correct conversion of fastq formats with Bio.SeqIO. I'm probably just going to add a function to my fastq format conversion script that first tries the input format reported by fqtools type and, if parsing fails, tries the other fastq input formats until one works.
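
A rough sketch of that fallback, under some assumptions: the function name `first_parsable_format` and its `parse` argument are hypothetical, and the default parser is `Bio.SeqIO.parse` with the format names `fastq-sanger`, `fastq-solexa` and `fastq-illumina`. It relies on the parser raising `ValueError` for invalid quality characters, as in the traceback above.

```python
def first_parsable_format(path, guess=None, parse=None,
                          candidates=("fastq-illumina",
                                      "fastq-solexa",
                                      "fastq-sanger")):
    """Return the first format name under which the whole file parses.

    `guess` (e.g. the output of `fqtools type`) is tried first, then the
    remaining candidates in the order given.
    """
    if parse is None:
        # Imported lazily so the function can be exercised without Biopython.
        from Bio import SeqIO
        parse = SeqIO.parse
    order = ([guess] if guess else []) + [f for f in candidates if f != guess]
    for fmt in order:
        try:
            for _ in parse(path, fmt):   # consume every record to force validation
                pass
        except ValueError:
            continue                     # invalid quality characters for this format
        return fmt
    raise ValueError("no candidate FASTQ format parsed %r" % path)
```

Parsing without an error does not prove the format is right: the Sanger parser accepts the widest character range, so Phred+64 data would also parse as `fastq-sanger`. That is why the narrower formats come first in the default candidate order.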

@alastair-droop
Owner

alastair-droop commented Jul 21, 2018 via email

@nick-youngblut
Author

Great! Here's an example of different classifications for read1 vs read2:

fastq-dump --skip-technical --split-3 ERR866627

fqtools type ERR866627_1.fastq
# fqtools type: fastq-sanger
fqtools type ERR866627_2.fastq
# fqtools type: fastq-solexa
