Phred scores on aligned reads #342

JensBager · 2023-09-30T15:17:51Z

Hi,

Thank you for creating such a useful pipeline; it has been a pleasure exploring its applications. I intend to do something similar to #179: there are specific bases of interest within each read that I must be very confident in, so I wish to apply more stringent QC at those exact bases than in the remainder of the read. I agree, as mentioned in #179, that a post-processing script is likely the best way to go about it.

My approach has been to use the '--fastq_output' flag to create a fastq file on which I can perform the extended QC. From the length of the fastq output, I've concluded that it contains the 'READS AFTER PREPROCESSING'. However, I've not had any luck producing a post-processing script that can trim this fastq output down to a file similar in length to the 'ALIGNED READS' count (see the attached photo of the html output for reference). Would it be possible to get the fastq output for only the aligned reads, as this would make the post-processing much easier?

If this is not possible, I've been contemplating using a combination of the '--min_single_bp_quality' and '--quantification_window_size' flags. Would it be possible to use these flags in combination so I can set a quality threshold (which is more stringent than for the remainder of the read) specifically for the quantification window around the provided guide RNA site? I think this would be a good alternative solution, which would overcome the problem of discarding too many reads as mentioned in #179.

Thank you!

kclem · 2023-10-01T07:58:37Z

Hi @JensBager, thanks for using CRISPResso!

Right - the fastq_output file will contain all "READS AFTER PREPROCESSING" (reads that fail this step are, e.g., ones where R1/R2 couldn't be merged).

Reads that don't align will be included in the fastq_output file, but will be marked in the 3rd line with ALN=NA. So it's probably easiest to discard those reads in your post-processing script.
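
A minimal sketch of that filtering step, assuming the fastq_output file is a plain 4-line-per-record FASTQ with no blank lines between records, and that unaligned reads carry the literal text ALN=NA somewhere on the third line of the record. The function names `read_fastq` and `keep_aligned` are hypothetical and not part of CRISPResso:

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, third line, quality string).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group an iterable of lines into 4-line FASTQ records.

    Assumes no blank lines between records (an assumption, not verified
    against the actual fastq_output file).
    """
    buf = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []


def keep_aligned(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only records whose third line is not marked ALN=NA."""
    for record in records:
        if "ALN=NA" not in record[2]:
            yield record
```

To use it on a file, something like `with open(path) as handle: aligned = list(keep_aligned(read_fastq(handle)))` should work, where `path` points at your own fastq_output file. If the output is gzip-compressed, `gzip.open(path, "rt")` can be swapped in for `open`.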

I would probably not go the route of the combined '--min_single_bp_quality' and '--quantification_window_size' flags. The quantification_window_size won't affect whether reads end up in the fastq_output file - it only affects how reads are classified as modified or unmodified.

I wrote a quick script to count bases and qualities here: scripts/countHighQualityBases.py if that helps give an idea of how to parse the fastq_output file. Let me know if it's helpful.
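
Separately from that script (this is an independent sketch, not its contents), a stricter Phred check at specific bases could look roughly like the following. It assumes Phred+33 quality encoding and that the positions of interest are given as 0-based indices into the read itself. Mapping reference positions onto read positions in the presence of indels is not handled here. The name `passes_position_qc` is hypothetical:

```python
from typing import Iterable


def passes_position_qc(
    quality: str,
    positions: Iterable[int],
    min_phred: int,
    offset: int = 33,
) -> bool:
    """Return True if every listed read position meets the Phred threshold.

    quality   -- the quality string (4th line of a FASTQ record)
    positions -- 0-based indices into the read (assumed, see lead-in)
    min_phred -- minimum acceptable Phred score at those positions
    offset    -- ASCII offset of the encoding (33 is assumed here)

    A position that falls outside the read counts as a failure.
    """
    for pos in positions:
        if pos < 0 or pos >= len(quality):
            return False
        if ord(quality[pos]) - offset < min_phred:
            return False
    return True
```

For example, with the quality string `IIII#III`, the character `I` decodes to Phred 40 and `#` to Phred 2 under the +33 offset, so a threshold of 30 passes at positions 0 and 1 but fails at position 4.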

JensBager · 2023-10-01T12:43:38Z

Thank you for the quick answer and for sharing your insights - I really appreciate it! I'll have a look at your script and try to use the 'ALN=NA' marker in my post-processing script (I hadn't been able to find this, so it is already great progress!) and let you know how things work out.

JensBager · 2023-11-07T20:59:09Z

Hi. I've managed to perform the filtering based on phred scores in a subset of a read using the information you've shared - thank you!

JensBager closed this as completed Nov 7, 2023
