Phred scores on aligned reads #342

Closed
JensBager opened this issue Sep 30, 2023 · 3 comments

Comments

@JensBager

Hi,

Thank you for creating such a useful pipeline; it has been a pleasure exploring its applications. I intend to do something similar to #179: I have specific bases of interest within each read that I must be very confident in, so I wish to apply more stringent QC at these exact bases than in the remainder of the read. I agree, as mentioned in #179, that a post-processing script is likely the best way to go about it.

My approach has been to use the '--fastq_output' flag to create a fastq file on which I can perform the extended QC. From the length of the fastq output, I've concluded that it contains the 'READS AFTER PREPROCESSING'. However, I've not had any luck writing a post-processing script that trims this fastq-out file down to the number of reads reported as 'ALIGNED READS' (see the attached screenshot of the html-output for reference). Would it be possible to get the fastq-output for only the aligned reads? This would make the post-processing much easier.

If this is not possible, I've been contemplating using a combination of the '--min_single_bp_quality' and '--quantification_window_size' flags. Would it be possible to use these flags in combination so I can set a quality threshold (which is more stringent than for the remainder of the read) specifically for the quantification window around the provided guide RNA site? I think this would be a good alternative solution, which would overcome the problem of discarding too many reads as mentioned in #179.

Thank you!
[Attached screenshot: 2023-09-30 at 16:14:09]

@kclem
Member

kclem commented Oct 1, 2023

Hi @JensBager, thanks for using CRISPResso!

Right - the fastq_output file will contain all "READS AFTER PREPROCESSING" (reads that fail this step were e.g. ones where R1/R2 couldn't be merged).

Reads that don't align will be included in the fastq_output file, but will be marked in the 3rd line with ALN=NA. So it's probably easiest to discard those reads in your post-processing script.
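A minimal sketch of such a filter, in case it is useful to later readers. It assumes a plain four-lines-per-record fastq file and relies only on what is stated above: unaligned reads carry ALN=NA somewhere in the 3rd line of the record. The function names and the file paths are hypothetical, and the substring test may need tightening if the 3rd line can contain other text that includes ALN=NA.

```python
import gzip
from itertools import islice


def read_fastq_records(handle):
    """Yield (header, sequence, third_line, quality) tuples.

    Assumes exactly four lines per record (no wrapped sequences).
    `handle` must be an iterator over lines, e.g. an open text file.
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield tuple(record)


def is_aligned(record, marker="ALN=NA"):
    """Return True unless the 3rd line carries the unaligned marker."""
    return marker not in record[2]


def filter_aligned(in_path, out_path):
    """Copy only aligned reads from in_path to out_path; return the count kept."""
    # Assumption: a path ending in .gz is gzip-compressed, anything else is plain text.
    opener = gzip.open if in_path.endswith(".gz") else open
    kept = 0
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        for record in read_fastq_records(fin):
            if is_aligned(record):
                fout.write("\n".join(record) + "\n")
                kept += 1
    return kept
```

Calling `filter_aligned("fastq_output.fastq.gz", "aligned_only.fastq")` (both paths are placeholders) returns the number of reads kept, which can be compared against the 'ALIGNED READS' count in the report as a sanity check.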

I would probably not go the route of the combined '--min_single_bp_quality' and '--quantification_window_size' flags. The quantification_window_size won't affect whether reads end up in the fastq_output file - it only affects how reads are classified as modified or unmodified.

I wrote a quick script to count bases and qualities here: scripts/countHighQualityBases.py if that helps give an idea of how to parse the fastq_output file. Let me know if it's helpful.
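Independently of that script, a position-specific quality check could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of countHighQualityBases.py: it assumes Phred+33 quality encoding, and the window is given as hypothetical 0-based read coordinates (`start` inclusive, `end` exclusive), which do not necessarily correspond to amplicon positions once reads contain insertions or deletions.

```python
def phred_scores(quality_line, offset=33):
    """Decode a fastq quality string into integer Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality_line]


def passes_window_quality(quality_line, start, end, min_quality, offset=33):
    """Return True if every base in read positions [start, end) has
    a Phred score of at least min_quality.

    Reads that do not fully cover the window (or an empty window) fail.
    """
    window = phred_scores(quality_line, offset)[start:end]
    if not window or len(window) < end - start:
        return False
    return min(window) >= min_quality
```

For example, `passes_window_quality(quality, 20, 30, 30)` would require Q30 or better at read positions 20 to 29 while leaving the rest of the read to whatever threshold was used during preprocessing.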

@JensBager
Author

Thank you for the quick answer and for sharing your insights - I really appreciate it! I'll have a look at your script and try to include the 'ALN=NA' check in my post-processing script (I hadn't been able to find this, so it is already great progress!) and let you know how things work out.

@JensBager
Author

Hi. I've managed to perform the filtering based on phred scores in a subset of a read using the information you've shared - thank you!
