Possibility of filtering reads based on phred score of selected basepairs #179

caspl1 · 2021-12-08T18:09:11Z

Hi.
Thanks for a great pipeline!
I have a population of cells where cells with two different knock-in single nucleotide substitutions are present amongst unedited cells. I would like to quantify the number of reads with each of these substitutions using CRISPResso2. I would like to apply stringent phred score quality filtering, not to the whole read, but only to these two basepair positions (or if that is not possible the window containing both of them, since they are located quite close to each other).

If i filter using the min_average_read_quality parameter, i will not be sure that these positions contain high quality base calls. If i filter using min_single_bp_quality, i will discard too many reads. And if i filter using min_bp_quality_or_N, the total number of reads with each single nucleotide substitution is not easily accessible in the allele plot since the N's are located different places in the reads and the reads are then not identical and counted together.

Is there a way to do this or would it be possible for you to make it?
Thanks a lot
/Casper

Colelyman · 2021-12-08T19:54:25Z

Thanks for using CRISPResso2! I don't think this is currently implemented, but seems like it would be a neat feature. @kclem may have more ideas on how to currently do this.

If I understand correctly, the feature would take minimum phred scores for specific basepair positions and then filter the reads that have at least those scores, right?

caspl1 · 2021-12-08T20:50:08Z

Thanks for the reply. Yes, that is exactly what i mean!

kclem · 2021-12-11T06:22:45Z

Briefly, I think this would have to be done post-alignment and post-processing via the --fastq_output parameter which creates a fastq file including quality scores and read alignment positions for downstream filtering.

I don't think this is a common use case, so probably just writing a post-processing script would be most useful. What outputs are you looking for? Number of edited vs unedited reads after filtering for quality at those positions?

caspl1 · 2021-12-11T19:43:02Z

Thanks for the reply! I am looking to compare the ratio of the two single base pair substitutions (i.e. number of reads of one compared to the other). They are both present at low frequencies and therefore i want to add stringent requirements for quality of those exact base pairs to minimize effect of sequencing noise.

leonorrib · 2022-03-15T15:15:54Z

Dear @kclem and @Colelyman,

Thank you for your answers. I'm working together with @caspl1 on his project. We are implementing a post-processing script and would like to make sure that we are in the right direction, because so far we can't manage to obtain the same number of fragments that are given by CRISPResso2.

The input data are fastq files from a KI experiment, where there are two single base pair substitutions introduced (ie. two amplicons of interest, each with one single base pair substitution).

We would like to use the output fastq file to:

First, identify the fragments with each of the single base pair substitutions introduced. The total number of these fragments is usually provided in the output files called "9.XXX.Alleles_frequency_table_around_sgRNA_XXXXXXXXXXXX.png"
Then, on the list of fragments of interest identified above, we would like to filter out those where base pair with the substitution has low phred score.

But, so far, we can't manage to reproduce the same numbers that CRISPResso2 outouts in the first part above. Perhaps, since you know better the CRIPSPResso2 algorithm, you can help us with how to identify the fragments of interest.

So far, the steps we follow in our script are:

For each sequenced fragment:

Check whether the fragment aligns to some of the amplicons of interest. If not, then discard it.
Check whether the alignment is an ambiguous alignment (ie. to multiple amplicons). If so, then discard the fragment.
If the fragment aligns to TYPE1 ("ALN" field in the output fastq file) & there are no alignment gaps in the reference sequence or aligned sequence ("ALN_DETAILS" field in the output fastq file), then count the fragment as "type1".
If the alignment is to TYPE2 ("ALN" field in the output fastq file) & there are no gaps in the reference sequence or aligned sequence ("ALN_DETAILS" field in the output fastq file), then count the fragment as "type2".

Best regards,
Leonor

kclem added the enhancement New feature or request label Dec 11, 2021

JensBager mentioned this issue Sep 30, 2023

Phred scores on aligned reads #342

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possibility of filtering reads based on phred score of selected basepairs #179

Possibility of filtering reads based on phred score of selected basepairs #179

caspl1 commented Dec 8, 2021

Colelyman commented Dec 8, 2021 •

edited

caspl1 commented Dec 8, 2021

kclem commented Dec 11, 2021

caspl1 commented Dec 11, 2021

leonorrib commented Mar 15, 2022 •

edited

Possibility of filtering reads based on phred score of selected basepairs #179

Possibility of filtering reads based on phred score of selected basepairs #179

Comments

caspl1 commented Dec 8, 2021

Colelyman commented Dec 8, 2021 • edited

caspl1 commented Dec 8, 2021

kclem commented Dec 11, 2021

caspl1 commented Dec 11, 2021

leonorrib commented Mar 15, 2022 • edited

Colelyman commented Dec 8, 2021 •

edited

leonorrib commented Mar 15, 2022 •

edited