Mutect2 drops MBQ to 20 in overlapped reads #320

Open
sleyn opened this issue Sep 6, 2023 · 4 comments
Labels
gatk4 All things related to GATK4/Mutect2

Comments

sleyn commented Sep 6, 2023

Hello,

I just ran into an issue using Mutect2 (I tried several versions) with PureCN 2.2.0. PureCN failed for most samples, with the majority of variants removed by the BQ<25 filter.

I've figured out what happens:
I have 150x paired-end data with a very short insert size (the median insert size is comparable to the read length). As a result, the majority of nucleotides in the data lie in the overlapping parts of read pairs (covered by both the forward and the reverse read). In this situation Mutect2 becomes concerned about PCR errors: if a nucleotide came from a PCR error and is supported by both reads of a pair, it could receive a very high variant quality. So, for its error model, Mutect2 adjusts the base quality in the overlapping parts of the reads, by default to min(20, original_base_quality). This lowers the MBQ. The value 20 is half of 40, the Phred score for 10^-4, which is the default PCR error rate in Mutect2 (it can be changed with the --pcr-snv-qual parameter). In my case most variants had MBQ=20 and were therefore filtered out by PureCN.
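
To make the arithmetic above concrete, here is a minimal Python sketch of the cap as described in this post. It is not GATK's actual code; the function names are made up for illustration, and `pcr_snv_qual=40` simply mirrors the default mentioned above.

```python
import statistics


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def cap_overlap_quality(base_qual, pcr_snv_qual=40):
    """Cap a base quality in an overlapping region, as described in the post:
    min(pcr_snv_qual / 2, original_base_quality).
    Hypothetical helper, not part of GATK."""
    return min(pcr_snv_qual // 2, base_qual)


# Example: base qualities of four bases that lie in the mate overlap.
original_quals = [37, 37, 12, 40]
capped_quals = [cap_overlap_quality(q) for q in original_quals]

# With mostly high-quality bases in overlaps, the median lands exactly on the cap,
# which is below a downstream cutoff of 25.
median_capped = statistics.median(capped_quals)
print(capped_quals, median_capped)
```

Raising `pcr_snv_qual` in this sketch raises the cap accordingly, which is the idea behind changing --pcr-snv-qual.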

Some links about this issue:
broadinstitute/gatk#4958 - a fairly old discussion, but the Mutect2 code still follows the same logic.
https://gatk.broadinstitute.org/hc/en-us/community/posts/18479740841755-Mutect2-overlapping-reads-behavior-with-pcr-snv-qual-parameter - my question to the GATK team about using --pcr-snv-qual.

I'm not sure what the best way to deal with this is for PureCN:

  1. Cut the overlapping part of the reads. Would that also improve the coverage calculations by Coverage.R?
  2. Turn off the PCR error correction by setting --pcr-snv-qual high.
  3. Make the MBQ filter in PureCN adjustable.
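
For a sense of how much sequence option 1 would affect, the overlap within a pair can be estimated from the read length and the fragment length. This is a rough sketch under stated assumptions: "insert size" is taken to mean the full fragment length, the mates face inward, and adapter read-through is already trimmed. The function names are illustrative only.

```python
def overlap_bases(read_length, fragment_length):
    """Number of fragment bases covered by both mates of a pair."""
    # Each mate covers at most the whole fragment.
    covered_per_read = min(read_length, fragment_length)
    return max(0, 2 * covered_per_read - fragment_length)


def overlap_fraction(read_length, fragment_length):
    """Fraction of each mate's fragment-derived bases that lie in the overlap."""
    covered_per_read = min(read_length, fragment_length)
    return overlap_bases(read_length, fragment_length) / covered_per_read


# Fragment as long as the read: every base is sequenced twice.
print(overlap_fraction(150, 150))
# Fragment twice the read length: the mates do not overlap at all.
print(overlap_fraction(150, 300))
```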

Best,
Semen

Owner

lima1 commented Sep 6, 2023

Oh. That explains some of the recent reports I got. I'll add the MBQ filter to PureCN.R. 20 is really low, but I see their concerns about PCR errors and inflated quality scores. Maybe replace the PureCN default of 25 with a dynamic one, like min(xx_mbq_percentile, 25)? I'm not sure what the best unfiltered MBQ percentile would be, though.
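
The dynamic cutoff idea could look roughly like the following Python sketch. It is not PureCN code: the helper names are hypothetical, the percentile is left as a required argument because the thread does not settle on a value, and a simple nearest-rank percentile is assumed.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    if not values:
        raise ValueError("need at least one MBQ value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def dynamic_mbq_cutoff(mbq_values, pct, default_cutoff=25):
    """min(<pct>th percentile of unfiltered MBQ, default_cutoff)."""
    return min(nearest_rank_percentile(mbq_values, pct), default_cutoff)


# Sample dominated by overlap-capped calls: the cutoff relaxes to 20.
overlap_heavy = [20] * 8 + [30, 35]
print(dynamic_mbq_cutoff(overlap_heavy, pct=50))

# Sample with ordinary qualities: the default of 25 is kept.
ordinary = [30, 32, 35, 28, 33]
print(dynamic_mbq_cutoff(ordinary, pct=50))
```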

Author

sleyn commented Sep 6, 2023

It is a hard question, as the Mutect2 output does not seem to carry any information to distinguish where a low quality came from: sequencing errors or Mutect2's PCR error adjustment.

Owner

lima1 commented Sep 6, 2023

What I'll do is skip the MBQ-based filtering when too many variants are filtered out. The user can then either lower the cutoff or do the filtering upstream.

lima1 added a commit that referenced this issue Sep 6, 2023
Owner

lima1 commented Sep 6, 2023

I decided to just throw a warning for now. The linked commit adds --min-base-quality to PureCN.R which you can set to 20.
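
The behaviour described here, an adjustable cutoff plus a warning when the filter removes most calls, can be sketched in Python as follows. This is an illustration only and not the PureCN implementation: the function, the `mbq` dictionary key, and the `warn_fraction` threshold are all assumptions.

```python
import warnings


def filter_by_mbq(variants, min_base_quality=25, warn_fraction=0.5):
    """Keep variants with MBQ >= min_base_quality; warn if most are removed.
    warn_fraction is a hypothetical threshold chosen for this sketch."""
    kept = [v for v in variants if v["mbq"] >= min_base_quality]
    removed = len(variants) - len(kept)
    if variants and removed / len(variants) > warn_fraction:
        warnings.warn(
            f"{removed} of {len(variants)} variants have MBQ below "
            f"{min_base_quality}; consider lowering the cutoff."
        )
    return kept


# Toy calls from an overlap-heavy sample: most sit at the capped value of 20.
calls = [{"id": "v1", "mbq": 20}, {"id": "v2", "mbq": 20},
         {"id": "v3", "mbq": 20}, {"id": "v4", "mbq": 33}]

kept_relaxed = filter_by_mbq(calls, min_base_quality=20)  # keeps all four, no warning
print(len(kept_relaxed))
```

With the default cutoff of 25 the same toy input keeps only one call and triggers the warning, which is the situation reported at the top of this issue.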
