posterior values using lofreq rule? #88

vicfabienne · 2021-03-31T05:51:20Z

Hey, thanks for all the effort you put in this pipeline!

Because I need to call variants in regions with quite low coverage, I recently tried running the SARS-CoV branch of v-pipe with lofreq as the SNV caller, configured in the config file as described in the documentation. After running into some issues, I also set the "coverage" value under "coverage_intervals" to 10 (to match the lofreq filter).

In the visualization, however, every variant gets a posterior score of 1. Since the pipeline also runs the ShoRAH rule after lofreq, I was wondering why this happens, but I couldn't find an explanation so far.
Is this expected behaviour?
Is there a way to adjust the snv rule so that posterior scores are also computed when lofreq is the SNV caller?
Do you have any recommendations on how to apply frequency filtering to lofreq variants, regardless of whether they can be included in the visualization afterwards? (I think lofreq computes a p-value, but I couldn't find how to make use of it in v-pipe.)

Any hints on where to start looking would be highly appreciated. Thanks!

namhsuya · 2021-05-15T06:22:11Z

Hi @vfschumann, I am facing the same issue.

It seems the visualization is optimized only for ShoRAH output, because it calculates the posterior probability with this formula:
`"posterior": round(1 - 10**(-record.QUAL / 10), 3)`
(You can find this formula in your vpipe/scripts/assemble_web_visualization.py file.)
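To see why large QUAL values end up as a posterior of exactly 1, here is a minimal sketch that applies the same expression as the quoted line to a few QUAL values. The function name `posterior_from_qual` is hypothetical, and the QUAL values are illustrative, not taken from real ShoRAH or lofreq runs.

```python
def posterior_from_qual(qual: float) -> float:
    """Same expression as the quoted line: 1 - 10^(-QUAL/10), rounded to 3 decimals."""
    return round(1 - 10 ** (-qual / 10), 3)


if __name__ == "__main__":
    # Illustrative QUAL values only
    for qual in (10, 20, 30, 40, 100, 1000):
        print(qual, posterior_from_qual(qual))
```

Because of the rounding to 3 decimals, any QUAL above roughly 33 already prints as 1.0, so a caller that reports large QUAL values shows a posterior of 1 for almost every variant.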

If you compare the VCFs produced by lofreq and ShoRAH, you will notice that the QUAL column has much larger values for lofreq than for ShoRAH, which I think is what results in posterior scores of 1.

Essentially, the lofreq and ShoRAH outputs are very different, because lofreq also calls indels, which ShoRAH does not.
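One way to make the two outputs easier to compare is to keep only single-nucleotide records from the lofreq VCF. This is a minimal sketch, assuming plain tab-separated VCF text; `classify_variant` and `snv_lines` are hypothetical helpers, not part of v-pipe, and the records in the example are made up.

```python
from typing import Iterable, Iterator


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair from a VCF record."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # equal-length multi-base substitutions


def snv_lines(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield only data lines whose ALT alleles are all single-nucleotide substitutions."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # ALT may list several alleles
        if all(classify_variant(ref, alt) == "snv" for alt in alts):
            yield line


if __name__ == "__main__":
    # Hypothetical records: CHROM POS ID REF ALT QUAL FILTER INFO
    records = [
        "##fileformat=VCFv4.0",
        "ref\t241\t.\tC\tT\t1000\tPASS\tDP=500;AF=0.98",
        "ref\t300\t.\tGTCT\tG\t800\tPASS\tDP=400;AF=0.95",
    ]
    for line in snv_lines(records):
        print(line)
```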

https://sourceforge.net/p/lofreq/discussion/general/thread/7b713493/ is a thread in which the lofreq author describes how the tool calculates the QUAL score; maybe you can take hints from it for calculating posterior scores from lofreq VCF outputs.
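If lofreq's QUAL is a Phred-scaled p-value (an assumption here, based on the linked explanation that lofreq derives a p-value), it can be turned back into a p-value and used directly for filtering. Under that assumption, the existing `1 - 10**(-QUAL/10)` conversion is effectively 1 minus the p-value rather than a Bayesian posterior. The function names and the default cutoff `alpha=1e-5` below are hypothetical choices, not v-pipe or lofreq settings.

```python
import math


def pvalue_from_qual(qual: float) -> float:
    """Undo Phred scaling, assuming QUAL = -10 * log10(p)."""
    return 10 ** (-qual / 10)


def qual_threshold(alpha: float) -> float:
    """Phred-scaled equivalent of a p-value cutoff alpha."""
    return -10 * math.log10(alpha)


def passes_pvalue_filter(qual: float, alpha: float = 1e-5) -> bool:
    """Keep a call if its p-value is at most alpha.

    The comparison is done in Phred space, so very large QUAL values
    (whose p-values would underflow to 0.0 as floats) need no special case.
    """
    return qual >= qual_threshold(alpha)
```

For example, `passes_pvalue_filter(60)` keeps a call with QUAL 60, because a cutoff of 1e-5 corresponds to a Phred value of 50.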

I will also update once I am able to figure that out. Thanks~

kpj · 2021-05-15T07:51:35Z

For the visualization we have created PR #91, which uses the AF INFO field for LoFreq and the Freq* fields for ShoRAH. Feel free to give it a try!
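A frequency filter along the same lines can be sketched by reading the allele frequency from the INFO column. This is not the code from PR #91: the helper names and the default `min_af=0.05` are hypothetical, and averaging whichever ShoRAH `Freq1`..`Freq3` values are numeric (skipping entries such as `*`) is an assumption about how to combine them.

```python
from statistics import mean
from typing import Optional


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column such as 'DP=120;AF=0.05;INDEL' into a dict.

    Flag entries without '=' are stored with the value True.
    """
    fields = {}
    for entry in info.split(";"):
        if entry:
            key, sep, value = entry.partition("=")
            fields[key] = value if sep else True
    return fields


def _as_float(value) -> Optional[float]:
    """Return value as float, or None for missing, flag or non-numeric entries."""
    if not isinstance(value, str):
        return None
    try:
        return float(value)
    except ValueError:
        return None


def allele_frequency(info: str, caller: str) -> Optional[float]:
    fields = parse_info(info)
    if caller == "lofreq":
        return _as_float(fields.get("AF"))
    if caller == "shorah":
        # Assumption: average whichever of Freq1..Freq3 are numeric
        freqs = [_as_float(fields.get(f"Freq{i}")) for i in (1, 2, 3)]
        freqs = [f for f in freqs if f is not None]
        return mean(freqs) if freqs else None
    raise ValueError(f"unknown caller: {caller!r}")


def passes_af_filter(info: str, caller: str, min_af: float = 0.05) -> bool:
    """Keep a record whose allele frequency is known and at least min_af."""
    af = allele_frequency(info, caller)
    return af is not None and af >= min_af
```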

At the moment, QUAL values are processed the same way for both callers, but we'd be happy to adapt this to fit LoFreq better.
@namhsuya's link mentions this:

The basics are explained in the NAR paper (Wilm, 2012): We compute a
poisson-binomial distribution taking error probabilities at each pileup
site into consideration and derive a p-value from that. Error probabilities
were originally just converted base qualities (because that's what they
are). In later LoFreq versions we merged base alignment, mapping and base
quality into one error probability per base. The logic goes like this:
either the read is misaligned (mapping quality) or if not, the base might
be misaligned, or if neither of that is true then the base itself might be
wrong, i.e.
$$P_m + (1 - P_m)\,P_a + (1 - P_m)(1 - P_a)\,P_b,$$
where $P_m$ is the mapping error probability,
$P_a$ is the base alignment error probability (BAQ), and
$P_b$ is the base error probability.
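The approach described in the quote can be sketched in a simplified form: merge the three error sources into one probability per base, compute the Poisson-binomial probability of seeing at least the observed number of mismatching bases by chance, and Phred-scale that p-value. This is not lofreq's implementation. The function names and pileup numbers are hypothetical, every error is simply counted as a mismatch, and lofreq's own code differs in details not shown here.

```python
import math
from typing import Sequence


def merged_error_prob(p_m: float, p_a: float, p_b: float) -> float:
    """P_m + (1-P_m)P_a + (1-P_m)(1-P_a)P_b, as in the quoted formula."""
    return p_m + (1 - p_m) * p_a + (1 - p_m) * (1 - p_a) * p_b


def phred_to_prob(q: float) -> float:
    return 10 ** (-q / 10)


def poisson_binomial_sf(error_probs: Sequence[float], k: int) -> float:
    """P(X >= k), X = number of erroneous bases among independent bases."""
    dist = [1.0]  # dist[j] = P(exactly j errors among the bases seen so far)
    for p in error_probs:
        new = [0.0] * (len(dist) + 1)
        for j, prob in enumerate(dist):
            new[j] += prob * (1 - p)  # this base is correct
            new[j + 1] += prob * p    # this base is an error
        dist = new
    return sum(dist[k:])


def phred_qual(pvalue: float) -> float:
    return -10 * math.log10(pvalue) if pvalue > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical pileup: 100 bases with MAPQ 60, BAQ 40 and base quality 30,
    # of which 5 show the alternative base
    per_base = merged_error_prob(phred_to_prob(60), phred_to_prob(40), phred_to_prob(30))
    p = poisson_binomial_sf([per_base] * 100, 5)
    print(f"p-value = {p:.3g}, Phred QUAL = {phred_qual(p):.1f}")
```

Note that the quoted expression equals $1 - (1 - P_m)(1 - P_a)(1 - P_b)$, the probability that at least one of the three error sources occurs. Even this modest pileup yields a QUAL well above 33, which is consistent with lofreq posteriors saturating at 1.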
