posterior values using lofreq rule? #88

Open
vicfabienne opened this issue Mar 31, 2021 · 2 comments

Comments

@vicfabienne

Hey, thanks for all the effort you put in this pipeline!

Because I have to call variants in regions with quite low coverage, I recently tried running the V-pipe SARS-CoV branch with LoFreq as the SNV caller, set in the config file as described in the documentation. After some issues, I also adjusted the "coverage" value under "coverage_intervals" to 10 (to fit the LoFreq filter).

In the visualization, however, I only get posterior scores of 1 for every variant. Since the pipeline also calls the ShoRAH rule after LoFreq, I was wondering why this is the case, but I couldn't find anything so far.
Is this expected behaviour?
Is there a way to adjust the SNV rule to get posterior scores when using LoFreq as the SNV caller as well?
Do you have any recommendations on how to apply frequency filtering to LoFreq variants, regardless of whether they can be included in the visualization afterwards? (I think LoFreq calculates a p-value, but I couldn't find how to make use of it in V-pipe.)

Any hints on where I could start looking would be highly appreciated. Thanks!

@namhsuya

namhsuya commented May 15, 2021

Hi @vfschumann, I am facing the same issue.

It seems the visualization is only tuned for ShoRAH output, because the posterior probability is calculated with this formula:
"posterior": round(1 - 10**(-record.QUAL / 10), 3)
(You can find this formula in your vpipe/scripts/assemble_web_visualization.py file.)

If you compare the VCFs produced by LoFreq and ShoRAH, you will notice that the QUAL column has much larger values for LoFreq than for ShoRAH, which I think is why the posterior scores all come out as 1.
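To illustrate why large QUAL values collapse to 1, here is a minimal sketch that applies the same expression to a few QUAL values. The function names and the sample QUAL values are made up for illustration and are not taken from V-pipe or from real LoFreq/ShoRAH output:

```python
import math


def posterior_from_qual(qual, ndigits=3):
    """Same expression as quoted above: Phred-scaled QUAL -> probability."""
    return round(1 - 10 ** (-qual / 10), ndigits)


def saturation_threshold(ndigits=3):
    """QUAL above which the rounded posterior becomes exactly 1.0.

    Rounding to `ndigits` gives 1.0 once the error term 10**(-QUAL/10)
    drops below half of the last displayed digit.
    """
    return -10 * math.log10(0.5 * 10 ** -ndigits)


if __name__ == "__main__":
    # Hypothetical QUAL values, small to very large.
    for qual in (3, 10, 20, 30, 40, 500):
        print(qual, posterior_from_qual(qual))
    print("saturates above QUAL of about", round(saturation_threshold(), 2))
```

With three decimals, any QUAL above roughly 33 is displayed as 1, so a caller that routinely reports QUAL values in the hundreds or thousands will never show anything else.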

Essentially, the LoFreq and ShoRAH outputs are hugely different, because LoFreq also calls indels, which ShoRAH does not.

https://sourceforge.net/p/lofreq/discussion/general/thread/7b713493/ is a link to the LoFreq author describing how the tool calculates the QUAL score; maybe you could take hints from that for calculating posterior scores from LoFreq VCF output.

I will also update once I am able to figure that out. Thanks~

@kpj
Contributor

kpj commented May 15, 2021

For the visualization, we have created PR #91, which uses the AF INFO field for LoFreq and the Freq* fields for ShoRAH. Feel free to give it a try!
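For frequency filtering outside the visualization, a minimal sketch along the same lines could read the AF value from the INFO column and keep only records above a threshold. This is not the code from the PR; the function names, the threshold, and the example records below are all hypothetical, and it assumes a tab-separated VCF with one AF value per record:

```python
def parse_info(info_field):
    """Turn an INFO string such as 'DP=100;AF=0.02;INDEL' into a dict.

    Key=value entries keep their value as a string; bare flags map to True.
    """
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def filter_by_af(vcf_lines, min_af):
    """Keep data lines whose AF INFO value is at least `min_af`.

    Header lines, malformed lines, and records without AF are skipped.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        info = parse_info(fields[7])
        if "AF" not in info:
            continue
        # Assumption: use the first value if several are comma-separated.
        allele_freq = float(str(info["AF"]).split(",")[0])
        if allele_freq >= min_af:
            kept.append(line)
    return kept


# Made-up records for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t100\t.\tA\tG\t1200\tPASS\tDP=400;AF=0.25",
    "ref\t250\t.\tC\tT\t80\tPASS\tDP=350;AF=0.008",
]

if __name__ == "__main__":
    for record in filter_by_af(EXAMPLE_VCF, min_af=0.01):
        print(record)
```

A dedicated VCF library would be more robust for real data; plain string handling is used here only to keep the sketch free of dependencies.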

At the moment, the QUAL values are processed the same way for both callers, but we'd be happy to adapt this to fit LoFreq better.
@namhsuya's link mentions this:

The basics are explained in the NAR paper (Wilm, 2012): We compute a
poisson-binomial distribution taking error probabilities at each pileup
site into consideration and derive a p-value from that. Error probabilities
were originally just converted base qualities (because that's what they
are). In later LoFreq versions we merged base alignment, mapping and base
quality into one error probability per base. The logic goes like this:
either the read is misaligned (mapping quality) or if not, the base might
be misaligned, or if neither of that is true then the base itself might be
wrong, i.e.
P_m + (1-P_m)*P_a + (1-P_m)*(1-P_a)*P_b,
where P_m is the mapping error probability,
P_a is the base alignment error probability (BAQ), and
P_b is the base error probability.
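
The quoted logic can be sketched in a few lines of Python. This is only an illustration of the formula and of a Poisson-binomial tail probability, not LoFreq's actual implementation; all function names are hypothetical, and the bases are assumed to be independent:

```python
def phred_to_prob(q):
    """Convert a Phred-scaled quality into an error probability."""
    return 10 ** (-q / 10)


def merged_error_prob(p_m, p_a, p_b):
    """P_m + (1-P_m)*P_a + (1-P_m)*(1-P_a)*P_b from the quoted text."""
    return p_m + (1 - p_m) * p_a + (1 - p_m) * (1 - p_a) * p_b


def poisson_binomial_tail(error_probs, k):
    """P(X >= k), where X counts errors among independent bases.

    Each base i is an error with its own probability error_probs[i].
    Computed by exact dynamic programming over the error count.
    """
    # dist[j] = probability of exactly j errors among the bases seen so far
    dist = [1.0]
    for p in error_probs:
        new_dist = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new_dist[j] += mass * (1 - p)   # this base is correct
            new_dist[j + 1] += mass * p     # this base is an error
        dist = new_dist
    return sum(dist[k:])


if __name__ == "__main__":
    # One base with mapping, alignment, and base quality all at Phred 20.
    p = merged_error_prob(*(phred_to_prob(20),) * 3)
    print("merged error probability:", p)
    # Chance of seeing 5 or more mismatches at a 100x site from errors alone.
    print("p-value:", poisson_binomial_tail([p] * 100, 5))
```

The merged probability equals 1 - (1-P_m)*(1-P_a)*(1-P_b), i.e. the chance that at least one of the three error sources applies. The tail probability plays the role of the p-value described in the quote: the smaller it is, the less plausible it is that the observed variant bases are all errors.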
