Representing a Generalized Pareto with supported factors #380

solna86 · 2021-12-30T10:45:39Z

I am trying to model the tail of a Gumbel, i.e. a Generalized Pareto.

I understand that the philosophy of Infer.NET is to provide basic distributions and let the user combine them, as discussed in the old forum and in some issues here. Using this approach, how shall I represent a Generalized Pareto?

Assuming my Generalized Pareto has a positive shape parameter, I can encode it as an Exponential-Gamma mixture as described in: https://en.wikipedia.org/wiki/Generalized_Pareto_distribution#GPD_as_an_Exponential-Gamma_Mixture

Gamma factors are supported natively by Infer.NET, so I can use this directly as the parameter of an Exponential.

How shall I encode the Exponential? Shall I simply exponentiate some positive real number drawn from a Uniform using the parameter drawn from a Gamma?

tminka · 2021-12-30T12:43:17Z

An exponential distribution is equivalent to a Gamma distribution with shape parameter equal to 1. So you can write this as a Gamma variable whose rate is another Gamma variable. Another way to see it is that the Generalized Pareto with mu=0 and positive shape is a special case of the F distribution.
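A quick numerical sanity check of this equivalence is sketched below in plain Python (not Infer.NET code). It draws a rate from a Gamma, then draws a Gamma with shape 1 and that rate, and compares the empirical tail with the survival function of a Generalized Pareto with location 0, shape xi = 1/alpha and scale sigma = beta/alpha. The values alpha = 3 and beta = 2 and all helper names are illustrative assumptions, not taken from the thread.

```python
import random


def sample_gamma_gamma(alpha, beta, rng):
    """Draw x ~ Gamma(shape=1, rate=lam) with lam ~ Gamma(shape=alpha, rate=beta)."""
    # random.gammavariate takes (shape, scale), so a rate r becomes scale 1/r.
    lam = rng.gammavariate(alpha, 1.0 / beta)
    return rng.gammavariate(1.0, 1.0 / lam)


def gpd_survival(x, xi, sigma):
    """P(X > x) for a Generalized Pareto with location 0, shape xi > 0, scale sigma."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)


def empirical_survival(samples, x):
    """Fraction of samples strictly greater than x."""
    return sum(s > x for s in samples) / len(samples)


alpha, beta = 3.0, 2.0  # illustrative values (assumption)
rng = random.Random(0)
samples = [sample_gamma_gamma(alpha, beta, rng) for _ in range(200_000)]

for x in (0.5, 1.0, 3.0):
    print(x, empirical_survival(samples, x), gpd_survival(x, 1.0 / alpha, beta / alpha))
```

The two printed columns should agree up to Monte Carlo error, which is what one would expect if the Gamma-Gamma construction and the Generalized Pareto describe the same distribution.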

solna86 · 2021-12-30T14:17:49Z

Many thanks, I had overlooked that equivalence.

What are some recommended weakly-informative priors for alpha and beta in the Gamma distribution, taking into account that the resulting mixture is GeneralizedPareto(xi=1/alpha, sigma=beta/alpha), where alpha and beta are the shape and rate parameters of the Gamma?
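The parametrization quoted here can be checked by integrating out the rate. In the sketch below, lambda denotes the Gamma-distributed rate of the Exponential (a symbol introduced only for this derivation), and the location of the Generalized Pareto is assumed to be 0:

```latex
\begin{aligned}
\lambda &\sim \mathrm{Gamma}(\alpha, \beta) \quad (\text{shape } \alpha,\ \text{rate } \beta), \qquad
X \mid \lambda \sim \mathrm{Exponential}(\lambda), \\
P(X > x) &= \int_0^\infty e^{-\lambda x}\, \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda}\, d\lambda
= \left(\frac{\beta}{\beta + x}\right)^{\alpha}
= \left(1 + \frac{x}{\beta}\right)^{-\alpha}, \\
P_{\mathrm{GPD}}(X > x) &= \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}
\quad\Longrightarrow\quad \xi = \frac{1}{\alpha},\quad \sigma = \frac{\beta}{\alpha}.
\end{aligned}
```

Matching the two survival functions term by term gives xi = 1/alpha and sigma = beta/alpha, as stated.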

tminka · 2021-12-30T20:16:33Z

We don't recommend priors here. Try asking on Cross Validated.

solna86 · 2022-01-06T23:02:20Z

Thanks. I apologize if my question sounded like an off-topic query about priors.

I am quite familiar with choosing priors in more general probabilistic systems, and I know this is not the place to ask.

However, I am having some trouble connecting distributions in Infer.NET.

For example, consider a simple Beta-Uniform mixture model where the mixing rate and one parameter of Beta observations are unknown:

var p = Observed(double_array);
var i = p.Range;

var m = Beta(1, 1);
var a = Beta(1, 1);

using(ForEach(i))
{   
    var c = Bernoulli(m);

    using(If(c))
    {   
        p[i] = Beta(a, 1);
    }

    using(IfNot(c))
    {   
        p[i] = Beta(1, 1);
    }
}

Infer.NET does not support the above model with any algorithm or quality band. The part that causes problems is the Beta prior on a.
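For reference, summing out the indicator c gives the marginal density that this model assigns to each p-value. This is a sketch derived from the model code, with f introduced here as a symbol for that density; it uses the facts that Beta(a, 1) has density a p^(a-1) on (0, 1) and that Beta(1, 1) is uniform:

```latex
f(p_i \mid m, a) \;=\; m \, a \, p_i^{\,a-1} \;+\; (1 - m), \qquad 0 < p_i < 1 .
```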

The only parametrization that I have been able to compile replaces Beta-Beta with Gaussian/Gamma-Gaussian. But this is quite unnatural, as the observations are p-values and thus constrained to [0, 1], and inference is very slow.

So my questions are:

Is there a more natural alternative that is supported by Infer.NET?
Can I learn more about these limitations and how to approach them somewhere?

tminka · 2022-01-07T10:52:07Z

To model values constrained to [0,1] in a flexible way, you can use:
- a logistic transformation of a Gaussian
- Max(0, Min(1, Gaussian))

The limitations are documented in the List of factors and constraints. You can see there that stochastic parameters of a Beta distribution are not supported.

For Beta(a,1), a Gamma prior on a is conjugate so this would be fairly easy to support.
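The two constructions for [0,1]-valued variables can be illustrated outside Infer.NET with a plain Python sketch. The Gaussian mean and standard deviation, and the helper names `logistic` and `clamp01`, are arbitrary illustrative choices rather than anything prescribed in the thread.

```python
import math
import random


def logistic(z):
    """Map a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def clamp01(z):
    """Max(0, Min(1, z)): values outside [0, 1] are moved to the nearest boundary."""
    return max(0.0, min(1.0, z))


rng = random.Random(0)
mean, sd = 0.3, 0.5  # illustrative Gaussian parameters (assumption)
draws = [rng.gauss(mean, sd) for _ in range(10_000)]

via_logistic = [logistic(z) for z in draws]
via_clamp = [clamp01(z) for z in draws]

# Range of the logistic-transformed draws.
print(min(via_logistic), max(via_logistic))
# Number of clamped draws sitting exactly on each boundary.
print(sum(v == 0.0 for v in via_clamp), sum(v == 1.0 for v in via_clamp))
```

One difference worth noting: the logistic transformation keeps every value strictly inside (0, 1), whereas the clamped Gaussian places point masses exactly at 0 and 1.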

tminka · 2022-01-07T16:56:01Z

PR #386 adds support for Beta(a,1) with Gamma a.
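The conjugacy behind this case can be written down directly. Beta(a, 1) has density a x^(a-1), so n observations contribute a likelihood proportional to a^n exp(a * sum(log x_i)); multiplying by a Gamma(shape, rate) prior on a gives a Gamma posterior with shape + n and rate - sum(log x_i). The Python sketch below implements only this closed-form update. It is not the implementation in PR #386, and the function name is hypothetical.

```python
import math


def gamma_posterior_for_beta_a1(prior_shape, prior_rate, observations):
    """Posterior Gamma(shape, rate) on a, given x_i ~ Beta(a, 1) and a ~ Gamma(shape, rate)."""
    if any(not 0.0 < x <= 1.0 for x in observations):
        raise ValueError("observations must lie in (0, 1]")
    n = len(observations)
    # log x_i <= 0, so subtracting the sum can only increase the rate.
    sum_log = sum(math.log(x) for x in observations)
    return prior_shape + n, prior_rate - sum_log


# Illustrative data: two observations with a Gamma(1, 1) prior (assumption).
shape, rate = gamma_posterior_for_beta_a1(1.0, 1.0, [0.5, 0.25])
print(shape, rate)
```

The update depends on the data only through the count n and the sum of log x_i, so this closed form does not grow in cost beyond a single pass over the observations.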

solna86 · 2022-01-11T04:34:57Z

Many thanks for taking the time to support this @tminka!

I have pulled the latest master and built Infer.NET. A mixture model like the one I posted previously, with Beta(a, 1) or Beta(1, a), and a = Gamma(...) in one of the discrete mixture branches now compiles on that Infer.NET build, which is great.

Typically, in a Beta-Uniform mixture model of p-values, the free parameter of the Beta is alpha [1], and alpha is usually constrained to [0, 1] in maximum likelihood estimation. However, this parametrization crashes at runtime:

Unhandled exception. System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> Microsoft.ML.Probabilistic.Factors.ImproperMessageException: Improper distribution during inference (Beta(NaN,8130)).  Cannot perform inference on this model.

I assume this is a numerical issue (underflow?). Changing the parameters of the Gamma prior didn't help.

Switching to a Beta(1, a) mixing component, instead of Beta(a, 1), works well for medium-sized datasets. I presume this is because here the posterior distribution of a is concentrated on values much larger than 1.

However, I encounter the same issue with large datasets of ~1e7 p-values, i.e. again the same runtime error with a NaN in Beta. Is there anything I can do to scale Infer.NET to these large datasets?

[1] https://academic.oup.com/bioinformatics/article/19/10/1236/184434

tminka · 2022-01-11T09:52:28Z

How can I reproduce that problem?
