Representing a Generalized Pareto with supported factors #380

Open
solna86 opened this issue Dec 30, 2021 · 8 comments

Comments

solna86 commented Dec 30, 2021

I am trying to model the tail of a Gumbel, i.e. a Generalized Pareto.

I understand that the philosophy of Infer.NET is to provide basic distributions and let the user combine them, as discussed in the old forum and in some issues here. Using this approach, how shall I represent a Generalized Pareto?

Assuming my Generalized Pareto has a positive shape parameter, I can encode it as an Exponential-Gamma mixture as described in: https://en.wikipedia.org/wiki/Generalized_Pareto_distribution#GPD_as_an_Exponential-Gamma_Mixture
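
For reference, the mixture identity can be checked from the survival function. This is a sketch under stated assumptions: the rate is λ ~ Gamma(α, β) in the shape-rate parametrization, X | λ ~ Exponential(λ), and the Generalized Pareto has location μ = 0. The symbols λ, α, β, ξ, σ are notation chosen here, not taken from Infer.NET.

```latex
\begin{aligned}
P(X > x)
  &= \int_0^\infty e^{-\lambda x}\,
     \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1} e^{-\beta\lambda}\,d\lambda
   = \left(\frac{\beta}{\beta + x}\right)^{\alpha}
   = \left(1 + \frac{x}{\beta}\right)^{-\alpha}, \\
P_{\mathrm{GPD}}(X > x)
  &= \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}
  \quad\Longrightarrow\quad
  \xi = \frac{1}{\alpha}, \qquad \sigma = \frac{\beta}{\alpha}.
\end{aligned}
```

Matching the two expressions term by term gives ξ/σ = 1/β and 1/ξ = α, which only covers the positive-shape case ξ > 0.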

Gamma factors are supported natively by Infer.NET, so I can use a Gamma variable directly as the rate parameter of an Exponential.

How shall I encode the Exponential? Shall I simply exponentiate some positive real number drawn from a Uniform using the parameter drawn from a Gamma?

Contributor

tminka commented Dec 30, 2021

An exponential distribution is equivalent to a Gamma distribution with shape parameter equal to 1. So you can write this as a Gamma variable whose rate is another Gamma variable. Another way to see it is that the Generalized Pareto with mu=0 and positive shape is a special case of the F distribution.
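
The F-distribution remark can be worked out as follows. This is a sketch assuming the shape-rate Gamma parametrization, independent E and G, and a Gamma(α, β) distribution on the rate λ; the symbols E, G, α, β, ξ, σ are notation introduced here for illustration.

```latex
\begin{aligned}
X \mid \lambda &\sim \mathrm{Gamma}(1, \lambda), \qquad \lambda \sim \mathrm{Gamma}(\alpha, \beta)
  \quad\Longrightarrow\quad
  X \overset{d}{=} \frac{\beta E}{G}, \quad E \sim \mathrm{Gamma}(1, 1),\ G \sim \mathrm{Gamma}(\alpha, 1), \\
2E &\sim \chi^2_{2}, \qquad 2G \sim \chi^2_{2\alpha}
  \quad\Longrightarrow\quad
  \frac{(2E)/2}{(2G)/(2\alpha)} = \frac{\alpha E}{G} \sim F(2,\, 2\alpha), \\
X &\overset{d}{=} \frac{\beta}{\alpha}\, F(2,\, 2\alpha) = \sigma\, F\!\left(2,\, \tfrac{2}{\xi}\right),
  \qquad \xi = \frac{1}{\alpha}, \quad \sigma = \frac{\beta}{\alpha}.
\end{aligned}
```

So, under these assumptions, a Generalized Pareto with μ = 0 and ξ > 0 is a scaled F variable with 2 and 2/ξ degrees of freedom.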

Author

solna86 commented Dec 30, 2021

Many thanks, I had overlooked that equivalence.

What are some recommended weakly informative priors for alpha and beta in the Gamma distribution, given that the mixture corresponds to GeneralizedPareto(xi=1/alpha, sigma=beta/alpha), where alpha and beta are the shape and rate parameters of the Gamma?

Contributor

tminka commented Dec 30, 2021

We don't recommend priors here. Try asking on Cross Validated.

Author

solna86 commented Jan 6, 2022

Thanks. I apologize if my question sounded like an off-topic query about priors.

I am quite familiar with choosing priors in more general probabilistic systems, and I know this is not the place to ask.

However, I am having some trouble connecting distributions in Infer.NET.

For example, consider a simple Beta-Uniform mixture model where the mixing rate and one parameter of Beta observations are unknown:

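// Observed p-values, each in [0, 1]
var p = Observed(double_array);
var i = p.Range;

// m: unknown mixing rate; a: unknown first parameter of the Beta component
var m = Beta(1, 1);
var a = Beta(1, 1);

using(ForEach(i))
{
    // c selects the mixture component for observation i
    var c = Bernoulli(m);

    using(If(c))
    {
        p[i] = Beta(a, 1);
    }

    using(IfNot(c))
    {
        // Beta(1, 1) is the Uniform component
        p[i] = Beta(1, 1);
    }
}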

Infer.NET did not support the above model with any algorithm or quality band. The part that causes problems is the Beta prior for a.

The only parametrization that I have been able to compile replaces Beta-Beta with Gaussian/Gamma-Gaussian. But this is quite unnatural, since the observations are p-values and thus constrained to [0, 1], and it is also very slow.

So my questions are:

  1. Is there a more natural alternative that is supported by Infer.NET?
  2. Can I learn more about these limitations and how to approach them somewhere?

Contributor

tminka commented Jan 7, 2022

  1. To model values constrained to [0,1] in a flexible way, you can use:
    • a logistic transformation of a Gaussian
    • Max(0, Min(1, Gaussian))
  2. The limitations are documented at the List of factors and constraints. You can see there that stochastic parameters of a Beta distribution are not supported.
  3. For Beta(a,1), a Gamma prior on a is conjugate so this would be fairly easy to support.
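
The conjugacy in point 3 can be written out explicitly. This sketch assumes n independent observations p_1, ..., p_n drawn from a single Beta(a, 1) component (not the full mixture) and a shape-rate Gamma prior; the hyperparameter names α₀ and β₀ are hypothetical notation chosen here.

```latex
\begin{aligned}
f(p_i \mid a) &= a\, p_i^{\,a-1}, \qquad 0 < p_i < 1, \\
\pi(a) &\propto a^{\alpha_0 - 1} e^{-\beta_0 a}, \\
\pi(a \mid p_{1:n}) &\propto a^{\alpha_0 + n - 1}
   \exp\!\Big(-a\Big(\beta_0 - \sum_{i=1}^{n} \log p_i\Big)\Big)
 \quad\Longrightarrow\quad
 a \mid p_{1:n} \sim \mathrm{Gamma}\Big(\alpha_0 + n,\ \beta_0 - \sum_{i=1}^{n} \log p_i\Big).
\end{aligned}
```

Since each log p_i is negative, the posterior rate is larger than β₀, so the posterior stays a proper Gamma. Inside a mixture the component assignments are unknown, so this closed form applies only conditionally on which observations belong to the Beta(a, 1) branch.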

Contributor

tminka commented Jan 7, 2022

PR #386 adds support for Beta(a,1) with Gamma a.

Author

solna86 commented Jan 11, 2022

Many thanks for taking the time to support this @tminka!

I have pulled the latest master and built Infer.NET. A mixture model like the one I posted previously, with Beta(a, 1) or Beta(1, a), and a = Gamma(...) in one of the discrete mixture branches now compiles on that Infer.NET build, which is great.

Typically, in a Beta-Uniform mixture model of p-values, the free parameter in Beta is alpha [1]. And alpha is usually constrained to [0, 1] in MLE. However, this parametrization crashes at runtime:

Unhandled exception. System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> Microsoft.ML.Probabilistic.Factors.ImproperMessageException: Improper distribution during inference (Beta(NaN,8130)).  Cannot perform inference on this model.

I assume this is a numerical issue (underflow?). Changing the parameters of the Gamma prior didn't help.

Switching to a Beta(1, a) mixing component, instead of Beta(a, 1), works well for medium-sized datasets. I presume this is because the posterior distribution of a is then concentrated on values much larger than 1.

I've encountered the same issue for large datasets of ~1e7 p-values, i.e. again the same runtime error with a NaN in Beta. Is there anything I can do to scale Infer.NET to these large datasets?

[1] https://academic.oup.com/bioinformatics/article/19/10/1236/184434

Contributor

tminka commented Jan 11, 2022

How can I reproduce that problem?
