
Add Tweedie distribution: likelihood, pdf and sampling methods #7310

Open
lorentzenchr opened this issue Feb 6, 2021 · 16 comments

Comments

@lorentzenchr
Contributor

lorentzenchr commented Feb 6, 2021

Is your feature request related to a problem? Please describe

I'd like to have the Tweedie distribution in order to:

  • calculate the likelihood
  • calculate the pdf
  • generate random numbers

The likelihood in particular is of interest for an MLE estimate of the power parameter of a Tweedie GLM, see #2858.

Describe the solution you'd like

The next release of scipy, v1.7.0, will have the special function wright_bessel, see merged PR scipy/scipy#11313. This function greatly simplifies the computation of the Tweedie pdf.

Additional context

In scipy/scipy#11291, it was decided not to have the distribution in scipy itself, only the needed special function.
See also the scipy mailing list https://mail.python.org/pipermail/scipy-dev/2020-March/024074.html.

@josef-pkt
Member

@lorentzenchr Do you already have some code for some of the distribution methods like logpdf, rvs, moments ?

I started to look at it again, but I don't remember enough and haven't looked at it in years.

@lorentzenchr
Contributor Author

@josef-pkt I don't. But now that wright_bessel is available, it shouldn't be too hard. The most important thing is to have a reliable source for the formulae.

The Exponential Dispersion Model, and therefore also the Tweedie family, has two different formulations: additive and reproductive (see Wikipedia). Which one should we choose? I usually prefer the reproductive one.

@lorentzenchr
Contributor Author

lorentzenchr commented Jul 3, 2021

In Series evaluation of Tweedie exponential dispersion model densities, Eq. (1)-(2):

pdf(y, mu, p, phi) = f(y, theta, phi) = a(y, phi) * exp((y * theta - kappa(theta)) / phi)
kappa = cumulant function
theta = function of the expectation mu and the power p
alpha = (2 - p) / (1 - p)
for 1 < p < 2:
a(y, phi) = 1/y * wright_bessel(-alpha, 0, (p - 1)**alpha / ((2 - p) * phi**(1 - alpha) * y**alpha))
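Putting the pieces together, the density could be sketched as below. This is only an illustration of the series representation above, not an existing statsmodels or scipy API; `tweedie_pdf` is a hypothetical name, and I use the standard theta = mu**(1-p)/(1-p), kappa = mu**(2-p)/(2-p) for the reproductive parameterization.

```python
import numpy as np
from scipy.special import wright_bessel  # requires scipy >= 1.7

def tweedie_pdf(y, mu, p, phi):
    """Sketch of the Tweedie density for 1 < p < 2 via the Wright function.
    Hypothetical helper, not statsmodels/scipy API."""
    y = np.asarray(y, dtype=float)
    theta = mu ** (1 - p) / (1 - p)   # canonical parameter theta(mu, p)
    kappa = mu ** (2 - p) / (2 - p)   # cumulant function kappa(theta)
    alpha = (2 - p) / (1 - p)         # note: alpha < 0 for 1 < p < 2
    with np.errstate(divide="ignore", invalid="ignore"):
        # argument of the Wright function Phi(-alpha, 0; x)
        x = (p - 1) ** alpha / ((2 - p) * phi ** (1 - alpha) * y ** alpha)
        dens = wright_bessel(-alpha, 0.0, x) / y * np.exp((y * theta - kappa) / phi)
    # mass point at zero: P(Y = 0) = exp(-kappa(theta) / phi)
    return np.where(y > 0, dens, np.exp(-kappa / phi))
```

For mu = 1, p = 1.5, phi = 1 this reduces to a compound Poisson-gamma density with Poisson rate 2; the continuous part plus the mass at zero should integrate to one.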

@josef-pkt
Member

AFAICS, we need the reproductive version for GLM.
Wikipedia: for fixed parameter sigma^2, ED(mu, sigma^2) is a natural exponential family.

statsmodels GLM is a one parameter linear exponential family (LEF), conditional on variance parameters.

When we have the full loglikelihood, then we can use the profile likelihood to estimate the variance parameters in a second step.

(For example we have several versions of negativebinomial in statsmodels.discrete, but only the NB2 version is a LEF and included in GLM.)
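The two-step/profile idea can be illustrated with a plain normal linear model, where sigma^2 plays the role that (p, phi) play for Tweedie: for each candidate value of the non-mean parameter, fit the mean and record the loglikelihood. All names below are illustrative; the Tweedie version would swap in the Tweedie loglike.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)

def profile_loglike(sigma2):
    # Step 1: fit the mean parameters with the variance parameter held fixed
    # (for the normal model this fit happens not to depend on sigma2).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    n = len(y)
    # Step 2: evaluate the full loglikelihood at the profiled-out mean fit.
    return -0.5 * n * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)

grid = np.linspace(0.05, 2.0, 200)
ll = np.array([profile_loglike(s) for s in grid])
sigma2_hat = grid[np.argmax(ll)]  # close to the true value 0.25
```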

@josef-pkt
Member

A reference with full MLE versus QMLE/M-estimation (which is what we currently have as standard GLM with IRLS or Newton; the loglike is not used):

Wagner Hugo Bonat & Célestin C. Kokonendji (2017): Flexible Tweedie
regression models for continuous data, Journal of Statistical Computation and Simulation, http://dx.doi.org/10.1080/00949655.2017.1318876

@josef-pkt
Member

josef-pkt commented Jul 4, 2021

scipy 1.7.0 was just released in June. I'll wait until Fall to update winpython.
So, I'm not working on this right now.

I found an old gist of mine with simulating gamma compound poisson for rvs, and parameter conversion between compound poisson and tweedie.
(I didn't look at the details again. When I started to look at the conversion on Wikipedia, https://en.wikipedia.org/wiki/Compound_Poisson_distribution#Compound_Poisson_Gamma_distribution , the parameter names differed from my notebook.)
https://gist.github.com/josef-pkt/b8354b64959ebaf8a7b51c49cecdb54c

The notebook was written for #2915.
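The simulation approach from the gist (compound Poisson-gamma sampling plus parameter conversion) can be sketched like this. `tweedie_rvs` is a hypothetical name, and the conversion follows the standard reproductive parameterization; it is a sketch, not the gist's exact code.

```python
import numpy as np

def tweedie_rvs(mu, p, phi, size, rng=None):
    """Sample Tweedie(mu, p, phi) for 1 < p < 2 via its compound
    Poisson-gamma representation. Hypothetical helper, not an existing API."""
    rng = np.random.default_rng(rng)
    # Parameter conversion, reproductive Tweedie -> compound Poisson-gamma:
    lam = mu ** (2 - p) / (phi * (2 - p))   # Poisson rate (number of jumps)
    shape = (2 - p) / (p - 1)               # gamma shape of each jump
    scale = phi * (p - 1) * mu ** (p - 1)   # gamma scale of each jump
    n = rng.poisson(lam, size=size)
    # A sum of n iid Gamma(shape, scale) is Gamma(n * shape, scale);
    # n == 0 gives an exact zero, the mass point of the distribution.
    return np.where(n > 0, rng.gamma(np.maximum(n, 1) * shape, scale), 0.0)
```

A quick sanity check: for mu = 1, p = 1.5, phi = 1 the sample mean should be close to 1 and the fraction of exact zeros close to exp(-2).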

@josef-pkt
Member

Another article; based on a quick look, it uses the deviance, i.e. the loglike without the normalizing constant, as the objective function.

Wei Qian, Yi Yang & Hui Zou (2016) Tweedie’s Compound Poisson Model
With Grouped Elastic Net, Journal of Computational and Graphical Statistics, 25:2, 606-625,
http://dx.doi.org/10.1080/10618600.2015.1005213

@lorentzenchr
Contributor Author

@josef-pkt Where would you like to place this? Under statsmodels.distributions in a separate file?

Which API should it follow? A class of its own that quacks (duck typing) almost like scipy.stats.rv_continuous?

@josef-pkt
Member

We need the loglikelihood in genmod families so it can be used with GLM, e.g. for profile likelihood for the non-mean parameters. That can then be used immediately with GLM.

The full distribution as scipy distribution like class would go in statsmodels.distributions, new module.

In some families we use different parameterization in the regression models than the standard distribution parameterization.
We use get_distribution method to get the scipy compatible distribution from a regression estimate, e.g. for predictive distribution of new endog y.

Because scipy doesn't have a base class for distributions that are mixed mass-points and continuous, our distribution version might not be able to inherit from rv_continuous.

Eventually, we will write a full MLE model, most likely in statsmodels.othermod, that estimates the mean and dispersion simultaneously (similar to the new beta regression).

@josef-pkt
Member

@lorentzenchr Thanks for working on this and getting the special functions into scipy.

@josef-pkt
Member

I haven't thought much about the class structure for the tweedie distribution yet.

I guess we can write a subclass of rv_continuous for the positive part of Tweedie (y > 0) that can reuse the generic methods in scipy.stats, and then mix it with the mass-point distribution.

Our distributions.discrete module has zero-inflated models that mix in some methods, like cdf, with the underlying distribution. But those can still inherit generic methods from rv_discrete.
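The mixing idea could look roughly like this: delegate the continuous part to a frozen scipy distribution and add a mass point at zero. The class name and API here are hypothetical, not statsmodels code, and a gamma stands in for the positive Tweedie part.

```python
import numpy as np
from scipy import stats

class ZeroMassMixture:
    """Sketch: a continuous distribution for y > 0 mixed with a mass point
    at zero. Hypothetical, not statsmodels API."""

    def __init__(self, p_zero, cont_dist):
        self.p_zero = p_zero   # P(Y = 0)
        self.cont = cont_dist  # frozen continuous distribution for y > 0

    def cdf(self, y):
        y = np.asarray(y, dtype=float)
        return np.where(y < 0, 0.0,
                        self.p_zero + (1 - self.p_zero) * self.cont.cdf(y))

    def rvs(self, size, rng=None):
        rng = np.random.default_rng(rng)
        is_zero = rng.random(size) < self.p_zero
        draws = self.cont.rvs(size=size, random_state=rng)
        return np.where(is_zero, 0.0, draws)

# e.g. a point mass at zero mixed with a gamma for the positive part
mix = ZeroMassMixture(0.3, stats.gamma(a=2.0, scale=1.0))
```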

@h-vetinari

This has been released in 0.14 through #8560, right?

@josef-pkt
Copy link
Member

@h-vetinari
yes

@diegodebrito

@lorentzenchr @josef-pkt

I am having some issues when fitting constrained models, and it is related to the discussion about scipy's wright_bessel function above. When fitting models with constraints, statsmodels calls the loglike function instead of using iteratively reweighted least squares. The problem I'm having is that wright_bessel returns inf for some values, even with power = 1.5. This makes the log-likelihood sum (the objective function in this case) -inf, and the minimizer gets stuck there.

@lorentzenchr Did you guys decide to keep the implementation from scratch instead of using wright_bessel in GLUM? If so, was this the reason?

@josef-pkt I can open a separate issue for this and potentially work on a solution. I haven't looked into GLUM's code because I'm unsure about license/permission.

@thequackdaddy Do you remember having any issues with exploding scaling when computing log-likelihood in the Tweedie package?

Thanks!
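For reference, the kind of overflow described above is easy to reproduce in isolation (the argument values below are illustrative, not taken from the reported model): for power = 1.5 the density involves Phi(1, 0; x), which grows roughly like exp(2*sqrt(x)), so moderately large arguments already overflow a double. Newer scipy versions (1.14+) add scipy.special.log_wright_bessel, which works on the log scale and would sidestep this, if upgrading is an option.

```python
import numpy as np
from scipy.special import wright_bessel

# Phi(1, 0; x) = sqrt(x) * I_1(2 * sqrt(x)) grows like exp(2 * sqrt(x)),
# so e.g. exp(2 * sqrt(1e8)) = exp(20000) overflows double precision.
small = wright_bessel(1.0, 0.0, 1e2)  # finite
big = wright_bessel(1.0, 0.0, 1e8)    # inf, which makes log(pdf) infinite
```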

@josef-pkt
Member

@diegodebrito Yes, please open a separate issue.

Provide a working example that shows the problem, so we can look into this.

Besides the underlying code (assuming that returning inf is approximately correct for the special function), possible reasons for numerical problems:

  • scaling problems that cause over- or underflow during optimization
  • corner cases: the fit approaches a corner where some functions go to inf or zero
  • plain numerical issues in the code: log(exp(x)) fails when exp overflows for larger x

Those are cases we ran into in other models. I have not experimented with Tweedie long enough to have an idea where it is fragile.
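The third failure mode above, log(exp(x)) overflowing, can be seen in two lines; the usual fix is to stay on the log scale throughout, e.g. with scipy's logsumexp:

```python
import numpy as np
from scipy.special import logsumexp

x = np.array([10.0, 800.0])
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(x)))  # exp(800) overflows to inf
stable = logsumexp(x)                  # 800 + log(1 + exp(-790)) == 800.0
```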

What kind of constraints on the parameters do you have? Linear restrictions?
There might be approaches to help in the estimation.
For example, if the unconstrained estimation has a valid result, then we could get a one-step approximation to the constrained parameters to get a better starting value.

@diegodebrito

Thanks for your quick answer, @josef-pkt.

I opened an issue and added as much info as I could: #9234
