Add Tweedie distribution: likelihood, pdf and sampling methods #7310
Comments
@lorentzenchr Do you already have code for some of the distribution methods like logpdf, rvs, moments? I started to look at it again, but I don't remember enough and haven't looked at it in years.
@josef-pkt I don't. But now that The Exponential Dispersion Model, and therefore also the Tweedie family, has two different formulations: additive and reproductive (see Wikipedia). Which one to choose? I often prefer the reproductive one.
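For reference, the two formulations are usually written as follows. This is a sketch in the common textbook notation; the symbols $\theta$, $\kappa$, $\sigma^2$, $\lambda$, $c$ and $c^*$ are assumptions here and are not taken from this thread.

$$
\text{reproductive:}\quad f(y;\theta,\sigma^2) = c(y,\sigma^2)\,\exp\!\left\{\frac{\theta y - \kappa(\theta)}{\sigma^2}\right\},
\qquad \mathrm{E}[Y] = \kappa'(\theta),\quad \mathrm{Var}[Y] = \sigma^2\,\kappa''(\theta)
$$

$$
\text{additive:}\quad f^*(z;\theta,\lambda) = c^*(z,\lambda)\,\exp\!\left\{\theta z - \lambda\,\kappa(\theta)\right\},
\qquad \mathrm{E}[Z] = \lambda\,\kappa'(\theta),\quad \mathrm{Var}[Z] = \lambda\,\kappa''(\theta)
$$

The two are linked by $Y = Z/\lambda$ with $\sigma^2 = 1/\lambda$. In the reproductive form the mean does not depend on the dispersion, which is why it lines up naturally with a mean-plus-dispersion parameterization.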
In Series evaluation of Tweedie exponential dispersion model densities Eq.(1-2):
AFAICS, we need the reproductive version for GLM. statsmodels GLM is a one-parameter linear exponential family (LEF), conditional on variance parameters. When we have the full log-likelihood, we can use the profile likelihood to estimate the variance parameters in a second step. (For example we have several versions of negativebinomial in
Reference with full MLE versus QMLE/M-estimator (which we currently have as standard GLM with IRLS or Newton, loglike not used): Wagner Hugo Bonat & Célestin C. Kokonendji (2017): Flexible Tweedie
scipy 1.7.0 was just released in June. I'll wait until fall to update winpython. I found an old gist of mine that simulates the gamma compound Poisson distribution; the notebook was written for #2915
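For readers without access to that gist, here is a minimal sketch of how such a simulation can look for 1 < p < 2. It is not the code from the gist. The function name `tweedie_rvs` and the (mu, phi, p) parameterization with Var[Y] = phi * mu**p are assumptions made for illustration.

```python
import numpy as np


def tweedie_rvs(mu, phi, p, size, rng=None):
    """Draw compound Poisson-gamma (Tweedie, 1 < p < 2) samples.

    Hypothetical helper: mu is the mean, phi the dispersion, p the
    variance power, so that Var[Y] = phi * mu**p.
    """
    rng = np.random.default_rng(rng)
    lam = mu ** (2 - p) / (phi * (2 - p))    # Poisson rate of the claim count
    alpha = (2 - p) / (p - 1)                # gamma shape of a single claim
    scale = phi * (p - 1) * mu ** (p - 1)    # gamma scale of a single claim

    n = rng.poisson(lam, size=size)          # number of gamma terms per draw
    y = np.zeros(size)
    pos = n > 0
    # A sum of n iid gamma(alpha, scale) variables is gamma(n * alpha, scale).
    y[pos] = rng.gamma(alpha * n[pos], scale)
    return y
```

Draws with zero Poisson count stay exactly zero, which gives the mass point at y = 0 with probability exp(-lam).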
Another article. Based on a quick look, it uses the deviance, or the loglike without the normalizing constant, as objective function: Wei Qian, Yi Yang & Hui Zou (2016) Tweedie’s Compound Poisson Model
@josef-pkt Where would you like to place this? Under Which API to follow? A separate class that quacks (duck typing) almost like
We need the log-likelihood in the genmod families, so that it can be used immediately with GLM, e.g. for the profile likelihood of the non-mean parameters. The full distribution, as a scipy-distribution-like class, would go in statsmodels.distributions in a new module. In some families we use a different parameterization in the regression models than the standard distribution parameterization. Because scipy doesn't have a base class for distributions that mix mass points and a continuous part, our distribution version might not be able to inherit from rv_continuous. Eventually, we will write a full MLE model, most likely in statsmodels.othermod, that estimates mean and dispersion simultaneously (similar to the new beta regression).
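As an illustration of what such a log-likelihood could look like for 1 < p < 2, here is a minimal sketch built on scipy.special.wright_bessel (requires scipy >= 1.7). The function name `tweedie_logpdf`, the scalar (mu, phi, p) arguments, and the reproductive parameterization are assumptions for this example, not the statsmodels API.

```python
import numpy as np
from scipy.special import wright_bessel


def tweedie_logpdf(y, mu, phi, p):
    """Log-density of a Tweedie variable for 1 < p < 2 (hypothetical helper).

    For y > 0 this is the log of the continuous density; at y == 0 it is
    the log of the point mass P(Y = 0). mu, phi and p are scalars here.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    theta = mu ** (1 - p) / (1 - p)     # canonical parameter
    kappa = mu ** (2 - p) / (2 - p)     # cumulant function at theta
    alpha = (2 - p) / (p - 1)           # positive gamma shape index

    # Exponential-family part; at y == 0 this alone is log P(Y = 0).
    ll = (y * theta - kappa) / phi

    pos = y > 0
    yp = y[pos]
    # Argument of Wright's function; with b = 0 the k = 0 series term vanishes.
    x = (yp / ((p - 1) * phi)) ** alpha / ((2 - p) * phi)
    ll[pos] += np.log(wright_bessel(alpha, 0.0, x)) - np.log(yp)
    return ll
```

Note that this naive version takes the log after evaluating wright_bessel, so it inherits any overflow of that function for large arguments.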
@lorentzenchr Thanks for working on this and getting the special functions into scipy.
I haven't thought much about the class structure for the Tweedie distribution yet. I guess we can write a subclass of rv_continuous for the positive part of Tweedie, y > 0, that can reuse the generic methods in scipy.stats, and then mix it with the mass point distribution. Our distributions.discrete module has zero-inflated models that mix in some methods like cdf with the underlying distribution. But those can still inherit generic methods from rv_discrete.
@h-vetinari |
I am having some issues when fitting constrained models, and they are related to the discussion about SciPy's wright_bessel function above. When fitting models with constraints, statsmodels calls the loglike function instead of using reweighted least squares. The problem I'm having is that wright_bessel returns inf for some values, even with power = 1.5. This makes the summed log-likelihood (the objective function in this case) equal to -inf, and the minimize routine gets stuck there.
@lorentzenchr Did you guys decide to keep the implementation from scratch instead of using wright_bessel in GLUM? If so, was this the reason?
@josef-pkt I can open a separate issue for this and potentially work on a solution. I haven't looked into GLUM's code because I'm unsure about license/permission.
@thequackdaddy Do you remember having any issues with exploding scaling when computing the log-likelihood in the Tweedie package? Thanks!
@diegodebrito Yes, please open a separate issue. Provide a working example that shows the problem, so we can look into this. besides the underlying code, assuming returning inf is approximately correct in the special function.
Those are cases we ran into in other models. I have not experimented with Tweedie long enough to have an idea where it is fragile. What kind of constraints on the parameters do you have? Linear restrictions?
Thanks for your quick answer, @josef-pkt. I opened an issue and added as much info as I could: #9234
Is your feature request related to a problem? Please describe
I'd like to have the Tweedie distribution in order to:
Especially the likelihood is interesting for an MLE estimate of the power parameter of a Tweedie GLM, see #2858.
Describe the solution you'd like
The next release of scipy, v1.7.0, will have the special function wright_bessel, see merged PR scipy/scipy#11313. This function greatly simplifies the computation of the Tweedie pdf.
Additional context
In scipy/scipy#11291, it was decided not to have the distribution in scipy itself, only the needed special function.
See also the scipy mailing list https://mail.python.org/pipermail/scipy-dev/2020-March/024074.html.