Merge pull request #4633 from josef-pkt/backport_0.9.0
Backport 0.9.0
josef-pkt committed May 14, 2018
2 parents 40e4f56 + b07f5be commit a888919
Showing 17 changed files with 348 additions and 100 deletions.
22 changes: 11 additions & 11 deletions README.rst
@@ -13,15 +13,15 @@ Documentation

The documentation for the latest release is at

http://www.statsmodels.org/stable/
http://www.statsmodels.org/stable/

The documentation for the development version is at

http://www.statsmodels.org/dev/
http://www.statsmodels.org/dev/

Recent improvements are highlighted in the release notes

http://www.statsmodels.org/stable/release/version0.9.html
http://www.statsmodels.org/stable/release/version0.9.html

Backups of documentation are available at http://statsmodels.github.io/stable/
and http://statsmodels.github.io/dev/.
@@ -104,7 +104,7 @@ Main Features

* Miscellaneous models
* Sandbox: statsmodels contains a sandbox folder with code in various stages of
developement and testing which is not considered "production ready". This covers
developement and testing which is not considered "production ready". This covers
among others

- Generalized method of moments (GMM) estimators
@@ -117,27 +117,27 @@ How to get it
=============
The master branch on GitHub is the most up to date code

https://www.github.com/statsmodels/statsmodels
https://www.github.com/statsmodels/statsmodels

Source download of release tags are available on GitHub

https://github.com/statsmodels/statsmodels/tags
https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPi

http://pypi.python.org/pypi/statsmodels/
http://pypi.python.org/pypi/statsmodels/

Binaries can be installed in Anaconda

conda install statsmodels
conda install statsmodels


Installing from sources
=======================

See INSTALL.txt for requirements or see the documentation

http://statsmodels.github.io/dev/install.html
http://statsmodels.github.io/dev/install.html

License
=======
@@ -149,7 +149,7 @@ Discussion and Development

Discussions take place on our mailing list.

http://groups.google.com/group/pystatsmodels
http://groups.google.com/group/pystatsmodels

We are very interested in feedback about usability and suggestions for
improvements.
@@ -159,7 +159,7 @@ Bug Reports

Bug reports can be submitted to the issue tracker at

https://github.com/statsmodels/statsmodels/issues
https://github.com/statsmodels/statsmodels/issues

.. |Travis Build Status| image:: https://travis-ci.org/statsmodels/statsmodels.svg?branch=master
:target: https://travis-ci.org/statsmodels/statsmodels
72 changes: 50 additions & 22 deletions docs/source/release/version0.9.rst
@@ -32,7 +32,6 @@ https://github.com/statsmodels/statsmodels/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Am
the 0.8 release but not included in 0.8.)



The Highlights
--------------

@@ -43,15 +42,17 @@ The Highlights
- new count models: GeneralizedPoisson, zero inflated models
- Bayesian mixed GLM
- Gaussian Imputation
- new multivariate methods: factor analysis, MANOVA, repeated measures within ANOVA
- new multivariate methods: factor analysis, MANOVA, repeated measures
within ANOVA
- GLM var_weights in addition to freq_weights
- Holt-Winters and Exponential Smoothing


What's new - an overview
------------------------

The following lists the main new features of statsmodels 0.9. In addition, release 0.9 includes bug fixes, refactorings and improvements in many areas.
The following lists the main new features of statsmodels 0.9. In addition,
release 0.9 includes bug fixes, refactorings and improvements in many areas.

**base**
- distributed estimation #3396 (Leland Bybee GSOC, Kerby Shedden)
@@ -66,7 +67,8 @@ The following lists the main new features of statsmodels 0.9. In addition, relea
- zero-inflated count models #3755 merged in #3908

- discrete optimization improvements #3921, #3928 (Josef Perktold)
- extend discrete margin when extra params, NegativeBinomial #3811 (Josef Perktold)
- extend discrete margin when extra params, NegativeBinomial #3811
(Josef Perktold)

**duration**
- dependent censoring in survival/duration #3090 (Kerby Shedden)
@@ -80,39 +82,47 @@ The following lists the main new features of statsmodels 0.9. In addition, relea

**graphics**
- graphics HDR functional boxplot #3876 merged in #4049 (Pamphile ROY)
- graphics Bland-Altman or Tukey mean difference plot #4112 merged in #4200 (Joses W. Ho)
- graphics Bland-Altman or Tukey mean difference plot
#4112 merged in #4200 (Joses W. Ho)
- bandwidth options in violinplots #4510 (Jim Correia)

**imputation**
- multiple imputation via Gaussian model #4394, #4520 (Kerby Shedden)
- regularized fitting in MICE #4319 (Kerby Shedden)

**iolib**
- improvements of summary_coll #3702 merged #4064 (Natasha Watkins, Kevin Sheppard)
- improvements of summary_coll #3702 merged #4064 (Natasha Watkins,
Kevin Sheppard)

**multivariate**
- multivariate: MANOVA, CanCorr #3327 (Yichuan Liu)
- Factor Analysis #4161, #4156, #4167, #4214 (Yichuan Liu, Kerby Shedden, Josef Perktold)
- Factor Analysis #4161, #4156, #4167, #4214 (Yichuan Liu, Kerby Shedden,
Josef Perktold)
- statsmodels now includes the rotation code by ....

**regression**
- fit_regularized for WLS #3581 (Kerby Shedden)

**stats**
- Knockoff FDR # 3204 (Kerby Shedden)
- Repeated measures ANOVA #3303 merged in #3663, #3838 (Yichuan Liu, Richard Höchenberger)
- lilliefors test for exponential distribution #3837 merged in #3936 (Jacob Kimmel, Josef Perktold)
- Repeated measures ANOVA #3303 merged in #3663, #3838 (Yichuan Liu, Richard
Höchenberger)
- lilliefors test for exponential distribution #3837 merged in #3936 (Jacob
Kimmel, Josef Perktold)

**tools**
- quasi-random, Halton sequences #4104 (Pamphile ROY)

**tsa**
- VECM #3246 (Aleksandar Karakas GSOC, Josef Perktold)
- exog support in VAR, incomplete for extra results, part of VECM #3246, #4538 (Aleksandar Karakas GSOC, Josef Perktold)
- exog support in VAR, incomplete for extra results, part of VECM
#3246, #4538 (Aleksandar Karakas GSOC, Josef Perktold)
- Markov switching, Kim smoother #3141 (Chad Fulton)
- Holt-Winters #3817 merged in #4176 (tvanzyl)
- seasonal_decompose: trend extrapolation and vectorized 2-D #3031 (kernc, Josef Perktold)
- add frequency domain seasonal components to UnobservedComponents #4250 (Jordan Yoder)
- seasonal_decompose: trend extrapolation and vectorized 2-D #3031
(kernc, Josef Perktold)
- add frequency domain seasonal components to UnobservedComponents #4250
(Jordan Yoder)
- refactoring of date handling in tsa #3276, #4457 (Chad Fulton)
- SARIMAX without AR, MA #3383 (Chad Fulton)

@@ -124,22 +134,39 @@ The following lists the main new features of statsmodels 0.9. In addition, relea
`bug-wrong`
~~~~~~~~~~~

A new issue label `type-bug-wrong` indicates bugs that cause that incorrect numbers are returned without warnings.
(Regular bugs are mostly usability bugs or bugs that raise an exception for unsupported use cases.)
A new issue label `type-bug-wrong` indicates bugs that cause that incorrect
numbers are returned without warnings.
(Regular bugs are mostly usability bugs or bugs that raise an exception for
unsupported use cases.)
see https://github.com/statsmodels/statsmodels/issues?q=is%3Aissue+label%3Atype-bug-wrong+is%3Aclosed+milestone%3A0.9

- scale in GLM fit_constrained, #4193 fixed in #4195
cov_params and bse were incorrect if scale is estimated as in Gaussian. (This did not affect families with scale=1 such as Poisson)
cov_params and bse were incorrect if scale is estimated as in Gaussian.
(This did not affect families with scale=1 such as Poisson)
- incorrect `pearson_chi2` with binomial counts, #3612 fixed as part of #3692
- discrete predict with offset or exposure, #3569 fixed in 3696
If either offset or exposure are not None but exog is None, then offset and exposure arguments in predict were ignored.
- discrete margins had wrong dummy and count effect if constant is prepended, #3695 fixed in #3696
- null_deviance and llnull in GLMResults were wrong if exposure was used and
when offset was used with Binomial counts.
- GLM Binomial in the non-binary count case used incorrect endog in recreating
models which is
used by fit_regularized and fit_constrained #4599.
- GLM observed hessian was incorrectly computed if non-canonical link is used,
fixed in #4620
This fix improves convergence with gradient optimization and removes a usually
numerically small error in cov_params.
- discrete predict with offset or exposure, #3569 fixed in #3696
If either offset or exposure are not None but exog is None, then offset and
exposure arguments in predict were ignored.
- discrete margins had wrong dummy and count effect if constant is prepended,
#3695 fixed in #3696
- OLS outlier test, wrong index if order is True, #3971 fixed in #4385
- tsa coint ignored the autolag keyword, #3966 fixed in #4492
This is a backwards incompatible change in default, instead of fixed maxlag it defaults now to 'aic' lag selection. The default autolag is now the same as the adfuller default.
This is a backwards incompatible change in default, instead of fixed maxlag
it defaults now to 'aic' lag selection. The default autolag is now the same
as the adfuller default.
- wrong confidence interval in contingency table summary, #3822 fixed in #3830
This only affected the summary and not the corresponding attribute.
- incorrect results in summary_col if regressor_order is used, #3767 fixed in #4271
- incorrect results in summary_col if regressor_order is used,
#3767 fixed in #4271


Description of selected new feature
@@ -359,6 +386,7 @@ Thanks to all of the contributors for the 0.9 release (based on git log):
* Niels Wouda
* Pamphile ROY
* Peter Quackenbush
* Quentin Andre
* Richard Höchenberger
* Rob Klooster
* Roman Ring
@@ -376,5 +404,5 @@ Thanks to all of the contributors for the 0.9 release (based on git log):
* weizhongg
* zveryansky

These lists of names are automatically generated based on git log, and may not be
complete.
These lists of names are automatically generated based on git log, and may not
be complete.
2 changes: 1 addition & 1 deletion setup.py
@@ -171,7 +171,7 @@ def check_dependency_versions(min_versions):
MIN = 9
REV = 0
ISRELEASED = True
VERSION = '%d.%d.%drc1' % (MAJ,MIN,REV)
VERSION = '%d.%d.%d' % (MAJ,MIN,REV)

classifiers = ['Development Status :: 4 - Beta',
'Environment :: Console',
14 changes: 10 additions & 4 deletions statsmodels/base/elastic_net.py
@@ -150,9 +150,15 @@ def fit_elasticnet(model, method="coord_descent", maxiter=100,
btol = 1e-4
params_zero = np.zeros(len(params), dtype=bool)

init_args = dict([(k, getattr(model, k)) for k in model._init_keys
if k != "offset" and hasattr(model, k)])
init_args = model._get_init_kwds()
# we don't need a copy of init_args because get_init_kwds provides new dict
init_args['hasconst'] = False
model_offset = init_args.pop('offset', None)
if 'exposure' in init_args and init_args['exposure'] is not None:
if model_offset is None:
model_offset = np.log(init_args.pop('exposure'))
else:
model_offset += np.log(init_args.pop('exposure'))

fgh_list = [
_gen_npfuncs(k, L1_wt, alpha, loglike_kwds, score_kwds, hess_kwds)
@@ -176,8 +182,8 @@ def fit_elasticnet(model, method="coord_descent", maxiter=100,
params0 = params.copy()
params0[k] = 0
offset = np.dot(model.exog, params0)
if hasattr(model, "offset") and model.offset is not None:
offset += model.offset
if model_offset is not None:
offset += model_offset

# Create a one-variable model for optimization.
model_1var = model.__class__(
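
Note on the hunk above: the change pops `offset` (and, when present and not None, `exposure`) out of the init keywords and folds `log(exposure)` into a single offset that is then added to the linear predictor. A minimal sketch of that combination step, assuming plain NumPy arrays; `combined_offset` is a hypothetical helper name, not a statsmodels function, and unlike the diff it always removes the `exposure` key and builds a new array instead of adding in place:

```python
import numpy as np


def combined_offset(init_kwds):
    """Pop 'offset' and 'exposure' from a keyword dict and return one offset.

    Hypothetical helper for illustration only. Returns None when neither
    keyword carries data, otherwise offset + log(exposure).
    """
    offset = init_kwds.pop("offset", None)
    exposure = init_kwds.pop("exposure", None)
    if exposure is not None:
        log_exposure = np.log(np.asarray(exposure, dtype=float))
        if offset is None:
            offset = log_exposure
        else:
            # Build a new array so the caller's offset is not modified.
            offset = np.asarray(offset, dtype=float) + log_exposure
    return offset


kwds = {"offset": np.array([0.5, 0.5]), "exposure": np.array([1.0, np.e])}
total = combined_offset(kwds)
```

After the call, `kwds` no longer contains either key, so it can be passed on to a model constructor without supplying the offset twice.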
13 changes: 11 additions & 2 deletions statsmodels/base/model.py
@@ -1505,6 +1505,7 @@ def wald_test(self, r_matrix, cov_p=None, scale=1.0, invcov=None,

cparams = np.dot(r_matrix, self.params[:, None])
J = float(r_matrix.shape[0]) # number of restrictions

if q_matrix is None:
q_matrix = np.zeros(J)
else:
@@ -1521,7 +1522,15 @@ def wald_test(self, r_matrix, cov_p=None, scale=1.0, invcov=None,
raise ValueError("r_matrix performs f_test for using "
"dimensions that are asymptotically "
"non-normal")
invcov = np.linalg.inv(cov_p)
invcov = np.linalg.pinv(cov_p)
J_ = np.linalg.matrix_rank(cov_p)
if J_ < J:
import warnings
from statsmodels.tools.sm_exceptions import ValueWarning
warnings.warn('covariance of constraints does not have full '
'rank. The number of constraints is %d, but '
'rank is %d' % (J, J_), ValueWarning)
J = J_

if (hasattr(self, 'mle_settings') and
self.mle_settings['optimizer'] in ['l1', 'l1_cvxopt_cp']):
@@ -1533,7 +1542,7 @@ def wald_test(self, r_matrix, cov_p=None, scale=1.0, invcov=None,
if use_f:
F /= J
return ContrastResults(F=F, df_denom=df_resid,
df_num=invcov.shape[0])
df_num=J) #invcov.shape[0])
else:
return ContrastResults(chi2=F, df_denom=J, statistic=F,
distribution='chi2', distargs=(J,))
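
Note on the `wald_test` hunks above: switching from `np.linalg.inv` to `np.linalg.pinv` and taking the degrees of freedom from `np.linalg.matrix_rank` lets the test tolerate redundant constraints. A minimal NumPy-only sketch of the idea, assuming the null hypothesis `R @ params = 0`; `wald_statistic` is a hypothetical name, and the sketch omits the warning and the F-test branch that the diff handles:

```python
import numpy as np


def wald_statistic(r_matrix, params, cov_params):
    """Chi-square Wald statistic for H0: R @ params = 0.

    Hypothetical helper for illustration only. Uses the pseudo-inverse of
    the constraint covariance and reports its rank as degrees of freedom.
    """
    r_matrix = np.atleast_2d(np.asarray(r_matrix, dtype=float))
    params = np.asarray(params, dtype=float)
    diff = r_matrix @ params
    # Covariance of the constraints, R cov R'.
    cov_r = r_matrix @ cov_params @ r_matrix.T
    rank = int(np.linalg.matrix_rank(cov_r))
    stat = float(diff @ np.linalg.pinv(cov_r) @ diff)
    return stat, rank


params = np.array([1.0, 2.0])
cov = np.eye(2)
# One constraint, then the same constraint stated twice.
stat_full, df_full = wald_statistic([[1.0, 0.0]], params, cov)
stat_dup, df_dup = wald_statistic([[1.0, 0.0], [1.0, 0.0]], params, cov)
```

With the duplicated row, a plain inverse would fail because the constraint covariance is singular. The pseudo-inverse gives the same statistic as the single constraint, and the rank gives one degree of freedom rather than two.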
31 changes: 20 additions & 11 deletions statsmodels/discrete/tests/test_sandwich_cov.py
@@ -479,13 +479,9 @@ def test_basic(self):
assert_equal(res1.cov_type, self.cov_type)
assert_equal(res2.cov_type, self.cov_type)

assert_allclose(res1.params, res2.params, rtol=1e-13)
# bug TODO res1.scale missing ? in Gaussian/OLS
assert_allclose(res1.bse, res2.bse, rtol=1e-13)
# if not self.cov_type == 'nonrobust':
# assert_allclose(res1.bse * res1.scale, res2.bse, rtol=1e-13)
# else:
# assert_allclose(res1.bse, res2.bse, rtol=1e-13)
rtol = getattr(res1, 'rtol', 1e-13)
assert_allclose(res1.params, res2.params, rtol=rtol)
assert_allclose(res1.bse, res2.bse, rtol=1e-10)


class TestGLMLogit(CheckDiscreteGLM):
@@ -502,19 +498,32 @@ def setup_class(cls):
cls.res2 = mod1.fit(cov_type='cluster', cov_kwds=dict(groups=group))


class T_estGLMProbit(CheckDiscreteGLM):
# invalid link. What's Probit as GLM?
class TestGLMProbit(CheckDiscreteGLM):

@classmethod
def setup_class(cls):
endog_bin = (endog > endog.mean()).astype(int)
cls.cov_type = 'cluster'

mod1 = GLM(endog_bin, exog, family=families.Gaussian(link=links.CDFLink()))
cls.res1 = mod1.fit(cov_type='cluster', cov_kwds=dict(groups=group))
mod1 = GLM(endog_bin, exog, family=families.Binomial(link=links.probit()))
cls.res1 = mod1.fit(method='newton',
cov_type='cluster', cov_kwds=dict(groups=group))

mod1 = smd.Probit(endog_bin, exog)
cls.res2 = mod1.fit(cov_type='cluster', cov_kwds=dict(groups=group))
cls.rtol = 1e-6

def test_score_hessian(self):
res1 = self.res1
res2 = self.res2
# Note scale is fixed at 1, so we don't need to fix it explicitly
score1 = res1.model.score(res1.params * 0.98)
score2 = res2.model.score(res1.params * 0.98)
assert_allclose(score1, score2, rtol=1e-13)

hess1 = res1.model.hessian(res1.params)
hess2 = res2.model.hessian(res1.params)
assert_allclose(hess1, hess2, rtol=1e-10)


class TestGLMGaussNonRobust(CheckDiscreteGLM):
