Merge pull request #4633 from josef-pkt/backport_0.9.0

Backport 0.9.0
statsmodels · May 14, 2018 · a888919
2 parents 40e4f56 + b07f5be
commit a888919

Showing 17 changed files with 348 additions and 100 deletions.
diff --git a/README.rst b/README.rst
@@ -13,15 +13,15 @@ Documentation
 
 The documentation for the latest release is at
 
-   http://www.statsmodels.org/stable/
+http://www.statsmodels.org/stable/
 
 The documentation for the development version is at
 
-   http://www.statsmodels.org/dev/
+http://www.statsmodels.org/dev/
 
 Recent improvements are highlighted in the release notes
 
-   http://www.statsmodels.org/stable/release/version0.9.html
+http://www.statsmodels.org/stable/release/version0.9.html
 
 Backups of documentation are available at http://statsmodels.github.io/stable/
 and http://statsmodels.github.io/dev/.
@@ -104,7 +104,7 @@ Main Features
 
 * Miscellaneous models
 * Sandbox: statsmodels contains a sandbox folder with code in various stages of
-  developement and testing which is not considered "production ready".   This covers
+  developement and testing which is not considered "production ready".  This covers
   among others
 
   - Generalized method of moments (GMM) estimators
@@ -117,27 +117,27 @@ How to get it
 =============
 The master branch on GitHub is the most up to date code
 
-    https://www.github.com/statsmodels/statsmodels
+https://www.github.com/statsmodels/statsmodels
 
 Source download of release tags are available on GitHub
 
-    https://github.com/statsmodels/statsmodels/tags
+https://github.com/statsmodels/statsmodels/tags
 
 Binaries and source distributions are available from PyPi
 
-    http://pypi.python.org/pypi/statsmodels/
+http://pypi.python.org/pypi/statsmodels/
 
 Binaries can be installed in Anaconda
 
-    conda install statsmodels
+conda install statsmodels
 
 
 Installing from sources
 =======================
 
 See INSTALL.txt for requirements or see the documentation
 
-    http://statsmodels.github.io/dev/install.html
+http://statsmodels.github.io/dev/install.html
 
 License
 =======
@@ -149,7 +149,7 @@ Discussion and Development
 
 Discussions take place on our mailing list.
 
-    http://groups.google.com/group/pystatsmodels
+http://groups.google.com/group/pystatsmodels
 
 We are very interested in feedback about usability and suggestions for
 improvements.
@@ -159,7 +159,7 @@ Bug Reports
 
 Bug reports can be submitted to the issue tracker at
 
-    https://github.com/statsmodels/statsmodels/issues
+https://github.com/statsmodels/statsmodels/issues
 
 .. |Travis Build Status| image:: https://travis-ci.org/statsmodels/statsmodels.svg?branch=master
    :target: https://travis-ci.org/statsmodels/statsmodels

diff --git a/docs/source/release/version0.9.rst b/docs/source/release/version0.9.rst
@@ -32,7 +32,6 @@ https://github.com/statsmodels/statsmodels/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Am
 the 0.8 release but not included in 0.8.)
 
 
-
 The Highlights
 --------------
 
@@ -43,15 +42,17 @@ The Highlights
   - new count models: GeneralizedPoisson, zero inflated models
  - Bayesian mixed GLM
  - Gaussian Imputation
- - new multivariate methods: factor analysis, MANOVA, repeated measures within ANOVA
+ - new multivariate methods: factor analysis, MANOVA, repeated measures
+   within ANOVA
  - GLM var_weights in addition to freq_weights
  - Holt-Winters and Exponential Smoothing
 
 
 What's new - an overview
 ------------------------
 
-The following lists the main new features of statsmodels 0.9. In addition, release 0.9 includes bug fixes, refactorings and improvements in many areas.
+The following lists the main new features of statsmodels 0.9. In addition,
+release 0.9 includes bug fixes, refactorings and improvements in many areas.
 
 **base**
  - distributed estimation #3396  (Leland Bybee GSOC, Kerby Shedden)
@@ -66,7 +67,8 @@ The following lists the main new features of statsmodels 0.9. In addition, relea
     - zero-inflated count models #3755 merged in #3908
 
  - discrete optimization improvements #3921, #3928 (Josef Perktold)
- - extend discrete margin when extra params, NegativeBinomial #3811 (Josef Perktold)
+ - extend discrete margin when extra params, NegativeBinomial #3811
+   (Josef Perktold)
 
 **duration**
  - dependent censoring in survival/duration #3090 (Kerby Shedden)
@@ -80,39 +82,47 @@ The following lists the main new features of statsmodels 0.9. In addition, relea
 
 **graphics**
  - graphics HDR functional boxplot #3876 merged in #4049 (Pamphile ROY)
- - graphics Bland-Altman or Tukey mean difference plot #4112 merged in #4200 (Joses W. Ho)
+ - graphics Bland-Altman or Tukey mean difference plot
+   #4112 merged in #4200 (Joses W. Ho)
  - bandwidth options in violinplots #4510 (Jim Correia)
 
 **imputation**
  - multiple imputation via Gaussian model #4394, #4520 (Kerby Shedden)
  - regularized fitting in MICE #4319 (Kerby Shedden)
 
 **iolib**
- - improvements of summary_coll #3702 merged #4064 (Natasha Watkins, Kevin Sheppard)
+ - improvements of summary_coll #3702 merged #4064 (Natasha Watkins,
+   Kevin Sheppard)
 
 **multivariate**
  - multivariate: MANOVA, CanCorr #3327 (Yichuan Liu)
- - Factor Analysis #4161, #4156, #4167, #4214 (Yichuan Liu, Kerby Shedden, Josef Perktold)
+ - Factor Analysis #4161, #4156, #4167, #4214 (Yichuan Liu, Kerby Shedden,
+   Josef Perktold)
  - statsmodels now includes the rotation code by ....
 
 **regression**
  - fit_regularized for WLS #3581 (Kerby Shedden)
 
 **stats**
  - Knockoff FDR # 3204 (Kerby Shedden)
- - Repeated measures ANOVA #3303 merged in #3663, #3838 (Yichuan Liu, Richard Höchenberger)
- - lilliefors test for exponential distribution #3837 merged in #3936 (Jacob Kimmel, Josef Perktold)
+ - Repeated measures ANOVA #3303 merged in #3663, #3838 (Yichuan Liu, Richard
+   Höchenberger)
+ - lilliefors test for exponential distribution #3837 merged in #3936 (Jacob
+   Kimmel, Josef Perktold)
 
 **tools**
  - quasi-random, Halton sequences #4104 (Pamphile ROY)
 
 **tsa**
  - VECM #3246 (Aleksandar Karakas GSOC, Josef Perktold)
- - exog support in VAR, incomplete for extra results, part of VECM #3246, #4538 (Aleksandar Karakas GSOC, Josef Perktold)
+ - exog support in VAR, incomplete for extra results, part of VECM
+   #3246, #4538 (Aleksandar Karakas GSOC, Josef Perktold)
  - Markov switching, Kim smoother #3141 (Chad Fulton)
  - Holt-Winters #3817 merged in #4176 (tvanzyl)
- - seasonal_decompose: trend extrapolation and vectorized 2-D #3031 (kernc, Josef Perktold)
- - add frequency domain seasonal components to UnobservedComponents #4250 (Jordan Yoder)
+ - seasonal_decompose: trend extrapolation and vectorized 2-D #3031
+   (kernc, Josef Perktold)
+ - add frequency domain seasonal components to UnobservedComponents #4250
+   (Jordan Yoder)
  - refactoring of date handling in tsa #3276, #4457 (Chad Fulton)
  - SARIMAX without AR, MA #3383  (Chad Fulton)
 
@@ -124,22 +134,39 @@ The following lists the main new features of statsmodels 0.9. In addition, relea
 `bug-wrong`
 ~~~~~~~~~~~
 
-A new issue label `type-bug-wrong` indicates bugs that cause that incorrect numbers are returned without warnings.
-(Regular bugs are mostly usability bugs or bugs that raise an exception for unsupported use cases.)
+A new issue label `type-bug-wrong` indicates bugs that cause that incorrect
+numbers are returned without warnings.
+(Regular bugs are mostly usability bugs or bugs that raise an exception for
+unsupported use cases.)
 see https://github.com/statsmodels/statsmodels/issues?q=is%3Aissue+label%3Atype-bug-wrong+is%3Aclosed+milestone%3A0.9
 
 - scale in GLM fit_constrained, #4193 fixed in #4195
-  cov_params and bse were incorrect if scale is estimated as in Gaussian. (This did not affect families with scale=1 such as Poisson)
+  cov_params and bse were incorrect if scale is estimated as in Gaussian.
+  (This did not affect families with scale=1 such as Poisson)
 - incorrect `pearson_chi2` with binomial counts, #3612 fixed as part of #3692
-- discrete predict with offset or exposure, #3569 fixed in 3696
-  If either offset or exposure are not None but exog is None, then offset and exposure arguments in predict were ignored.
-- discrete margins had wrong dummy and count effect if constant is prepended, #3695 fixed in #3696
+- null_deviance and llnull in GLMResults were wrong if exposure was used and
+  when offset was used with Binomial counts.
+- GLM Binomial in the non-binary count case used incorrect endog in recreating
+  models which is
+  used by fit_regularized and fit_constrained #4599.
+- GLM observed hessian was incorrectly computed if non-canonical link is used,
+  fixed in #4620
+  This fix improves convergence with gradient optimization and removes a usually
+  numerically small error in cov_params.
+- discrete predict with offset or exposure, #3569 fixed in #3696
+  If either offset or exposure are not None but exog is None, then offset and
+  exposure arguments in predict were ignored.
+- discrete margins had wrong dummy and count effect if constant is prepended,
+  #3695 fixed in #3696
 - OLS outlier test, wrong index if order is True, #3971 fixed in #4385
 - tsa coint ignored the autolag keyword, #3966 fixed in #4492
-  This is a backwards incompatible change in default, instead of fixed maxlag it defaults now to 'aic' lag selection. The default autolag is now the same as the adfuller default.
+  This is a backwards incompatible change in default, instead of fixed maxlag
+  it defaults now to 'aic' lag selection. The default autolag is now the same
+  as the adfuller default.
 - wrong confidence interval in contingency table summary, #3822 fixed in #3830
   This only affected the summary and not the corresponding attribute.
-- incorrect results in summary_col if regressor_order is used, #3767 fixed in #4271
+- incorrect results in summary_col if regressor_order is used,
+  #3767 fixed in #4271
 
 
 Description of selected new feature
@@ -359,6 +386,7 @@ Thanks to all of the contributors for the 0.9 release (based on git log):
 * Niels Wouda
 * Pamphile ROY
 * Peter Quackenbush
+* Quentin Andre
 * Richard Höchenberger
 * Rob Klooster
 * Roman Ring
@@ -376,5 +404,5 @@ Thanks to all of the contributors for the 0.9 release (based on git log):
 * weizhongg
 * zveryansky
 
-These lists of names are automatically generated based on git log, and may not be
-complete.
+These lists of names are automatically generated based on git log, and may not
+be complete.
diff --git a/setup.py b/setup.py
@@ -171,7 +171,7 @@ def check_dependency_versions(min_versions):
 MIN = 9
 REV = 0
 ISRELEASED = True
-VERSION = '%d.%d.%drc1' % (MAJ,MIN,REV)
+VERSION = '%d.%d.%d' % (MAJ,MIN,REV)
 
 classifiers = ['Development Status :: 4 - Beta',
                'Environment :: Console',

diff --git a/statsmodels/base/elastic_net.py b/statsmodels/base/elastic_net.py
@@ -150,9 +150,15 @@ def fit_elasticnet(model, method="coord_descent", maxiter=100,
     btol = 1e-4
     params_zero = np.zeros(len(params), dtype=bool)
 
-    init_args = dict([(k, getattr(model, k)) for k in model._init_keys
-                      if k != "offset" and hasattr(model, k)])
+    init_args = model._get_init_kwds()
+    # we don't need a copy of init_args because get_init_kwds provides new dict
     init_args['hasconst'] = False
+    model_offset = init_args.pop('offset', None)
+    if 'exposure' in init_args and init_args['exposure'] is not None:
+        if model_offset is None:
+            model_offset = np.log(init_args.pop('exposure'))
+        else:
+            model_offset += np.log(init_args.pop('exposure'))
 
     fgh_list = [
         _gen_npfuncs(k, L1_wt, alpha, loglike_kwds, score_kwds, hess_kwds)
@@ -176,8 +182,8 @@ def fit_elasticnet(model, method="coord_descent", maxiter=100,
             params0 = params.copy()
             params0[k] = 0
             offset = np.dot(model.exog, params0)
-            if hasattr(model, "offset") and model.offset is not None:
-                offset += model.offset
+            if model_offset is not None:
+                offset += model_offset
 
             # Create a one-variable model for optimization.
             model_1var = model.__class__(

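Note on the `elastic_net.py` change above: the new code folds `exposure` into the offset as `offset + log(exposure)` before the coordinate-descent loop, so the one-variable models see a single combined offset. Below is a minimal standalone sketch of that folding step, not the statsmodels implementation. `combined_offset` and the example dictionary are hypothetical names introduced here. Unlike the diff, the sketch always pops `exposure` and avoids the in-place `+=`, on the assumption that arrays in the keyword dict may be shared with the model (the diff does not show whether `_get_init_kwds` copies them).

```python
import numpy as np


def combined_offset(init_kwds):
    """Pop 'offset' and 'exposure' from a keyword dict and merge them.

    Hypothetical helper for illustration. Returns None if neither is set,
    otherwise offset + log(exposure) with missing parts treated as absent.
    """
    offset = init_kwds.pop('offset', None)
    exposure = init_kwds.pop('exposure', None)
    if exposure is not None:
        log_exposure = np.log(np.asarray(exposure, dtype=float))
        if offset is None:
            offset = log_exposure
        else:
            # Build a new array instead of using +=, so an offset array
            # shared with the caller is left unchanged.
            offset = np.asarray(offset, dtype=float) + log_exposure
    return offset


# Example keyword dict (made-up values).
original_offset = np.array([0.1, 0.2])
kwds = {'offset': original_offset,
        'exposure': np.array([1.0, np.e]),
        'hasconst': False}
result = combined_offset(kwds)
```

With these inputs `result` is approximately `[0.1, 1.2]`, `original_offset` is untouched, and both keys are gone from `kwds`, so the remaining entries could be passed on to a model constructor.
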
diff --git a/statsmodels/base/model.py b/statsmodels/base/model.py
@@ -1505,6 +1505,7 @@ def wald_test(self, r_matrix, cov_p=None, scale=1.0, invcov=None,
 
         cparams = np.dot(r_matrix, self.params[:, None])
         J = float(r_matrix.shape[0])  # number of restrictions
+
         if q_matrix is None:
             q_matrix = np.zeros(J)
         else:
@@ -1521,7 +1522,15 @@ def wald_test(self, r_matrix, cov_p=None, scale=1.0, invcov=None,
                 raise ValueError("r_matrix performs f_test for using "
                                  "dimensions that are asymptotically "
                                  "non-normal")
-            invcov = np.linalg.inv(cov_p)
+            invcov = np.linalg.pinv(cov_p)
+            J_ = np.linalg.matrix_rank(cov_p)
+            if J_ < J:
+                import warnings
+                from statsmodels.tools.sm_exceptions import ValueWarning
+                warnings.warn('covariance of constraints does not have full '
+                              'rank. The number of constraints is %d, but '
+                              'rank is %d' % (J, J_), ValueWarning)
+                J = J_
 
         if (hasattr(self, 'mle_settings') and
                 self.mle_settings['optimizer'] in ['l1', 'l1_cvxopt_cp']):
@@ -1533,7 +1542,7 @@ def wald_test(self, r_matrix, cov_p=None, scale=1.0, invcov=None,
         if use_f:
             F /= J
             return ContrastResults(F=F, df_denom=df_resid,
-                                   df_num=invcov.shape[0])
+                                   df_num=J) #invcov.shape[0])
         else:
             return ContrastResults(chi2=F, df_denom=J, statistic=F,
                                    distribution='chi2', distargs=(J,))

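Note on the `wald_test` change above: replacing `inv` with `pinv` and setting `J` to the rank of the constraint covariance means that redundant rows in `r_matrix` no longer raise or inflate the degrees of freedom. The sketch below illustrates the idea with plain NumPy under simplifying assumptions (null value zero, chi-square form only). `wald_statistic` is a hypothetical function written for this note, and the explicit tolerances are a choice made here, whereas the diff uses the NumPy defaults.

```python
import numpy as np


def wald_statistic(r_matrix, params, cov_params, tol=1e-12):
    """Wald chi-square statistic for H0: R @ params = 0.

    Hypothetical helper for illustration. Returns the statistic and the
    degrees of freedom, taken as the rank of the constraint covariance.
    """
    r_matrix = np.atleast_2d(np.asarray(r_matrix, dtype=float))
    params = np.asarray(params, dtype=float)

    # Covariance of the constraint values R @ params.
    cov_p = r_matrix @ cov_params @ r_matrix.T

    # The pseudo-inverse exists even when cov_p is singular.
    invcov = np.linalg.pinv(cov_p, rcond=tol)

    # Count only the linearly independent constraints.
    df = int(np.linalg.matrix_rank(cov_p, tol=tol))

    cparams = r_matrix @ params
    stat = float(cparams @ invcov @ cparams)
    return stat, df


# Made-up parameter estimates and covariance.
params = np.array([1.0, 2.0, 0.5])
cov_params = np.diag([0.5, 0.25, 1.0])

r_full = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
# The same two constraints plus their sum, which adds no information.
r_redundant = np.vstack([r_full, r_full.sum(axis=0)])

stat_full, df_full = wald_statistic(r_full, params, cov_params)
stat_red, df_red = wald_statistic(r_redundant, params, cov_params)
```

Both calls give the same statistic and two degrees of freedom, because the third row of `r_redundant` is a linear combination of the first two. In the diff the reduced `J` is also used for `F /= J` and `df_num`, and a `ValueWarning` is issued when the rank is smaller than the number of rows.
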
diff --git a/statsmodels/discrete/tests/test_sandwich_cov.py b/statsmodels/discrete/tests/test_sandwich_cov.py
@@ -479,13 +479,9 @@ def test_basic(self):
         assert_equal(res1.cov_type, self.cov_type)
         assert_equal(res2.cov_type, self.cov_type)
 
-        assert_allclose(res1.params, res2.params, rtol=1e-13)
-        # bug TODO res1.scale missing ?  in Gaussian/OLS
-        assert_allclose(res1.bse, res2.bse, rtol=1e-13)
-#         if not self.cov_type == 'nonrobust':
-#             assert_allclose(res1.bse * res1.scale, res2.bse, rtol=1e-13)
-#         else:
-#             assert_allclose(res1.bse, res2.bse, rtol=1e-13)
+        rtol = getattr(res1, 'rtol', 1e-13)
+        assert_allclose(res1.params, res2.params, rtol=rtol)
+        assert_allclose(res1.bse, res2.bse, rtol=1e-10)
 
 
 class TestGLMLogit(CheckDiscreteGLM):
@@ -502,19 +498,32 @@ def setup_class(cls):
         cls.res2 = mod1.fit(cov_type='cluster', cov_kwds=dict(groups=group))
 
 
-class T_estGLMProbit(CheckDiscreteGLM):
-    # invalid link. What's Probit as GLM?
+class TestGLMProbit(CheckDiscreteGLM):
 
     @classmethod
     def setup_class(cls):
         endog_bin = (endog > endog.mean()).astype(int)
         cls.cov_type = 'cluster'
 
-        mod1 = GLM(endog_bin, exog, family=families.Gaussian(link=links.CDFLink()))
-        cls.res1 = mod1.fit(cov_type='cluster', cov_kwds=dict(groups=group))
+        mod1 = GLM(endog_bin, exog, family=families.Binomial(link=links.probit()))
+        cls.res1 = mod1.fit(method='newton',
+                            cov_type='cluster', cov_kwds=dict(groups=group))
 
         mod1 = smd.Probit(endog_bin, exog)
         cls.res2 = mod1.fit(cov_type='cluster', cov_kwds=dict(groups=group))
+        cls.rtol = 1e-6
+
+    def test_score_hessian(self):
+        res1 = self.res1
+        res2 = self.res2
+        # Note scale is fixed at 1, so we don't need to fix it explicitly
+        score1 = res1.model.score(res1.params * 0.98)
+        score2 = res2.model.score(res1.params * 0.98)
+        assert_allclose(score1, score2, rtol=1e-13)
+
+        hess1 = res1.model.hessian(res1.params)
+        hess2 = res2.model.hessian(res1.params)
+        assert_allclose(hess1, hess2, rtol=1e-10)
 
 
 class TestGLMGaussNonRobust(CheckDiscreteGLM):