DOC: Add explanations for Durbin-Watson and Kurtosis. Include DW-test…

… and Breusch-Godfrey test for autocorrelation. Reorder some parts for better readability
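
For reviewers, a minimal pure-Python sketch of the two quantities the new explanations describe. It is illustrative only and is not the statsmodels implementation: `dw_statistic` and `sample_kurtosis` are hypothetical helper names, and the two residual series are made-up toy data.

```python
# Illustrative sketch only: hypothetical helpers, toy data, no statsmodels needed.

def dw_statistic(resid):
    """Durbin-Watson: squared successive differences over the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(resid, resid[1:]))
    den = sum(e ** 2 for e in resid)
    return num / den


def sample_kurtosis(x):
    """Non-excess kurtosis m4 / m2**2; the normal distribution has value 3."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


# Sign flips at every step: negative autocorrelation, statistic above 2.
alternating = [1.0, -1.0] * 4
# Slowly drifting values: positive autocorrelation, statistic below 2.
trending = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]

dw_alt = dw_statistic(alternating)
dw_trend = dw_statistic(trending)
kurt_alt = sample_kurtosis(alternating)

print(dw_alt, dw_trend, kurt_alt)
```

The alternating series lands near the upper end of the 0 to 4 range and the trending series near the lower end, which is the reading the notebook text gives for the Durbin-Watson statistic.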
statsmodels · Apr 23, 2024 · adc9553
1 parent c22837f
commit adc9553
Showing 1 changed file with 75 additions and 31 deletions.
diff --git a/examples/notebooks/regression_diagnostics.ipynb b/examples/notebooks/regression_diagnostics.ipynb
@@ -4,23 +4,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Regression diagnostics"
+    "# Regression diagnostics\n"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This example file shows how to use a few of the ``statsmodels`` regression diagnostic tests in a real-life context. You can learn about more tests and find out more information about the tests here on the [Regression Diagnostics page.](https://www.statsmodels.org/stable/diagnostic.html)\n",
+    "This example file shows how to use a few of the `statsmodels` regression diagnostic tests in a real-life context. You can learn about more tests and find more information about them on the [Regression Diagnostics page](https://www.statsmodels.org/stable/diagnostic.html).\n",
     "\n",
-    "Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online ``statsmodels`` documentation. For presentation purposes, we use the ``zip(name,test)`` construct to pretty-print short descriptions in the examples below."
+    "Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online `statsmodels` documentation. For presentation purposes, we use the `lzip(name, test)` construct to pretty-print short descriptions in the examples below.\n"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Estimate a regression model"
+    "## Estimate a regression model\n"
    ]
   },
   {
@@ -61,14 +61,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Normality of the residuals"
+    "## Normality of the residuals\n"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Jarque-Bera test:"
+    "Omnibus test:\n"
    ]
   },
   {
@@ -77,16 +77,18 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "name = [\"Jarque-Bera\", \"Chi^2 two-tail prob.\", \"Skew\", \"Kurtosis\"]\n",
-    "test = sms.jarque_bera(results.resid)\n",
+    "name = [\"Chi^2\", \"Two-tail probability\"]\n",
+    "test = sms.omni_normtest(results.resid)\n",
     "lzip(name, test)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Omni test:"
+    "Jarque-Bera test:\n",
+    "\n",
+    "Kurtosis below is the sample kurtosis, not the excess kurtosis. The normal distribution has kurtosis equal to 3, so residuals from a normal sample should have kurtosis close to 3.\n"
    ]
   },
   {
@@ -95,18 +97,18 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "name = [\"Chi^2\", \"Two-tail probability\"]\n",
-    "test = sms.omni_normtest(results.resid)\n",
+    "name = [\"Jarque-Bera test\", \"Chi^2 two-tail prob.\", \"Skew\", \"Kurtosis\"]\n",
+    "test = sms.jarque_bera(results.resid)\n",
     "lzip(name, test)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Influence tests\n",
+    "## Multicollinearity\n",
     "\n",
-    "Once created, an object of class ``OLSInfluence`` holds attributes and methods that allow users to assess the influence of each observation. For example, we can compute and extract the first few rows of DFbetas by:"
+    "Condition number:\n"
    ]
   },
   {
@@ -115,19 +117,20 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from statsmodels.stats.outliers_influence import OLSInfluence\n",
-    "\n",
-    "test_class = OLSInfluence(results)\n",
-    "test_class.dfbetas[:5, :]"
+    "name = [\"Condition number\"]\n",
+    "test = [np.linalg.cond(results.model.exog)]\n",
+    "lzip(name, test)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Explore other options by typing ``dir(influence_test)``\n",
+    "## Autocorrelation\n",
+    "\n",
+    "Durbin-Watson test:\n",
     "\n",
-    "Useful information on leverage can also be plotted:"
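+    "The Durbin-Watson statistic always lies between 0 and 4. Values close to 2 indicate little first-order autocorrelation in the residuals; values toward 0 suggest positive autocorrelation and values toward 4 suggest negative autocorrelation.\n"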
    ]
   },
   {
@@ -136,26 +139,57 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from statsmodels.graphics.regressionplots import plot_leverage_resid2\n",
-    "\n",
-    "fig, ax = plt.subplots(figsize=(8, 6))\n",
-    "fig = plot_leverage_resid2(results, ax=ax)"
+    "name = [\"Durbin-Watson statistic\"]\n",
+    "test = [sms.durbin_watson(results.resid)]\n",
+    "lzip(name, test)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Other plotting options can be found on the [Graphics page.](https://www.statsmodels.org/stable/graphics.html)"
+    "Breusch–Godfrey test for serial correlation:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "name = [\"Breusch-Godfrey Lagrange multiplier test statistic\", \"p-value\", \"f-value\", \"f p-value\"]\n",
+    "test = sms.acorr_breusch_godfrey(results)\n",
+    "lzip(name, test)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Multicollinearity\n",
+    "## Influence tests\n",
+    "\n",
+    "Once created, an object of class `OLSInfluence` holds attributes and methods that allow users to assess the influence of each observation. For example, we can compute and extract the first few rows of DFbetas by:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from statsmodels.stats.outliers_influence import OLSInfluence\n",
+    "\n",
+    "test_class = OLSInfluence(results)\n",
+    "test_class.dfbetas[:5, :]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Explore other options by typing `dir(test_class)`\n",
     "\n",
-    "Condition number:"
+    "Useful information on leverage can also be plotted:\n"
    ]
   },
   {
@@ -164,7 +198,17 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "np.linalg.cond(results.model.exog)"
+    "from statsmodels.graphics.regressionplots import plot_leverage_resid2\n",
+    "\n",
+    "fig, ax = plt.subplots(figsize=(8, 6))\n",
+    "fig = plot_leverage_resid2(results, ax=ax)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Other plotting options can be found on the [Graphics page.](https://www.statsmodels.org/stable/graphics.html)\n"
    ]
   },
   {
@@ -173,7 +217,7 @@
    "source": [
     "## Heteroskedasticity tests\n",
     "\n",
-    "Breush-Pagan test:"
+    "Breusch-Pagan test:\n"
    ]
   },
   {
@@ -191,7 +235,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Goldfeld-Quandt test"
+    "Goldfeld-Quandt test:\n"
    ]
   },
   {
@@ -211,7 +255,7 @@
    "source": [
     "## Linearity\n",
     "\n",
-    "Harvey-Collier multiplier test for Null hypothesis that the linear specification is correct:"
+    "Harvey-Collier multiplier test for the null hypothesis that the linear specification is correct:\n"
    ]
   },
   {
@@ -242,7 +286,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.10"
+   "version": "3.10.13"
   }
  },
  "nbformat": 4,