TR and MP edits to ch 16
debnolan committed May 8, 2023
1 parent 7c8a383 commit f4e53d5
Showing 6 changed files with 169 additions and 126 deletions.
64 changes: 39 additions & 25 deletions content/ch/16/ms_cv.ipynb

Large diffs are not rendered by default.

48 changes: 24 additions & 24 deletions content/ch/16/ms_overfitting.ipynb


58 changes: 35 additions & 23 deletions content/ch/16/ms_regularization.ipynb


63 changes: 31 additions & 32 deletions content/ch/16/ms_risk.ipynb


15 changes: 13 additions & 2 deletions content/ch/16/ms_summary.ipynb
@@ -29,7 +29,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "In this chapter, we have seen problems arise when we minimize mean square error to both fit a model and evaluate it. The train-test split helps us get around this problem, where we fit a model with the train set and evaluate our fitted model on test data that have been set aside.\n",
+ "In this chapter, we have seen problems arise when we minimize mean square error to both fit a model and evaluate it. The train-test split helps us get around this problem: we fit a model on the train set and evaluate the fitted model on test data that have been set aside.\n",
  "\n",
+ "It's important not to \"overuse\" the test set, so we keep it separate until we have committed to a model. To help us commit, we might use cross-validation, which imitates the division of data into test and train sets. Again, it's important to cross-validate using only the training set and to keep the original test set away from any model selection process.\n",
+ "\n",
  "Regularization takes a different approach and penalizes the mean square error to keep the model from fitting the data too closely. In regularization, we use all of the data available to fit the model, but shrink the size of the coefficients. "
]
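The workflow this cell describes can be sketched with scikit-learn. This is a minimal sketch on synthetic stand-in data (the chapter's own dataset and models are not shown in this diff, so everything below is an assumption for illustration): set the test set aside first, cross-validate within the training set only, and touch the test set once at the end.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (the chapter's actual dataset is assumed, not shown)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(0, 1, size=200)

# Set the test set aside first, and keep it out of model selection entirely
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Cross-validate using only the training set to compare candidate models
scores = cross_val_score(LinearRegression(), X_train, y_train,
                         cv=5, scoring="neg_mean_squared_error")
print("cross-validated MSE:", -scores.mean())

# Only after committing to a model do we evaluate on the held-out test set
model = LinearRegression().fit(X_train, y_train)
test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
print("test-set MSE:", test_mse)
```

Note that the test set appears nowhere in the cross-validation step; that is the separation the cell above insists on.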
@@ -54,8 +56,17 @@
"\n",
"Creating more features, whether useful or not, typically increases model variance. Models with many parameters have many possible combinations of parameters and therefore have higher variance than models with few parameters. On the other hand, adding a useful feature to the model, such as a quadratic feature when the underlying process is quadratic, reduces bias. But even adding a useless feature rarely increases bias.\n",
"\n",
- "Being aware of the bias-variance trade off can help you do a better job fitting models. And using techniques like the train-test split, cross-validation, and regularization can ameliorate this issue."
+ "Being aware of the bias-variance trade-off can help you do a better job fitting models, and techniques like the train-test split, cross-validation, and regularization can ameliorate this issue.\n",
+ "\n",
+ "Another part of modeling considers the variation in the fitted coefficients and curve. We might want to provide a confidence interval for a coefficient or a prediction band for a future observation. These intervals and bands give a sense of the accuracy of the fitted model. We discuss this notion next. "
]
},
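The trade-off described in this cell can be illustrated numerically. In this sketch, the quadratic process, the polynomial features, and the use of ridge regression are all my assumptions standing in for the chapter's own examples: a useful quadratic feature removes bias, a pile of useless higher-degree features adds variance, and the ridge penalty shrinks the coefficients back toward zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic process standing in for the chapter's example
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = 1 + x - 2 * x**2 + rng.normal(0, 0.3, size=60)
x_new = rng.uniform(-1, 1, size=1000)  # fresh data to estimate risk
y_new = 1 + x_new - 2 * x_new**2 + rng.normal(0, 0.3, size=1000)

def features(x, degree):
    return PolynomialFeatures(degree, include_bias=False).fit_transform(
        x.reshape(-1, 1))

def risk(model, degree):
    fit = model.fit(features(x, degree), y)
    return np.mean((fit.predict(features(x_new, degree)) - y_new) ** 2)

mse_deg1 = risk(LinearRegression(), 1)    # underfit: bias from the missing x^2 term
mse_deg2 = risk(LinearRegression(), 2)    # matches the process: low bias and variance
mse_deg15 = risk(LinearRegression(), 15)  # useless features: extra variance

# Ridge fits all of the data but penalizes the mean square error,
# shrinking coefficient sizes relative to plain least squares
ols = LinearRegression().fit(features(x, 15), y)
ridge = Ridge(alpha=1.0).fit(features(x, 15), y)

print("degree 1 MSE: ", mse_deg1)
print("degree 2 MSE: ", mse_deg2)
print("degree 15 MSE:", mse_deg15)
print("coef norm, OLS vs ridge:",
      np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

On runs like this one, the degree-1 model pays a clear bias penalty relative to degree 2, while the ridge fit's coefficient norm is far smaller than the unpenalized degree-15 fit's.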
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
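The confidence intervals mentioned in the new closing paragraph can be obtained in several ways; the bootstrap below is one common approach sketched on synthetic data, and is my assumption rather than the method the chapter goes on to present: refit the model on resampled data many times and read off percentiles of the refitted coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known slope of 0.5 (an assumed stand-in example)
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=100)
y = 3 + 0.5 * x + rng.normal(0, 1, size=100)

# Bootstrap the slope: resample the data with replacement and refit
slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(x), size=len(x))
    fit = LinearRegression().fit(x[idx].reshape(-1, 1), y[idx])
    slopes.append(fit.coef_[0])

# The middle 95% of the bootstrap slopes gives a percentile interval
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"95% bootstrap CI for the slope: [{lo:.2f}, {hi:.2f}]")
```

The interval's width reflects the variation in the fitted coefficient across resamples, which is exactly the notion of accuracy the paragraph gestures at.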
],
"metadata": {
47 changes: 27 additions & 20 deletions content/ch/16/ms_train_test.ipynb

