TR and MP edits to ch 16
debnolan committed May 8, 2023
1 parent 7c8a383 commit f4e53d5
Showing 6 changed files with 169 additions and 126 deletions.
64 changes: 39 additions & 25 deletions content/ch/16/ms_cv.ipynb

Large diffs are not rendered by default.

48 changes: 24 additions & 24 deletions content/ch/16/ms_overfitting.ipynb


58 changes: 35 additions & 23 deletions content/ch/16/ms_regularization.ipynb


63 changes: 31 additions & 32 deletions content/ch/16/ms_risk.ipynb


15 changes: 13 additions & 2 deletions content/ch/16/ms_summary.ipynb
@@ -29,7 +29,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "In this chapter, we have seen problems arise when we minimize mean square error to both fit a model and evaluate it. The train-test split helps us get around this problem, where we fit a model with the train set and evaluate our fitted model on test data that have been set aside.\n",
+ "In this chapter, we have seen problems arise when we minimize mean square error to both fit a model and evaluate it. The train-test split helps us get around this problem: we fit a model on the train set and evaluate the fitted model on test data that have been set aside.\n",
  "\n",
+ "It's important not to \"overuse\" the test set, so we keep it separate until we have committed to a model. To help us commit, we might use cross-validation, which imitates the division of data into test and train sets. Again, it's important to cross-validate using only the training set and to keep the original test set away from any model selection process.\n",
+ "\n",
  "Regularization takes a different approach and penalizes the mean square error to keep the model from fitting the data too closely. In regularization, we use all of the data available to fit the model, but shrink the size of the coefficients. "
]
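The workflow this cell describes can be sketched with scikit-learn. This is a minimal sketch on synthetic stand-in data (the chapter's own dataset and models are not shown in this diff, so everything below is an assumption for illustration): set the test set aside first, cross-validate within the training set only, and touch the test set once at the end.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (the chapter's actual dataset is assumed, not shown)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(0, 1, size=200)

# Set the test set aside first, and keep it out of model selection entirely
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Cross-validate using only the training set to compare candidate models
scores = cross_val_score(LinearRegression(), X_train, y_train,
                         cv=5, scoring="neg_mean_squared_error")
print("cross-validated MSE:", -scores.mean())

# Only after committing to a model do we evaluate on the held-out test set
model = LinearRegression().fit(X_train, y_train)
test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
print("test-set MSE:", test_mse)
```

Note that the test set appears nowhere in the cross-validation step; that is the separation the cell above insists on.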
@@ -54,8 +56,17 @@
"\n",
"Creating more features, whether useful or not, typically increases model variance. Models with many parameters have many possible combinations of parameters and therefore have higher variance than models with few parameters. On the other hand, adding a useful feature to the model, such as a quadratic feature when the underlying process is quadratic, reduces bias. But even adding a useless feature rarely increases bias.\n",
"\n",
- "Being aware of the bias-variance trade off can help you do a better job fitting models. And using techniques like the train-test split, cross-validation, and regularization can ameliorate this issue."
+ "Being aware of the bias-variance trade-off can help you do a better job fitting models, and techniques like the train-test split, cross-validation, and regularization can ameliorate this issue.\n",
+ "\n",
+ "Another part of modeling considers the variation in the fitted coefficients and curve. We might want to provide a confidence interval for a coefficient or a prediction band for a future observation. These intervals and bands give a sense of the accuracy of the fitted model. We discuss this notion next. "
]
},
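The trade-off described in this cell can be illustrated numerically. In this sketch, the quadratic process, the polynomial features, and the use of ridge regression are all my assumptions standing in for the chapter's own examples: a useful quadratic feature removes bias, a pile of useless higher-degree features adds variance, and the ridge penalty shrinks the coefficients back toward zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic process standing in for the chapter's example
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = 1 + x - 2 * x**2 + rng.normal(0, 0.3, size=60)
x_new = rng.uniform(-1, 1, size=1000)  # fresh data to estimate risk
y_new = 1 + x_new - 2 * x_new**2 + rng.normal(0, 0.3, size=1000)

def features(x, degree):
    return PolynomialFeatures(degree, include_bias=False).fit_transform(
        x.reshape(-1, 1))

def risk(model, degree):
    fit = model.fit(features(x, degree), y)
    return np.mean((fit.predict(features(x_new, degree)) - y_new) ** 2)

mse_deg1 = risk(LinearRegression(), 1)    # underfit: bias from the missing x^2 term
mse_deg2 = risk(LinearRegression(), 2)    # matches the process: low bias and variance
mse_deg15 = risk(LinearRegression(), 15)  # useless features: extra variance

# Ridge fits all of the data but penalizes the mean square error,
# shrinking coefficient sizes relative to plain least squares
ols = LinearRegression().fit(features(x, 15), y)
ridge = Ridge(alpha=1.0).fit(features(x, 15), y)

print("degree 1 MSE: ", mse_deg1)
print("degree 2 MSE: ", mse_deg2)
print("degree 15 MSE:", mse_deg15)
print("coef norm, OLS vs ridge:",
      np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

On runs like this one, the degree-1 model pays a clear bias penalty relative to degree 2, while the ridge fit's coefficient norm is far smaller than the unpenalized degree-15 fit's.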
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
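The confidence intervals mentioned in the new closing paragraph can be obtained in several ways; the bootstrap below is one common approach sketched on synthetic data, and is my assumption rather than the method the chapter goes on to present: refit the model on resampled data many times and read off percentiles of the refitted coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known slope of 0.5 (an assumed stand-in example)
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=100)
y = 3 + 0.5 * x + rng.normal(0, 1, size=100)

# Bootstrap the slope: resample the data with replacement and refit
slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(x), size=len(x))
    fit = LinearRegression().fit(x[idx].reshape(-1, 1), y[idx])
    slopes.append(fit.coef_[0])

# The middle 95% of the bootstrap slopes gives a percentile interval
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"95% bootstrap CI for the slope: [{lo:.2f}, {hi:.2f}]")
```

The interval's width reflects the variation in the fitted coefficient across resamples, which is exactly the notion of accuracy the paragraph gestures at.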
],
"metadata": {
47 changes: 27 additions & 20 deletions content/ch/16/ms_train_test.ipynb

