incorporating TR comments into Ch 15
debnolan committed May 9, 2023
1 parent f4e53d5 commit e7e5d0d
Showing 9 changed files with 108 additions and 77 deletions.
Binary file modified content/ch/15/figures/scatterplotSLR.png
46 changes: 26 additions & 20 deletions content/ch/15/linear_case.ipynb

Large diffs are not rendered by default.

41 changes: 24 additions & 17 deletions content/ch/15/linear_categorical.ipynb


24 changes: 12 additions & 12 deletions content/ch/15/linear_feature_eng.ipynb


21 changes: 14 additions & 7 deletions content/ch/15/linear_multi.ipynb


6 changes: 3 additions & 3 deletions content/ch/15/linear_multi_fit.ipynb
@@ -68,7 +68,7 @@
"\n",
"$$\n",
"\\begin{aligned}\n",
-"y_2 \\approx \\theta_0 + \\theta_1 x_{2,1} + \\ldots + \\theta_p x_{2,p} \n",
+"y_2 \\approx \\theta_0 + \\theta_1 x_{2,1} + \\ldots + \\theta_p x_{2,p}. \n",
"\\end{aligned}\n",
"$$\n",
"\n",
@@ -88,7 +88,7 @@
"Putting these notational definitions together, we can write the vector of predictions for the entire dataset using matrix multiplication:\n",
"\n",
"$$\n",
-"{\\textbf{X}} {\\boldsymbol{\\theta}}\n",
+"{\\textbf{X}} {\\boldsymbol{\\theta}}.\n",
"$$\n",
"\n",
"If we check the dimensions of $\\textbf{X}$ and $\\boldsymbol{\\theta}$, we can confirm that ${\\textbf{X}} {\\boldsymbol{\\theta}}$ is an $n$-dimensional column vector.\n",
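As a quick sketch of the dimension check above (the numbers here are made up for illustration, not from the book's data), we can confirm in NumPy that $\textbf{X}\boldsymbol{\theta}$ yields an $n$-dimensional vector of predictions:

```python
import numpy as np

# Hypothetical design matrix: n=4 observations, an intercept column
# of ones, and p=2 features (values are illustrative only)
X = np.array([
    [1.0, 2.0, 5.0],
    [1.0, 3.0, 1.0],
    [1.0, 0.5, 4.0],
    [1.0, 1.5, 2.0],
])
theta = np.array([0.5, 2.0, -1.0])  # theta_0, theta_1, theta_2

# (n, p+1) matrix times (p+1,) vector gives an (n,) vector of predictions
predictions = X @ theta
print(predictions.shape)  # (4,)
```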
@@ -122,7 +122,7 @@
"source": [
"$$\n",
"\\frac{1}{n} \\sum_i [y_i - (\\theta_0 + \\theta_1 x_{i,1} + \\cdots + \\theta_p x_{i,p})]^2 \n",
-"= \\frac{1}{n} \\lVert \\mathbf{y} - {\\textbf{X}} {\\boldsymbol{\\theta}} \\rVert^2\n",
+"= \\frac{1}{n} \\lVert \\mathbf{y} - {\\textbf{X}} {\\boldsymbol{\\theta}} \\rVert^2.\n",
"$$"
]
},
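The equivalence between the averaged sum of squares and the squared-norm form of the mean squared error can be verified numerically; the data below are randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p features
theta = rng.normal(size=p + 1)
y = X @ theta + rng.normal(size=n)

# Averaged sum-of-squares form
mse_sum = np.mean((y - X @ theta) ** 2)
# Squared vector-norm form
mse_norm = np.linalg.norm(y - X @ theta) ** 2 / n

assert np.isclose(mse_sum, mse_norm)
```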
34 changes: 19 additions & 15 deletions content/ch/15/linear_pa.ipynb


4 changes: 2 additions & 2 deletions content/ch/15/linear_simple_fit.ipynb
@@ -112,7 +112,7 @@
"metadata": {},
"source": [
"These equations are called the *normal equations*. \n",
-"In the first equation, we see that $\\hat{\\theta}_0$ can be represented as a function of $\\hat{\\theta}_1$.\n",
+"In the first equation, we see that $\\hat{\\theta}_0$ can be represented as a function of $\\hat{\\theta}_1$,\n",
"\n",
"$$\n",
"\\hat{\\theta}_0 = \\bar{y} - \\hat{\\theta}_1 \\bar{x}.\n",
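A small sketch of the normal-equation solution on made-up data (the data and variable names here are ours, chosen only to illustrate the closed form):

```python
import numpy as np

# Toy data, illustrative only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Slope from the normal equations: sum of cross-deviations over
# sum of squared x-deviations
theta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept as a function of the slope: ybar - theta1_hat * xbar
theta0_hat = y.mean() - theta1_hat * x.mean()

print(theta0_hat, theta1_hat)
```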
@@ -154,7 +154,7 @@
"We have derived the equation for the least squares line that we used in the previous section. There, we used the `pandas` built-in methods to compute\n",
"$SD(\\mathbf{x})$, $SD(\\mathbf{y})$, and $r(\\mathbf{x}, \\mathbf{y})$,\n",
"to easily calculate the equation for this line.\n",
-"However, in practice we recommend using the functionality provided in `scikit-learn` to do the model fitting."
+"However, in practice we recommend using the functionality provided in `scikit-learn` to do the model fitting:"
]
},
{
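For instance, a fit with `scikit-learn`'s `LinearRegression` (again on illustrative data of our own) should agree with the $r(\mathbf{x}, \mathbf{y}) \cdot SD(\mathbf{y})/SD(\mathbf{x})$ slope formula:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data, illustrative only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Slope via the correlation-and-SDs formula
r = np.corrcoef(x, y)[0, 1]
slope_formula = r * y.std() / x.std()

# Same slope via scikit-learn (expects a 2-D feature array)
model = LinearRegression().fit(x.reshape(-1, 1), y)
print(model.intercept_, model.coef_[0])
```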
9 changes: 8 additions & 1 deletion content/ch/15/linear_summary.ipynb
@@ -59,11 +59,18 @@
"+ Several models can be equally effective in predicting/explaining the response variable\n",
"\n",
"If we are concerned with making inferences, where we want to interpret/understand the model, then we should err on the side of simpler models. \n",
-"On the other hand, if our primary concern is the predictive ability of a model, then we tend not to concern ourselves with the number of coefficients and their interpretation. \n",
+"On the other hand, if our primary concern is the predictive ability of a model, then we tend not to concern ourselves with the number of coefficients and their interpretation. But this \"black box\" approach can lead to models that, say, overly depend on anomalous values in the data or are inadequate in other ways. So be careful with the black box approach, especially when the predictions may be harmful to people. \n",
"\n",
"In this chapter, we have used linear models in a descriptive way. We introduced a few notions for deciding when to include a feature in a model: examining residuals for patterns, comparing the size of standard errors, and checking the change in the multiple $R^2$. Oftentimes we settled for a \n",
"simpler model that was easier to interpret. In the next chapter, we look at other more formal tools for choosing the features to include in a model. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
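The comparison of multiple $R^2$ across models mentioned above can be sketched as follows (synthetic data, illustrative only; note that on the training data $R^2$ never decreases when a feature is added, so a tiny gain argues for the simpler model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)   # strong feature
x2 = rng.normal(size=n)   # weak feature
y = 3 * x1 + 0.1 * x2 + rng.normal(size=n)

# Multiple R^2 for a one-feature model vs. a two-feature model
X1 = x1.reshape(-1, 1)
r2_one = LinearRegression().fit(X1, y).score(X1, y)
X2 = np.column_stack([x1, x2])
r2_two = LinearRegression().fit(X2, y).score(X2, y)

print(r2_one, r2_two)  # the gain from adding x2 is small here
```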
