
adding the TR edits to Ch 20
debnolan committed May 4, 2023
1 parent 1d2fc8e commit ee0a242
Showing 3 changed files with 28 additions and 2 deletions.
4 changes: 3 additions & 1 deletion in content/ch/20/gd_alternative.ipynb
@@ -95,7 +95,9 @@
"\n",
"As with stochastic gradient descent, we perform mini-batch gradient descent by randomly shuffling the data. Then we split the data into consecutive mini-batches, and iterate through the batches in sequence. After each epoch, we re-shuffle our data and select new mini-batches.\n",
"\n",
"While we have made the distinction between stochastic and mini-batch gradient descent, stochastic gradient descent is sometimes used as an umbrella term that encompasses the selection of a mini-batch of any size. "
"While we have made the distinction between stochastic and mini-batch gradient descent, stochastic gradient descent is sometimes used as an umbrella term that encompasses the selection of a mini-batch of any size. \n",
"\n",
"Another common optimization technique is Newton's method. "
]
},
{
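For reference, a minimal NumPy sketch of the mini-batch procedure described in the cell above: shuffle the data, split it into consecutive mini-batches, update on each batch, and re-shuffle at the start of every epoch. The function name, the generic `grad` callback, and the default batch size and learning rate are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def minibatch_gradient_descent(grad, theta, X, y, alpha=0.01,
                               batch_size=32, epochs=50, seed=42):
    """Minimize an average loss with mini-batch gradient descent.

    grad(theta, X_batch, y_batch) should return the gradient of the
    average loss over the batch with respect to theta.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(epochs):
        # Re-shuffle the data at the start of each epoch
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Update theta using the gradient on this mini-batch only
            theta = theta - alpha * grad(theta, X[batch], y[batch])
    return theta
```

With `batch_size=1` this reduces to stochastic gradient descent, and with `batch_size=len(y)` it becomes ordinary batch gradient descent, which matches the umbrella-term remark above.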
12 changes: 11 additions & 1 deletion in content/ch/20/gd_example.ipynb
@@ -43,8 +43,18 @@
" - \\gamma \\cdot \\text{sign} (y_i - \\theta) & \\text{otherwise}\n",
"\\end{cases}\n",
"$$\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note}\n",
"\n",
"(Note that in previous definitions of Huber loss we used the variable $ \\alpha $ to denote the transition point. To avoid confusion with the $ \\alpha $ used as the learning rate in gradient descent, we replace the transition point parameter of the Huber loss with $ \\gamma $.) "
"Note that in previous definitions of Huber loss we used the variable $ \\alpha $ to denote the transition point. To avoid confusion with the $ \\alpha $ used as the learning rate in gradient descent, we replace the transition point parameter of the Huber loss with $ \\gamma $. \n",
"\n",
":::"
]
},
{
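As a companion to the gradient formula above, here is a small sketch of the gradient of the mean Huber loss for a constant model $\theta$, written with the transition point $\gamma$ as in the edited note. The function name and the default value of `gamma` are illustrative assumptions, not from the notebook.

```python
import numpy as np

def grad_huber(theta, y, gamma=1.0):
    """Gradient of the mean Huber loss for a constant model theta."""
    resid = y - theta
    # Inside the transition point the loss is quadratic, so the
    # per-point gradient is -(y_i - theta); outside it is linear,
    # so the gradient is -gamma * sign(y_i - theta).
    per_point = np.where(np.abs(resid) <= gamma,
                         -resid,
                         -gamma * np.sign(resid))
    return per_point.mean()
```

A single gradient descent step with learning rate `alpha` would then be `theta = theta - alpha * grad_huber(theta, y, gamma)`.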
14 changes: 14 additions & 0 deletions in content/ch/20/gd_summary.ipynb
@@ -45,6 +45,20 @@
"source": [
"Lastly, another option is to set the step-size adaptively. Additionally, setting different learning rates for different features can be important if they are of different scale or vary in frequency. For example, word counts can differ a lot across common words and rare words."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The logistic regression model introduced in {numref}`Chapter %s <ch:logistic>` is fitted using numerical optimization methods like those described in this chapter. We wrap up with one final case study that uses logistic regression to fit a complex model with thousands of features. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
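The adaptive, per-feature step size mentioned in the summary cell can be illustrated with an AdaGrad-style update, in which coordinates with a history of large gradients receive a smaller effective learning rate. This particular rule is offered as one common choice, an assumption for illustration; the text does not prescribe a specific scheme.

```python
import numpy as np

def adagrad_descent(grad, theta, alpha=0.1, epsilon=1e-8, iters=1000):
    """Gradient descent with a per-feature adaptive step size.

    Features whose gradients have historically been large get a
    smaller effective learning rate, which helps when features differ
    in scale or frequency (e.g., common vs. rare words).
    """
    accum = np.zeros_like(theta)  # running sum of squared gradients
    for _ in range(iters):
        g = grad(theta)
        accum += g ** 2
        # Divide the global rate alpha by the per-feature history
        theta = theta - alpha * g / (np.sqrt(accum) + epsilon)
    return theta
```

Here `grad(theta)` is assumed to return an array of the same shape as `theta`; `epsilon` only guards against division by zero on the first iterations.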
