
adding the TR edits to Ch 20
debnolan committed May 4, 2023
1 parent 1d2fc8e commit ee0a242
Showing 3 changed files with 28 additions and 2 deletions.
4 changes: 3 additions & 1 deletion in content/ch/20/gd_alternative.ipynb
@@ -95,7 +95,9 @@
"\n",
"As with stochastic gradient descent, we perform mini-batch gradient descent by randomly shuffling the data. Then we split the data into consecutive mini-batches, and iterate through the batches in sequence. After each epoch, we re-shuffle our data and select new mini-batches.\n",
"\n",
"While we have made the distinction between stochastic and mini-batch gradient descent, stochastic gradient descent is sometimes used as an umbrella term that encompasses the selection of a mini-batch of any size. "
"While we have made the distinction between stochastic and mini-batch gradient descent, stochastic gradient descent is sometimes used as an umbrella term that encompasses the selection of a mini-batch of any size. \n",
"\n",
"Another common optimization technique is Newton's method. "
]
},
{
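For reference, a minimal NumPy sketch of the mini-batch procedure described in the cell above: shuffle the data, split it into consecutive mini-batches, update on each batch, and re-shuffle at the start of every epoch. The function name, the generic `grad` callback, and the default batch size and learning rate are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def minibatch_gradient_descent(grad, theta, X, y, alpha=0.01,
                               batch_size=32, epochs=50, seed=42):
    """Minimize an average loss with mini-batch gradient descent.

    grad(theta, X_batch, y_batch) should return the gradient of the
    average loss over the batch with respect to theta.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(epochs):
        # Re-shuffle the data at the start of each epoch
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Update theta using the gradient on this mini-batch only
            theta = theta - alpha * grad(theta, X[batch], y[batch])
    return theta
```

With `batch_size=1` this reduces to stochastic gradient descent, and with `batch_size=len(y)` it becomes ordinary batch gradient descent, which matches the umbrella-term remark above.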
12 changes: 11 additions & 1 deletion in content/ch/20/gd_example.ipynb
@@ -43,8 +43,18 @@
" - \\gamma \\cdot \\text{sign} (y_i - \\theta) & \\text{otherwise}\n",
"\\end{cases}\n",
"$$\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note}\n",
"\n",
"(Note that in previous definitions of Huber loss we used the variable $ \\alpha $ to denote the transition point. To avoid confusion with the $ \\alpha $ used as the learning rate in gradient descent, we replace the transition point parameter of the Huber loss with $ \\gamma $.) "
"Note that in previous definitions of Huber loss we used the variable $ \\alpha $ to denote the transition point. To avoid confusion with the $ \\alpha $ used as the learning rate in gradient descent, we replace the transition point parameter of the Huber loss with $ \\gamma $. \n",
"\n",
":::"
]
},
{
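As a companion to the gradient formula above, here is a small sketch of the gradient of the mean Huber loss for a constant model $\theta$, written with the transition point $\gamma$ as in the edited note. The function name and the default value of `gamma` are illustrative assumptions, not from the notebook.

```python
import numpy as np

def grad_huber(theta, y, gamma=1.0):
    """Gradient of the mean Huber loss for a constant model theta."""
    resid = y - theta
    # Inside the transition point the loss is quadratic, so the
    # per-point gradient is -(y_i - theta); outside it is linear,
    # so the gradient is -gamma * sign(y_i - theta).
    per_point = np.where(np.abs(resid) <= gamma,
                         -resid,
                         -gamma * np.sign(resid))
    return per_point.mean()
```

A single gradient descent step with learning rate `alpha` would then be `theta = theta - alpha * grad_huber(theta, y, gamma)`.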
14 changes: 14 additions & 0 deletions in content/ch/20/gd_summary.ipynb
@@ -45,6 +45,20 @@
"source": [
"Lastly, another option is to set the step-size adaptively. Additionally, setting different learning rates for different features can be important if they are of different scale or vary in frequency. For example, word counts can differ a lot across common words and rare words."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The logistic regression model introduced in {numref}`Chapter %s <ch:logistic>` is fitted using numerical optimization methods like those described in this chapter. We wrap up with one final case study that uses logistic regression to fit a complex model with thousands of features. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
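The adaptive, per-feature step size mentioned in the summary cell can be illustrated with an AdaGrad-style update, in which coordinates with a history of large gradients receive a smaller effective learning rate. This particular rule is offered as one common choice, an assumption for illustration; the text does not prescribe a specific scheme.

```python
import numpy as np

def adagrad_descent(grad, theta, alpha=0.1, epsilon=1e-8, iters=1000):
    """Gradient descent with a per-feature adaptive step size.

    Features whose gradients have historically been large get a
    smaller effective learning rate, which helps when features differ
    in scale or frequency (e.g., common vs. rare words).
    """
    accum = np.zeros_like(theta)  # running sum of squared gradients
    for _ in range(iters):
        g = grad(theta)
        accum += g ** 2
        # Divide the global rate alpha by the per-feature history
        theta = theta - alpha * g / (np.sqrt(accum) + epsilon)
    return theta
```

Here `grad(theta)` is assumed to return an array of the same shape as `theta`; `epsilon` only guards against division by zero on the first iterations.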
