
[IDEA] Chapter 2, Add code demonstrating HalvingRandomSearchCV #106

Open
eranr opened this issue Nov 8, 2023 · 0 comments
eranr commented Nov 8, 2023

Notebook name: 02_end_to_end_machine_learning_project
Section 5.2 “Randomized Search”
Cell 152

This is the first cell in the section, and it contains only the HalvingRandomSearchCV import. It seems the cell is out of place: it should contain actual code that uses the HalvingRandomSearchCV class. How about adding the following two cells after the RandomizedSearchCV cells:

from sklearn.experimental import enable_halving_search_cv  # noqa: enables the experimental estimator
from sklearn.model_selection import HalvingRandomSearchCV
from scipy.stats import randint

param_distribs = {'preprocessing__geo__n_clusters': randint(low=3, high=50),
                  'random_forest__max_features': randint(low=2, high=20)}

h_rnd_search = HalvingRandomSearchCV(
    full_pipeline, param_distributions=param_distribs, cv=3,
    scoring='neg_root_mean_squared_error', random_state=42)

h_rnd_search.fit(housing, housing_labels)

# Same result-display code as in the RandomizedSearchCV cells, plus dropna()
# to discard candidates whose fit failed (their scores are NaN):
cv_res = pd.DataFrame(h_rnd_search.cv_results_).dropna()
cv_res.sort_values(by="mean_test_score", ascending=False, inplace=True)
cv_res = cv_res[["param_preprocessing__geo__n_clusters",
                 "param_random_forest__max_features", "split0_test_score",
                 "split1_test_score", "split2_test_score", "mean_test_score"]]
cv_res.columns = ["n_clusters", "max_features"] + score_cols  # score_cols is defined earlier in the notebook
cv_res[score_cols] = -cv_res[score_cols].round().astype(np.int64)
cv_res.head()

A couple of notes:

  • Running the first cell generates a lot of warnings, because successive halving reduces the training set size in the early iterations, and the reduced set may be too small for the candidate being tested. One example I ran into was inside the KMeans fit function, where the requested number of clusters exceeded the number of training samples (a possible mitigation is sketched after these notes).

  • The second cell is identical to the previous cells that display the search results, except for the added dropna(). Whenever fitting a candidate fails as described above, the associated scores in the results are NaN, which would make the attempt to round the numerical results fail.
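If the warnings are a concern, one way to reduce them is to raise the search's min_resources (a real HalvingRandomSearchCV parameter) so that even the first halving iteration trains on enough samples for the largest n_clusters candidate. The sketch below is only an illustration of that idea; the value 500 is an arbitrary assumption, not something from the notebook:

# A minimal sketch, assuming the same full_pipeline / param_distribs as above.
# min_resources sets the number of samples used in the first halving
# iteration; the default ('smallest') can be just a handful of samples,
# far fewer than the up-to-49 clusters sampled for KMeans.
h_rnd_search = HalvingRandomSearchCV(
    full_pipeline, param_distributions=param_distribs, cv=3,
    min_resources=500,  # hypothetical floor, large enough for n_clusters < 50
    scoring='neg_root_mean_squared_error', random_state=42)
h_rnd_search.fit(housing, housing_labels)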
