[FAKE] GMM IC PR for comment #43

Open · wants to merge 10 commits into main

Conversation

bdpedigo

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Comment on lines 29 to 31
Different combinations of initialization, GMM,
and cluster numbers are used and the clustering
with the best selection criterion (BIC or AIC) is chosen.
bdpedigo (Author):

Suggest making this match LassoLarsIC a bit more closely, e.g. "Such criteria are useful to select the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model." You could basically replace "regularization parameter" with "Gaussian mixture parameters".

n_init : int, optional (default = 1)
If ``n_init`` is larger than 1, additional
``n_init``-1 runs of :class:`sklearn.mixture.GaussianMixture`
initialized with k-means will be performed
bdpedigo (Author):

Not necessarily initialized with k-means, right?

initialized with k-means will be performed
for all covariance parameters in ``covariance_type``.
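For instance (an illustrative sketch, not the PR's code): every one of the ``n_init`` runs uses whatever strategy ``init_params`` names, which need not be k-means.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))

# n_init=5 runs EM five times; each run is seeded by the init_params
# strategy ("random" here, not k-means), and the best run is kept.
gm = GaussianMixture(n_components=3, n_init=5, init_params="random").fit(X)
```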

init_params : {'kmeans' (default), 'k-means++', 'random', 'random_from_data'}
bdpedigo (Author):

Perhaps worth explaining the options; mainly, I don't know what random_from_data is from this description.

bdpedigo (Author):

Also, is k-means++ not the default? If not, why not? I think it is in sklearn, if I remember correctly.

Reply:

Yeah, not sure; apparently kmeans is the default in GaussianMixture.
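For reference, a quick check (assumes scikit-learn >= 1.1, where "k-means++" and "random_from_data" were added):

```python
from sklearn.mixture import GaussianMixture

# The default really is plain k-means; k-means++ is the default for
# sklearn's KMeans estimator, which may be the source of the mix-up.
print(GaussianMixture().init_params)  # -> "kmeans"
```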


Attributes
----------
best_criterion_ : float
bdpedigo (Author):

LassoLarsIC calls this "criterion_".

covariance_type_ : str
Covariance type for the model with the best bic/aic.

best_model_ : :class:`sklearn.mixture.GaussianMixture`
bdpedigo (Author):

In LassoLarsIC, there is no "sub-object" with the best model; rather, the whole class just operates as if it is that model. Does that make sense? While I can't speak for them, my guess is this is closer to what they'd be expecting.

Reply:

I added attributes like weights_ and means_ from GaussianMixture into GaussianMixtureIC, but I found that I still need to save the best model (I call it best_estimator_ in the newest version) in order to call predict. Did I understand you correctly?
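Roughly this pattern, perhaps (a minimal hypothetical sketch of the idea, not the PR's actual implementation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GaussianMixtureIC:
    """Toy stand-in illustrating the attribute-mirroring idea."""

    def __init__(self, max_components=5, criterion="bic"):
        self.max_components = max_components
        self.criterion = criterion

    def fit(self, X):
        models = [
            GaussianMixture(n_components=k, random_state=0).fit(X)
            for k in range(1, self.max_components + 1)
        ]
        scores = [getattr(m, self.criterion)(X) for m in models]
        best = models[int(np.argmin(scores))]
        self.best_estimator_ = best  # still needed so predict() can delegate
        # mirror fitted attributes so the IC object reads like the best model
        self.weights_ = best.weights_
        self.means_ = best.means_
        self.covariances_ = best.covariances_
        return self

    def predict(self, X):
        return self.best_estimator_.predict(X)
```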

best_model_ : :class:`sklearn.mixture.GaussianMixture`
Object with the best bic/aic.

labels_ : array-like, shape (n_samples,)
bdpedigo (Author):

Not an attribute of GaussianMixture; I recommend not storing it.

self.criterion = criterion
self.n_jobs = n_jobs

def _check_multi_comp_inputs(self, input, name, default):
bdpedigo (Author):

I usually make any methods that don't access self into module-level functions.
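E.g. (structure only; the body here is hypothetical):

```python
# A helper that never reads or writes `self` can live at module level,
# which also makes it easier to test in isolation.
def _check_multi_comp_inputs(value, name, default):
    # hypothetical body, for illustration only
    if value is None:
        return default
    if isinstance(value, str):
        value = [value]
    return list(value)
```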

name="min_components",
target_type=int,
)
check_scalar(
bdpedigo (Author):

The minimum value here could be ``min_components``?
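I.e., something like this (a sketch reusing the PR's min_components/max_components names):

```python
from sklearn.utils import check_scalar

min_components, max_components = 2, 10  # illustrative values

check_scalar(min_components, name="min_components", target_type=int, min_val=1)
# the suggestion: bound max_components below by min_components so an
# empty search range fails fast with a clear error
check_scalar(
    max_components,
    name="max_components",
    target_type=int,
    min_val=min_components,
)
```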

else:
criterion_value = model.aic(X)

# change the precision of "criterion_value" based on sample size
bdpedigo (Author):

Could you explain this?

)
best_criter = [result.criterion for result in results]

if sum(best_criter == np.min(best_criter)) == 1:
bdpedigo (Author):

This all seems fine, but just a suggestion: https://numpy.org/doc/stable/reference/generated/numpy.argmin.html
The docs imply that for ties, argmin gives the first. So, in other words, if results are sorted in order of complexity, just using argmin would do what you want. (You can even leave a comment to this effect, if you go this route.)

Note that I think having the results sorted by complexity is probably desirable anyway?
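For concreteness, a toy sketch of the tie-breaking behavior:

```python
import numpy as np

# Criteria for models ordered from least to most complex; on a tie,
# np.argmin returns the first occurrence, i.e. the simplest model.
criteria = np.array([10.2, 9.7, 9.7, 11.0])
print(np.argmin(criteria))  # 1 -> the less complex of the two tied models
```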




class _CollectResults:
bdpedigo (Author):

This is effectively a dictionary; recommend just using one, or a named tuple? I am just anti classes that only store data and don't have any methods, but that is just my style :)
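E.g. (field names hypothetical):

```python
from collections import namedtuple

# The same data the _CollectResults class holds, without the boilerplate.
FitResult = namedtuple("FitResult", ["criterion", "model", "gm_params"])

result = FitResult(criterion=123.4, model=None, gm_params={})
print(result.criterion)
```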

Comment on lines 306 to 323
param_grid = dict(
covariance_type=covariance_type,
n_components=range(self.min_components, self.max_components + 1),
)
param_grid = list(ParameterGrid(param_grid))

seeds = random_state.randint(np.iinfo(np.int32).max, size=len(param_grid))

if parse_version(joblib.__version__) < parse_version("0.12"):
parallel_kwargs = {"backend": "threading"}
else:
parallel_kwargs = {"prefer": "threads"}

results = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, **parallel_kwargs)(
delayed(self._fit_cluster)(X, gm_params, seed)
for gm_params, seed in zip(param_grid, seeds)
)
best_criter = [result.criterion for result in results]
bdpedigo (Author):

Why not just use GridSearchCV as in their example? https://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_selection.html#sphx-glr-auto-examples-mixture-plot-gmm-selection-py

It would abstract away some of the stuff you have to do to make parallel work, for instance.
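The linked example boils down to roughly this (sketch adapted from that scikit-learn example):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

def gmm_bic_score(estimator, X):
    # GridSearchCV maximizes scores, so negate BIC (lower BIC is better).
    return -estimator.bic(X)

param_grid = {
    "n_components": range(1, 7),
    "covariance_type": ["spherical", "tied", "diag", "full"],
}
grid_search = GridSearchCV(
    GaussianMixture(), param_grid=param_grid, scoring=gmm_bic_score
)
grid_search.fit(np.random.default_rng(0).normal(size=(200, 2)))
print(grid_search.best_params_)
```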

@github-actions

github-actions bot commented Jun 21, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6b92c5a.
