[ENH] make fit_predict_default configurable #1503

TonyBagnall · 2024-05-08T13:35:30Z

Describe the feature or idea you want to propose

Currently, fit_predict makes estimates on the train data by default through cross-validation. It hard-codes the number of folds to 10, or to the number of cases in the smallest class if that is fewer than 10. I would like to be able to set this to something other than 10, but I'm not immediately sure of the best way to configure it.

It also always fits the full model on all of the training data. I'd like to be able to turn that off.

The context is using fit_predict to score channels for channel selection. I would like this to be fast, so I want 3-fold CV and to skip building the full model.

Describe your proposed solution

A mocked-up fit:

    import math

    import numpy as np
    from sklearn.metrics import accuracy_score

    def fit(self, X, y):
        n_channels = X.shape[1]
        scores = np.zeros(n_channels)
        # Evaluate each channel on its own with the classifier
        for i in range(n_channels):
            preds = self.classifier.fit_predict(X[:, i, :], y)
            scores[i] = accuracy_score(y, preds)
        # Select the top n_keep channels (highest accuracy first)
        sorted_indices = np.argsort(-scores)
        n_keep = math.ceil(n_channels * self.proportion)
        self.channels_selected_ = sorted_indices[:n_keep]
        return self
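For comparison, the behaviour the mock-up is aiming for (3-fold CV per channel and no model fitted on the full training set) can be approximated outside the classifier by calling scikit-learn's cross_val_predict directly. A minimal sketch, assuming the classifier is scikit-learn compatible and accepts 2D (n_cases, n_timepoints) input; select_channels is a hypothetical helper, not part of aeon.

```python
import math

import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict


def select_channels(classifier, X, y, proportion=0.5, cv=3):
    """Rank channels by CV accuracy and keep the top proportion.

    X has shape (n_cases, n_channels, n_timepoints). Only `cv` models are
    built per channel: nothing is fitted on the full training data.
    """
    n_channels = X.shape[1]
    scores = np.zeros(n_channels)
    for i in range(n_channels):
        # Out-of-fold predictions using channel i alone
        preds = cross_val_predict(clone(classifier), X[:, i, :], y, cv=cv)
        scores[i] = accuracy_score(y, preds)
    # Highest accuracy first; keep ceil(n_channels * proportion) channels
    sorted_indices = np.argsort(-scores)
    n_keep = math.ceil(n_channels * proportion)
    return sorted_indices[:n_keep], scores
```

This builds exactly cv models per channel, at the cost of bypassing any custom fit_predict logic a particular classifier might provide.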

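Currently this builds 11 models per channel (10 cross-validation models plus one fitted on all the training data), assuming each class has at least 10 cases. The current default implementation: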
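The count of 11 follows from the fold rule in _fit_predict_default: the number of folds is 10, capped at the size of the smallest class, plus one model fitted on all the data. A small sketch of that arithmetic with the proposed cv_size and full_model options added; n_models_per_channel is a hypothetical helper written only for illustration.

```python
import numpy as np


def n_models_per_channel(y, cv_size=10, full_model=True):
    """Count the models fit_predict builds for one channel.

    Mirrors the fold rule in _fit_predict_default: the number of folds is
    cv_size, capped at the size of the smallest class (which must be >= 2).
    """
    _, counts = np.unique(y, return_counts=True)
    n_folds = min(cv_size, int(np.min(counts)))
    if n_folds < 2:
        raise ValueError("All classes must have at least 2 values.")
    # One model per fold, plus optionally one fitted on all the training data
    return n_folds + (1 if full_model else 0)
```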

    import numpy as np
    from sklearn.model_selection import cross_val_predict

    from aeon.base._base import _clone_estimator

    def _fit_predict_default(self, X, y, method):
        # fit the classifier on all the training data
        self._fit(X, y)

        # predict using cross-validation, capping the folds at the smallest class size
        cv_size = 10
        _, counts = np.unique(y, return_counts=True)
        min_class = np.min(counts)
        if min_class < cv_size:
            cv_size = min_class
            if cv_size < 2:
                raise ValueError(
                    "All classes must have at least 2 values to run the "
                    f"_fit_{method} cross-validation."
                )

        random_state = getattr(self, "random_state", None)
        estimator = _clone_estimator(self, random_state)

        return cross_val_predict(
            estimator,
            X=X,
            y=y,
            cv=cv_size,
            method=method,
            n_jobs=self._n_jobs,
        )
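One way the method could be made configurable, sketched as a standalone function so it runs outside aeon. Assumptions: cv_size and full_model are the option names proposed in this issue, sklearn.base.clone stands in for aeon's _clone_estimator (so random_state is not reset), and returning the fitted estimator instead of setting state on self is purely for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_predict


def fit_predict_default(estimator, X, y, method="predict", cv_size=10,
                        full_model=True, n_jobs=1):
    """Cross-validated train estimates with a configurable fold count.

    Returns (fitted_estimator_or_None, estimates). Setting full_model=False
    skips the extra fit on all the training data.
    """
    _, counts = np.unique(y, return_counts=True)
    # Cap the number of folds at the size of the smallest class
    cv = min(cv_size, int(np.min(counts)))
    if cv < 2:
        raise ValueError(
            "All classes must have at least 2 values to run the "
            f"_fit_{method} cross-validation."
        )
    fitted = clone(estimator).fit(X, y) if full_model else None
    estimates = cross_val_predict(
        clone(estimator), X=X, y=y, cv=cv, method=method, n_jobs=n_jobs
    )
    return fitted, estimates
```

Defaulting to cv_size=10 and full_model=True keeps the current behaviour for existing callers.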

Could this be done with kwargs for fit_predict, maybe?

        for i in range(n_channels):
            # cv_size and full_model are proposed kwargs; fit_predict does not accept them yet
            preds = self.classifier.fit_predict(X[:, i, :], y, cv_size=3, full_model=False)
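If the kwargs route is taken, fit_predict would need to merge the supplied options with defaults and reject anything it does not recognise, so that a misspelt option does not silently fall back to 10 folds. A minimal sketch; SketchClassifier and its placeholder _fit_predict_default are hypothetical and only echo back the options they receive.

```python
class SketchClassifier:
    """Hypothetical classifier showing fit_predict accepting option kwargs."""

    _fit_predict_options = {"cv_size": 10, "full_model": True}

    def fit_predict(self, X, y, **kwargs):
        unknown = set(kwargs) - set(self._fit_predict_options)
        if unknown:
            # Fail loudly so a typo such as cv_szie is not silently ignored
            raise TypeError(f"Unexpected fit_predict arguments: {sorted(unknown)}")
        # Caller-supplied values override the defaults
        options = {**self._fit_predict_options, **kwargs}
        return self._fit_predict_default(X, y, "predict", **options)

    def _fit_predict_default(self, X, y, method, cv_size, full_model):
        # Placeholder: a real implementation would run the CV here
        return {"method": method, "cv_size": cv_size, "full_model": full_model}
```

Explicit keyword parameters, e.g. fit_predict(X, y, cv_size=10, full_model=True), would give the same defaults with better discoverability in signatures and docs.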

Describe alternatives you've considered, if relevant

Could this be set in the constructor, or passed as an explicit parameter with a default of 10?
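Setting the options in the constructor follows scikit-learn's convention that every hyperparameter is a constructor argument stored unchanged, so get_params, set_params and clone pick them up automatically. A minimal sketch; ChannelScorer and the fit_predict_cv_size / fit_predict_full_model parameter names are hypothetical.

```python
from sklearn.base import BaseEstimator, clone


class ChannelScorer(BaseEstimator):
    """Hypothetical estimator storing fit_predict settings as constructor params."""

    def __init__(self, fit_predict_cv_size=10, fit_predict_full_model=True):
        # scikit-learn convention: store constructor arguments unchanged
        self.fit_predict_cv_size = fit_predict_cv_size
        self.fit_predict_full_model = fit_predict_full_model
```

The trade-off is that every estimator's signature grows two parameters, whereas an explicit fit_predict argument leaves constructors unchanged but has to be threaded through any wrapper that calls fit_predict.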


Labels: enhancement, classification, transformations (added by TonyBagnall on May 8, 2024)
