[ENH] make fit_predict_default configurable #1503

Open
TonyBagnall opened this issue May 8, 2024 · 0 comments
Labels
classification (Classification package), enhancement (New feature, improvement request or other non-bug code enhancement), transformations (Transformations package)

Comments

@TonyBagnall
Contributor

TonyBagnall commented May 8, 2024

Describe the feature or idea you want to propose

Currently, fit_predict makes estimates on the train data by default through cross-validation. It hard-codes the number of folds to 10, or to the number of cases in the smallest class if that is fewer than 10. I would like to be able to set this to something other than 10, but I am not immediately sure of the best way to configure it.

It also always fits the whole model on the full training data, in addition to the cross-validation models. I'd like to be able to turn that off.

The context is using fit_predict to score channels for channel selection. I would like this to be fast, so I want 3-fold CV and no full model build.

Describe your proposed solution

Mocked-up fit:

    import math

    import numpy as np
    from sklearn.metrics import accuracy_score

    def _fit(self, X, y):
        n_channels = X.shape[1]
        scores = np.zeros(n_channels)
        # Score each channel by the accuracy of the classifier's train estimates
        for i in range(n_channels):
            preds = self.classifier.fit_predict(X[:, i, :], y)
            scores[i] = accuracy_score(y, preds)
        # Select the top n_keep channels, highest score first
        sorted_indices = np.argsort(-scores)
        n_keep = math.ceil(n_channels * self.proportion)
        self.channels_selected_ = sorted_indices[:n_keep]
        return self
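Until fit_predict is configurable, the behaviour asked for here (3-fold CV, no full-data fit) can be approximated by calling scikit-learn's `cross_val_predict` directly on each channel. This is a minimal sketch, assuming a scikit-learn-compatible classifier and 3D input of shape `(n_cases, n_channels, n_timepoints)`; `select_channels` is a hypothetical helper, and `KNeighborsClassifier` on synthetic data only stands in for the real classifier.

```python
import math

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier


def select_channels(classifier, X, y, proportion=0.5, cv=3):
    """Rank the channels of X (n_cases, n_channels, n_timepoints) by CV accuracy.

    Hypothetical helper, not part of aeon.
    """
    n_channels = X.shape[1]
    scores = np.zeros(n_channels)
    for i in range(n_channels):
        # One fit per fold on clones of classifier; no full-data fit at all
        preds = cross_val_predict(classifier, X[:, i, :], y, cv=cv)
        scores[i] = accuracy_score(y, preds)
    # Keep the ceil(n_channels * proportion) highest-scoring channels
    n_keep = math.ceil(n_channels * proportion)
    return np.argsort(-scores, kind="stable")[:n_keep], scores


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 4, 20))
X[:, 2, :] += 3.0 * y[:, None]  # make channel 2 clearly informative
selected, scores = select_channels(KNeighborsClassifier(n_neighbors=1), X, y)
print(selected, scores.round(2))
```

With `cv=3` this builds three models per channel instead of eleven.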

Currently this builds 11 models per channel (one full model plus 10 cross-validation models), assuming each class has at least 10 cases.
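The cost is simple arithmetic: per channel, one model per fold plus one optional full-data model. The sketch below compares the current default with the proposed setting; `models_built` is a hypothetical helper and the 10-channel count is an arbitrary illustration.

```python
def models_built(n_channels, n_folds, full_model=True):
    """Models built when every channel is scored with fit_predict."""
    # Per channel: one model per CV fold, plus one full-data model if enabled
    return n_channels * (n_folds + (1 if full_model else 0))


print(models_built(10, n_folds=10))                   # current default: 110
print(models_built(10, n_folds=3, full_model=False))  # proposed setting: 30
```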

    def _fit_predict_default(self, X, y, method):
        # fit the classifier
        self._fit(X, y)

        # predict using cross-validation
        cv_size = 10
        _, counts = np.unique(y, return_counts=True)
        min_class = np.min(counts)
        if min_class < cv_size:
            cv_size = min_class
            if cv_size < 2:
                raise ValueError(
                    f"All classes must have at least 2 values to run the "
                    f"_fit_{method} cross-validation."
                )

        random_state = getattr(self, "random_state", None)
        estimator = _clone_estimator(self, random_state)

        return cross_val_predict(
            estimator,
            X=X,
            y=y,
            cv=cv_size,
            method=method,
            n_jobs=self._n_jobs,
        )
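A configurable version might look like the sketch below, written as a standalone function so it runs on its own. `cv_size` and `full_model` are the hypothetical option names from this proposal, with defaults that reproduce the current behaviour. scikit-learn's `clone` stands in for aeon's `_clone_estimator`, so the `random_state` handling is omitted, and `KNeighborsClassifier` is only a placeholder estimator.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier


def fit_predict_default(estimator, X, y, method="predict", cv_size=10,
                        full_model=True, n_jobs=None):
    """Hypothetical configurable variant of _fit_predict_default."""
    # Optionally fit the estimator on all of the training data, as now
    if full_model:
        estimator.fit(X, y)

    # Use cv_size folds, capped at the number of cases in the smallest class
    _, counts = np.unique(y, return_counts=True)
    n_folds = min(cv_size, int(counts.min()))
    if n_folds < 2:
        raise ValueError(
            "All classes must have at least 2 values to run the "
            f"_fit_{method} cross-validation."
        )

    # Out-of-fold estimates from an unfitted copy with the same parameters
    return cross_val_predict(
        clone(estimator), X=X, y=y, cv=n_folds, method=method, n_jobs=n_jobs
    )


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20))
y = np.repeat([0, 1, 2], 10)
clf = KNeighborsClassifier(n_neighbors=1)
probs = fit_predict_default(clf, X, y, method="predict_proba", cv_size=3,
                            full_model=False)
print(probs.shape)  # (30, 3)
```

Keeping the defaults at `cv_size=10` and `full_model=True` would leave existing callers unaffected.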

Could this be done with kwargs to fit_predict, maybe?

        for i in range(n_channels):
            # cv_size and full_model are proposed options that fit_predict does not accept yet
            preds = self.classifier.fit_predict(
                X[:, i, :], y, **{"cv_size": 3, "full_model": False}
            )
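On the classifier side, the kwargs route could be a pair of keyword parameters on fit_predict itself. The toy class below is a sketch under that assumption: `KwargsClassifier` is hypothetical, wraps a 1-NN model purely for illustration, and is not how aeon's base classes are structured.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier


class KwargsClassifier(ClassifierMixin, BaseEstimator):
    """Toy 1-NN classifier whose fit_predict accepts the proposed options."""

    def fit(self, X, y):
        self.model_ = KNeighborsClassifier(n_neighbors=1).fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        return self.model_.predict(X)

    def fit_predict(self, X, y, cv_size=10, full_model=True):
        # Defaults reproduce the current behaviour: up to 10 folds plus a full fit
        if full_model:
            self.fit(X, y)
        _, counts = np.unique(y, return_counts=True)
        n_folds = min(cv_size, int(counts.min()))
        if n_folds < 2:
            raise ValueError("All classes must have at least 2 values.")
        # cross_val_predict fits clones of self, so self is only fitted above
        return cross_val_predict(self, X, y, cv=n_folds)


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2, 15))
y = np.repeat([0, 1], 15)
clf = KwargsClassifier()
for i in range(X.shape[1]):
    # Dict unpacking is equivalent to passing cv_size=3, full_model=False
    preds = clf.fit_predict(X[:, i, :], y, **{"cv_size": 3, "full_model": False})
    print(i, preds.shape)
```

Declaring the options as explicit keyword parameters, rather than accepting a bare `**kwargs`, means a misspelt option raises a `TypeError` instead of being silently ignored.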

Describe alternatives you've considered, if relevant

Could I set it in the constructor instead, or pass it as an explicit parameter with a default of 10?
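The constructor route would store the options as estimator parameters. Assuming the classifier follows scikit-learn's parameter conventions, such options are picked up by `get_params`/`set_params` and survive `clone`. The sketch below shows this; `ConfiguredClassifier`, `fit_predict_cv` and `fit_predict_full_model` are hypothetical names.

```python
from sklearn.base import BaseEstimator, clone


class ConfiguredClassifier(BaseEstimator):
    """Hypothetical estimator holding fit_predict settings as constructor params."""

    def __init__(self, fit_predict_cv=10, fit_predict_full_model=True):
        # scikit-learn convention: store constructor arguments unmodified
        self.fit_predict_cv = fit_predict_cv
        self.fit_predict_full_model = fit_predict_full_model


clf = ConfiguredClassifier(fit_predict_cv=3, fit_predict_full_model=False)
print(clone(clf).get_params())
# {'fit_predict_cv': 3, 'fit_predict_full_model': False}
clf.set_params(fit_predict_cv=5)
print(clf.fit_predict_cv)  # 5
```

The trade-off is scope: a constructor parameter fixes the setting for the estimator instance, while a keyword argument to fit_predict lets the same instance be called with different settings.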

@TonyBagnall TonyBagnall added enhancement New feature, improvement request or other non-bug code enhancement classification Classification package transformations Transformations package labels May 8, 2024