fit_on_batch for SklearnModels where BaseEstimator supports partial_fit #3973

Open
Jack-42 opened this issue May 13, 2024 · 3 comments · May be fixed by #3978

Comments

@Jack-42

Jack-42 commented May 13, 2024

🚀 Feature

Some scikit-learn models (BaseEstimators) implement the partial_fit method for incremental learning. When such a model is wrapped by SklearnModel, it would make sense to let the user call fit_on_batch on the SklearnModel.

Motivation

If the wrapped BaseEstimator supports partial_fit, fit_on_batch would let SklearnModel train on data that does not fit entirely in memory (e.g., data from a DiskDataset).
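
To illustrate the out-of-core pattern this would enable, here is a minimal sketch using plain scikit-learn, without the DeepChem wrapper. The synthetic data, the batch size, and the choice of SGDClassifier are assumptions made for illustration only; a fit_on_batch implementation would presumably forward each batch to partial_fit in much the same way.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Synthetic two-class data (illustrative only): two well-separated blobs.
X = np.vstack([rng.normal(-3.0, 1.0, size=(500, 2)),
               rng.normal(3.0, 1.0, size=(500, 2))])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

model = SGDClassifier(random_state=0)
# Classifiers need the full label set on the first partial_fit call,
# because a single batch may not contain every class.
classes = np.unique(y)

# Feed the data one batch at a time, as if each batch were loaded from disk.
batch_size = 100
for start in range(0, len(X), batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]
    model.partial_fit(X_batch, y_batch, classes=classes)

accuracy = model.score(X, y)
```

Only one batch is held by the training loop at a time, which is what makes the approach attractive for datasets stored on disk.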

Additional context

If the BaseEstimator does not support partial_fit, we could raise an AttributeError or other appropriate error if fit_on_batch is called.
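
The check described above could look roughly like the following. The helper name require_partial_fit is hypothetical and is not part of DeepChem or scikit-learn; the sketch only relies on estimators with incremental-learning support exposing a callable partial_fit attribute.

```python
from sklearn.linear_model import LinearRegression, SGDRegressor


def require_partial_fit(estimator):
    """Return estimator.partial_fit, or raise AttributeError if it is missing.

    Hypothetical helper for illustration; not an existing DeepChem function.
    """
    partial_fit = getattr(estimator, "partial_fit", None)
    if not callable(partial_fit):
        raise AttributeError(
            f"{type(estimator).__name__} does not implement partial_fit, "
            "so fit_on_batch is not supported for it."
        )
    return partial_fit


# SGDRegressor supports incremental learning; LinearRegression does not.
require_partial_fit(SGDRegressor())
try:
    require_partial_fit(LinearRegression())
    supported = True
except AttributeError:
    supported = False
```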

Another option is to note in the docs that users can subclass SklearnModel to implement fit_on_batch themselves, e.g.:

# Extension of SklearnModel that implements fit_on_batch
from typing import Optional

import numpy as np
from deepchem.models import SklearnModel
from sklearn.base import BaseEstimator


class SklearnModelFoB(SklearnModel):
    def __init__(self, model: BaseEstimator, model_dir: Optional[str] = None, **kwargs):
        super().__init__(model, model_dir, **kwargs)
        # Record whether the wrapped estimator implements partial_fit
        self.implements_partial_fit = callable(getattr(model, "partial_fit", None))

    def fit_on_batch(
        self,
        X: np.ndarray,
        y: np.ndarray,
        w: np.ndarray,
        classes: Optional[np.ndarray] = None,
    ) -> None:
        if not self.implements_partial_fit:
            raise NotImplementedError(
                "Given sklearn model does not implement partial_fit!"
            )
        # Only classifiers accept `classes`; regressors such as SGDRegressor
        # reject it, so forward it only when it was provided.
        fit_kwargs = {}
        if classes is not None:
            fit_kwargs["classes"] = classes
        if self.use_weights:
            fit_kwargs["sample_weight"] = w
        self.model.partial_fit(X, y, **fit_kwargs)
@rbharath
Member

I'd be open to this as a new feature. Sounds potentially useful for the community. Would you be able to come by office hours (OH; 9am PST, MWF) some day to discuss with us?

@Jack-42
Author

Jack-42 commented May 13, 2024

> I'd be open to this as a new feature. Sounds potentially useful for the community. Would you be able to come by OH (9am PST, MWF) some day to discuss with us?

Sure! I can drop by the next OH (May 15th).

@rbharath
Member

Great! Please join the discord (https://discord.gg/ArRuv9Eu) if you haven't already. We announce timing adjustments for OH there.

@Jack-42 Jack-42 linked a pull request May 21, 2024 that will close this issue
15 tasks