Some scikit-learn models (BaseEstimators) support the method partial_fit (see here). For cases where these models are being wrapped by SklearnModel, it may make sense to allow the user to call fit_on_batch for the SklearnModel.
Motivation
Assuming the given BaseEstimator supports partial_fit, allowing calls to fit_on_batch would allow for SklearnModel to train on data that does not fit completely into memory (e.g., data from a DiskDataset).
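As a rough illustration of the out-of-core pattern described above, the sketch below calls partial_fit directly on a scikit-learn estimator, one batch at a time. The batch_stream generator and its synthetic data are assumptions made for the example; they stand in for batches read from something like a DiskDataset, and nothing here is part of DeepChem's API. A fit_on_batch method on SklearnModel would presumably forward each batch to partial_fit in much the same way.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def batch_stream(n_batches=20, batch_size=64, n_features=5, seed=0):
    """Yield synthetic (X, y, w) batches (hypothetical stand-in for on-disk data)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] > 0).astype(int)  # label depends only on the first feature
        w = np.ones(batch_size)        # uniform sample weights
        yield X, y, w


model = SGDClassifier(random_state=0)
# For classifiers, partial_fit needs the full set of class labels on the first call.
classes = np.array([0, 1])

for X, y, w in batch_stream():
    # Only the current batch is held in memory.
    model.partial_fit(X, y, classes=classes, sample_weight=w)

# Evaluate on a fresh batch drawn with a different seed.
X_test, y_test, _ = next(batch_stream(n_batches=1, seed=1))
accuracy = model.score(X_test, y_test)
```

Passing classes on every call is harmless for classifiers, which is why the loop does not special-case the first batch.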
Additional context
If the BaseEstimator does not support partial_fit, we could raise an AttributeError or other appropriate error if fit_on_batch is called.
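A minimal sketch of such a check, assuming it is enough to test whether the estimator exposes a callable partial_fit attribute. The helper names supports_partial_fit and require_partial_fit are hypothetical and are not part of scikit-learn or DeepChem.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB


def supports_partial_fit(estimator) -> bool:
    """Return True if the estimator exposes a callable partial_fit (hypothetical helper)."""
    return callable(getattr(estimator, "partial_fit", None))


def require_partial_fit(estimator) -> None:
    """Raise AttributeError when incremental fitting is not available."""
    if not supports_partial_fit(estimator):
        raise AttributeError(
            f"{type(estimator).__name__} does not implement partial_fit"
        )


sgd_ok = supports_partial_fit(SGDClassifier())          # incremental learner
nb_ok = supports_partial_fit(GaussianNB())              # incremental learner
rf_ok = supports_partial_fit(RandomForestClassifier())  # no partial_fit
```

Here SGDClassifier and GaussianNB pass the check, while RandomForestClassifier does not, so require_partial_fit would raise for it.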
Another option is to inform users in the docs that they can subclass SklearnModel to implement fit_on_batch themselves, for example:
# Extension of SklearnModel that implements fit_on_batch
from typing import Optional

import numpy as np
from sklearn.base import BaseEstimator
from deepchem.models import SklearnModel


class SklearnModelFoB(SklearnModel):

    def __init__(self, model: BaseEstimator, model_dir: Optional[str] = None, **kwargs):
        super().__init__(model, model_dir, **kwargs)
        # Record whether the wrapped estimator implements partial_fit
        self.implements_partial_fit = callable(getattr(model, "partial_fit", None))

    def fit_on_batch(
        self,
        X: np.ndarray,
        y: np.ndarray,
        w: np.ndarray,
        classes: Optional[np.ndarray] = None,
    ):
        if not self.implements_partial_fit:
            raise NotImplementedError(
                "Given sklearn model does not implement partial_fit!"
            )
        # Regressors such as SGDRegressor do not accept `classes` in partial_fit,
        # so forward it only when the caller supplies it.
        fit_kwargs = {} if classes is None else {"classes": classes}
        if self.use_weights:
            fit_kwargs["sample_weight"] = w
        self.model.partial_fit(X, y, **fit_kwargs)
Jack-42 changed the title to "fit_on_batch for SklearnModels where BaseEstimator supports partial_fit" on May 13, 2024
I'd be open to this as a new feature. Sounds potentially useful for the community. Would you be able to come by OH (9am PST, MWF) some day to discuss with us?