
Feature discussion: callbacks for long operations #10973

Open
markotoplak opened this issue Apr 13, 2018 · 1 comment

Comments

@markotoplak
Contributor

Ideally, I think, operations that take a lot of time should both:

  • have a way of showing progress, and
  • be interruptable.

It was previously discussed at least in #78, #7574, #7596.

Regarding progress, in #7574 @amueller proposed not baking a progress bar in, but rather adding callbacks. In #7596 @denis-bz suggested having callbacks that are passed locals(), which is an interesting idea. I also saw that fit() in GradientBoostingClassifier has a monitor parameter, which makes showing progress bars easy.
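For reference, the monitor callable that GradientBoostingClassifier.fit already accepts is invoked once per boosting stage with (iteration, estimator, locals()), and returning True stops training. A minimal progress-tracking sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

stages = []

def monitor(i, estimator, local_vars):
    """Called after each boosting stage with fit()'s locals()."""
    stages.append(i)   # record progress; a GUI could update a progress bar here
    return False       # returning True would stop training early

clf = GradientBoostingClassifier(n_estimators=20, random_state=0)
clf.fit(X, y, monitor=monitor)
print(len(stages))     # one callback invocation per fitted stage
```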

In #7596, @denis-bz suggested that callbacks can also be used to interrupt computation. In our project, Orange (https://github.com/biolab/orange3), we do something similar: where no other mechanism is available, we raise a BaseException inside a callback to interrupt running threads.
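That interruption trick can already be sketched on top of the existing monitor hook: the callback raises a BaseException subclass, which unwinds out of fit() and leaves the estimator partially fitted. A rough illustration (the exception class name is made up here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

class FitInterrupted(BaseException):
    """Deliberately a BaseException so a generic `except Exception` won't swallow it."""

calls = []

def monitor(i, estimator, local_vars):
    calls.append(i)
    if i >= 5:              # pretend the user pressed "Stop" here
        raise FitInterrupted
    return False

X, y = make_classification(n_samples=200, random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
try:
    clf.fit(X, y, monitor=monitor)
except FitInterrupted:
    pass                    # the estimator is left partially fitted
```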

Orange uses scikit-learn a lot, and the lack of callbacks in scikit-learn makes showing progress or interrupting computation hard (we'd like to allow stopping running computations). For now, we have to resort to hacks. For example, in our Neural Network widget, we subclassed scikit-learn's NNs and added a callback on n_iter_ change (biolab/orange3#2958).
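The hack can be sketched roughly like this (a simplification for illustration, not the actual Orange code): since MLPClassifier's stochastic solvers increment n_iter_ once per iteration during fit, shadowing n_iter_ with a property lets a subclass fire a callback on every update:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

class CallbackMLP(MLPClassifier):
    """Sketch: invoke self.callback each time fit() assigns to n_iter_."""
    callback = None  # set on the instance after construction

    @property
    def n_iter_(self):
        try:
            return self.__dict__["n_iter_"]
        except KeyError:
            # keep hasattr(est, "n_iter_") working before fit()
            raise AttributeError("n_iter_") from None

    @n_iter_.setter
    def n_iter_(self, value):
        self.__dict__["n_iter_"] = value   # the property shadows the instance dict
        if self.callback is not None:
            self.callback(value)           # progress hook; could also raise to interrupt

X, y = make_classification(n_samples=100, random_state=0)
seen = []
clf = CallbackMLP(hidden_layer_sizes=(5,), max_iter=10, random_state=0)
clf.callback = seen.append
clf.fit(X, y)
```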

We would like to help implement callbacks, but first we are asking whether you would even consider having something similar to GradientBoostingClassifier's monitor in the other classes. What do you think?

Then we could try to think of an interface together and slowly start adding it to certain classes.

@jnothman
Member

Hello Marko! Without reviewing the past discussion of these topics (including perhaps my past opinions which may not agree with the present), I think callbacks are a Useful Thing, and have become expected functionality in machine learning libraries.

I don't think a parameter to fit is necessarily in accordance with how we like to design things these days, and we'd probably choose to make it a class parameter, especially if it might be used for early stopping and hence be deemed a hyperparameter.
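A constructor-level interface along those lines might look like the following toy sketch (everything here is hypothetical and illustrative; none of it exists in scikit-learn today). Making the callback a regular __init__ parameter keeps it clonable and grid-searchable like any other hyperparameter:

```python
import numpy as np
from sklearn.base import BaseEstimator

class ToyMeanEstimator(BaseEstimator):
    """Hypothetical: the callback is a constructor parameter, not a fit() argument."""

    def __init__(self, n_iter=50, callback=None):
        self.n_iter = n_iter
        self.callback = callback

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        est = 0.0
        for i in range(self.n_iter):
            est += 0.1 * (X.mean() - est)   # toy iterative update
            if self.callback is not None and self.callback(i, est):
                break                        # True from the callback requests early stop
        self.mean_ = est
        return self

stop_log = []

def cb(i, current):
    stop_log.append(i)
    return i >= 9    # stop after 10 iterations

ToyMeanEstimator(callback=cb).fit([1.0, 2.0, 3.0])
```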

I also think callbacks are a good alternative to receiving ad-hoc contributions of logging / progress meters.

Some risks:

  • it will introduce an additional parameter, but perhaps this is a small cost. Will it potentially introduce more than one parameter?
  • it may reduce performance, but I assume this will be negligible
  • it may make introducing parallelism to some implementations harder
  • it's hard to know what to pass to a callback. Ensuring backwards compatibility of the callback's args, consistency across estimators, and up-to-date documentation, presents as potentially a big maintenance cost. We may want to think about how to make this usable but maintainable. This will take some prototyping and case studies.
  • it may take a long time to review and merge many such implementations
