
Feature discussion: callbacks for long operations #10973

Open
markotoplak opened this issue Apr 13, 2018 · 1 comment

Comments

@markotoplak
Contributor

Ideally, I think, operations that take a lot of time should both:

  • have a way of showing progress, and
  • be interruptable.

It was previously discussed at least in #78, #7574, #7596.

Regarding progress, in #7574 @amueller proposed not baking a progress bar in, but rather adding callbacks. In #7596 @denis-bz suggested having callbacks that are passed locals(), which is an interesting idea. I also saw that fit() in GradientBoostingClassifier has a monitor parameter, which makes showing progress bars easy.
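For reference, the monitor callable that GradientBoostingClassifier.fit already accepts is invoked once per boosting stage with (iteration, estimator, locals()), and returning True stops training. A minimal progress-tracking sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

stages = []

def monitor(i, estimator, local_vars):
    """Called after each boosting stage with fit()'s locals()."""
    stages.append(i)   # record progress; a GUI could update a progress bar here
    return False       # returning True would stop training early

clf = GradientBoostingClassifier(n_estimators=20, random_state=0)
clf.fit(X, y, monitor=monitor)
print(len(stages))     # one callback invocation per fitted stage
```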

In #7596, @denis-bz suggested that callbacks can also be used to interrupt computation. In our project, Orange (https://github.com/biolab/orange3), we do something similar: where no other mechanism is available, we raise a BaseException inside a callback to interrupt running threads.
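That interruption trick can already be sketched on top of the existing monitor hook: the callback raises a BaseException subclass, which unwinds out of fit() and leaves the estimator partially fitted. A rough illustration (the exception class name is made up here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

class FitInterrupted(BaseException):
    """Deliberately a BaseException so a generic `except Exception` won't swallow it."""

calls = []

def monitor(i, estimator, local_vars):
    calls.append(i)
    if i >= 5:              # pretend the user pressed "Stop" here
        raise FitInterrupted
    return False

X, y = make_classification(n_samples=200, random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
try:
    clf.fit(X, y, monitor=monitor)
except FitInterrupted:
    pass                    # the estimator is left partially fitted
```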

Orange uses scikit-learn a lot, and the lack of callbacks in scikit-learn makes showing progress or interrupting computation hard (we'd like to allow stopping running computations). For now, we have to resort to hacks. For example, in our Neural Network widget, we subclassed scikit-learn's NNs and added a callback on n_iter_ change (biolab/orange3#2958).
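The hack can be sketched roughly like this (a simplification for illustration, not the actual Orange code): since MLPClassifier's stochastic solvers increment n_iter_ once per iteration during fit, shadowing n_iter_ with a property lets a subclass fire a callback on every update:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

class CallbackMLP(MLPClassifier):
    """Sketch: invoke self.callback each time fit() assigns to n_iter_."""
    callback = None  # set on the instance after construction

    @property
    def n_iter_(self):
        try:
            return self.__dict__["n_iter_"]
        except KeyError:
            # keep hasattr(est, "n_iter_") working before fit()
            raise AttributeError("n_iter_") from None

    @n_iter_.setter
    def n_iter_(self, value):
        self.__dict__["n_iter_"] = value   # the property shadows the instance dict
        if self.callback is not None:
            self.callback(value)           # progress hook; could also raise to interrupt

X, y = make_classification(n_samples=100, random_state=0)
seen = []
clf = CallbackMLP(hidden_layer_sizes=(5,), max_iter=10, random_state=0)
clf.callback = seen.append
clf.fit(X, y)
```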

We would like to help implement callbacks, but first we are asking whether you would even consider having something similar to GradientBoostingClassifier's monitor in the other classes. What do you think?

Then we could try to think of an interface together and slowly start adding it to certain classes.

@jnothman
Member

Hello Marko! Without reviewing the past discussion of these topics (including perhaps my past opinions which may not agree with the present), I think callbacks are a Useful Thing, and have become expected functionality in machine learning libraries.

I don't think a parameter to fit is necessarily in accordance with how we like to design things these days, and we'd probably choose to make it a class parameter, especially if it might be used for early stopping and hence be deemed a hyperparameter.
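A constructor-level interface along those lines might look like the following toy sketch (everything here is hypothetical and illustrative; none of it exists in scikit-learn today). Making the callback a regular __init__ parameter keeps it clonable and grid-searchable like any other hyperparameter:

```python
import numpy as np
from sklearn.base import BaseEstimator

class ToyMeanEstimator(BaseEstimator):
    """Hypothetical: the callback is a constructor parameter, not a fit() argument."""

    def __init__(self, n_iter=50, callback=None):
        self.n_iter = n_iter
        self.callback = callback

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        est = 0.0
        for i in range(self.n_iter):
            est += 0.1 * (X.mean() - est)   # toy iterative update
            if self.callback is not None and self.callback(i, est):
                break                        # True from the callback requests early stop
        self.mean_ = est
        return self

stop_log = []

def cb(i, current):
    stop_log.append(i)
    return i >= 9    # stop after 10 iterations

ToyMeanEstimator(callback=cb).fit([1.0, 2.0, 3.0])
```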

I also think callbacks are a good alternative to receiving ad-hoc contributions of logging / progress meters.

Some risks:

  • it will introduce an additional parameter, but perhaps this is a small cost. Will it potentially introduce more than one parameter?
  • it may reduce performance, but I assume this will be negligible
  • it may make introducing parallelism to some implementations harder
  • it's hard to know what to pass to a callback. Ensuring backwards compatibility of the callback's args, consistency across estimators, and up-to-date documentation, presents as potentially a big maintenance cost. We may want to think about how to make this usable but maintainable. This will take some prototyping and case studies.
  • it may take a long time to review and merge many such implementations
