
Outside functionality: add a "fitter" generator as an alternative to "fit" method #19076

Open
jamartinh opened this issue Dec 28, 2020 · 3 comments

Comments

@jamartinh

Describe the workflow you want to enable

Currently, estimators have a "fit" method.

Because fitting happens inside a single method call, it is hard to "publish" or "monitor" the progress and evolution of the estimator's internal state.

A "fitter" method implemented as a generator (using the yield statement) would bring large advantages over specialized, standardized callbacks. With callbacks, one has to infer how the callback affects the internal loops, and there is little control over access to the estimator's internal state.

Describe your proposed solution

Define a protocol with the option to include a "fitter()" generator as well, so people can perform checks at every iteration (or every n-th iteration, i.e. when i % n == 0) and then plot, monitor, and use the internal state and parameters of the estimator.

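from matplotlib import pyplot

# `fitter` is the proposed generator API; it does not exist in scikit-learn today.
for estimator_output in estimator.fitter(data):
    print(estimator_output)
    pyplot.plot(estimator_output.parameters[0])
    estimator.learning_rate /= 0.9  # adjust hyperparameters between iterations
    if estimator_output.error > 12:
        break  # stop fitting whenever the user decides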
  • This provides a rich way to run experiments without spending much time writing sophisticated callbacks.
  • This makes it possible to stop iterating whenever the user wants, without pressing Ctrl+C.
  • This helps alleviate the black-box feeling of "fit" methods.
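A minimal sketch of what such a protocol could look like, assuming a hypothetical toy estimator `GDLinearRegression` (not part of scikit-learn) that fits a linear model by batch gradient descent. Its `fitter()` yields a hypothetical `FitState` snapshot after every update, so the caller can monitor every n-th iteration, change `learning_rate` between steps, or stop with `break`; `fit()` simply exhausts the generator.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FitState:
    """Hypothetical snapshot yielded by fitter() after each iteration."""
    iteration: int
    coef: np.ndarray
    error: float


class GDLinearRegression:
    """Hypothetical toy linear regression fitted by batch gradient descent."""

    def __init__(self, learning_rate=0.1, max_iter=100):
        self.learning_rate = learning_rate
        self.max_iter = max_iter

    def fitter(self, X, y):
        """Generator version of fit: yields the state after every update."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.coef_ = np.zeros(X.shape[1])
        for i in range(self.max_iter):
            # Gradient of the mean squared error with respect to coef_.
            grad = 2.0 * X.T @ (X @ self.coef_ - y) / len(y)
            # learning_rate is re-read on every step, so the caller may change it.
            self.coef_ = self.coef_ - self.learning_rate * grad
            error = float(np.mean((X @ self.coef_ - y) ** 2))
            yield FitState(i, self.coef_.copy(), error)

    def fit(self, X, y):
        """Classic API: run the generator to completion and return self."""
        for _ in self.fitter(X, y):
            pass
        return self


X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X[:, 0]

est = GDLinearRegression(learning_rate=0.05, max_iter=500)
for state in est.fitter(X, y):
    if state.iteration % 50 == 0:  # monitor every 50th iteration
        print(state.iteration, state.error)
    if state.error < 1e-6:  # stop as soon as the user is satisfied
        break
```

Keeping `fit()` as a thin wrapper over `fitter()` means the existing API would not need to change under this assumption; the generator only exposes the loop that `fit()` already runs.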

Describe alternatives you've considered, if relevant

I have not come up with a way of doing this without the "fitter" generator.

Additional context

  • This would also make it much easier to debug current algorithm implementations.
  • This could even let two fitters interact, for example to compare their behavior step by step.
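The second point can be sketched with plain Python generators, assuming a hypothetical generator-style fitter `gd_fitter` (not a scikit-learn function) that yields `(iteration, coef, mse)` after each gradient-descent step. Because both fitters are ordinary generators, `zip` advances them in lockstep, and the comparison loop can stop both as soon as one has clearly won.

```python
import numpy as np


def gd_fitter(X, y, learning_rate, max_iter=50):
    """Hypothetical generator-style fitter: yields (iteration, coef, mse) per step."""
    coef = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2.0 * X.T @ (X @ coef - y) / len(y)
        coef = coef - learning_rate * grad
        yield i, coef.copy(), float(np.mean((X @ coef - y) ** 2))


X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X[:, 0]

# Two fitters that differ only in learning rate, advanced one step at a time.
slow = gd_fitter(X, y, learning_rate=0.01)
fast = gd_fitter(X, y, learning_rate=0.05)

comparison = []
for (i, _, err_slow), (_, _, err_fast) in zip(slow, fast):
    comparison.append((i, err_slow, err_fast))
    # Stop both fitters once one has converged and the other is still far off.
    if err_fast < 1e-6 and err_slow > 1e-3:
        break
```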

Finally

In summary, I believe this would change the way people see, understand, and interact with machine learning algorithms implemented with the sklearn interface.

@glemaitre
Member

I think that we are leaning towards callbacks to support these use cases: #16925

@jamartinh
Author

Hi @glemaitre, thanks for pointing out the Callbacks initiative.
As I see it, the callback pattern was born when programming languages and tools were not powerful enough to allow more interactive control over iterative algorithms. At that time, callbacks were the remedy envisioned to allow a little interaction with the iterations or events.

It is good to have a standard callbacks API; however, with the programming languages and tools we have today, I think the callback paradigm can be superseded by direct interactivity.

It is not only that callbacks impose a programmatic way to interact with or monitor the algorithm, rather than a truly interactive one; it is about interactivity itself (e.g. REPL-style debugging, tuning, and monitoring).

I believe that generator-style iterative algorithms provide a far more powerful way to use, tune, optimize, and debug than programmatic callbacks.

Of course, I know many ML libraries use the callback pattern and it has become a de facto standard. This is because people want to interact with the algorithms, and callbacks are the legacy of more obscure times.

Why did people switch from TensorFlow to PyTorch? Because of interactivity and the dynamic nature of PyTorch, which builds on the dynamic nature of Python.
Why is Python succeeding in research? Because of its simplicity and interactivity.

In other words, callbacks as they are used today in many ML/DL/RL libraries are a legacy of the limitations of older tools that were not as dynamic and powerful as Python is today.

@jnothman
Member

Looking again at #16925, I think generators/iterators might be even trickier than callbacks in the multiprocessing context. We would be generating a series of events, with only limited assurances of order, which is much more like callbacks, in that you can't rely much on state. The obvious benefit of callbacks is that it is a much smaller change to our API.
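For contrast with the generator approach, a minimal sketch of the callback style, assuming a hypothetical `callback` keyword on `fit` of a hypothetical toy estimator `GDLinearRegression`; this is not the API designed in #16925. The `fit` signature only gains an optional keyword, which is the kind of small API change the comment refers to, while the estimator keeps ownership of the loop and the callback can only observe a snapshot or request early stopping.

```python
import numpy as np


class GDLinearRegression:
    """Hypothetical toy estimator; `callback` is not scikit-learn's API."""

    def __init__(self, learning_rate=0.05, max_iter=100):
        self.learning_rate = learning_rate
        self.max_iter = max_iter

    def fit(self, X, y, callback=None):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.coef_ = np.zeros(X.shape[1])
        for i in range(self.max_iter):
            grad = 2.0 * X.T @ (X @ self.coef_ - y) / len(y)
            self.coef_ = self.coef_ - self.learning_rate * grad
            error = float(np.mean((X @ self.coef_ - y) ** 2))
            # The estimator owns the loop: the callback receives a snapshot
            # and can only ask for early stopping by returning True.
            if callback is not None and callback(i, self.coef_.copy(), error):
                break
        return self


history = []


def record_and_stop(iteration, coef, error):
    """Log every step; return True to stop once the error is small enough."""
    history.append(error)
    return error < 1e-6


X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X[:, 0]
est = GDLinearRegression().fit(X, y, callback=record_and_stop)
```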

@cmarmo cmarmo added the API label Jan 18, 2021