
Callbacks API #16925

Closed
wants to merge 14 commits into from

Conversation

@rth (Member) commented Apr 14, 2020

This is a first iteration of an API for callbacks, which could be used e.g. for monitoring the progress of calculations and convergence, as well as, potentially, for early stopping. The goal of this PR is to experiment with callbacks, which would likely serve as a basis for a SLEP.

As proposed in the PR documentation, to support callbacks in their current iteration, an estimator should:

  • at the beginning of fit, either explicitly call self._fit_callbacks(X, y) or use self._validate_data(X, y), which
    makes a self._fit_callbacks call internally;
  • for iterative solvers, call self._eval_callbacks(n_iter=.., **kwargs) at
    each iteration, where the kwargs keys must be part of the supported callback
    arguments (cf. list below; see also the sketch after the next paragraph). The question is whether we can meaningfully standardize the parameters passed as kwargs; just passing locals() won't do.

User-defined callbacks must extend the sklearn._callbacks.BaseCallback
abstract base class.
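
For illustration, the estimator side could look roughly like the following sketch (ToyEstimator, max_iter and the loop body are made up for the example; only _fit_callbacks, _eval_callbacks and _validate_data come from this PR):

from sklearn.base import BaseEstimator


class ToyEstimator(BaseEstimator):
    """Illustrative estimator wired for callbacks (not part of the PR)."""

    def __init__(self, max_iter=10):
        self.max_iter = max_iter

    def fit(self, X, y):
        # in this PR, _validate_data calls self._fit_callbacks(X, y) internally;
        # alternatively the estimator can call it explicitly at the start of fit
        X, y = self._validate_data(X, y)
        for n_iter in range(self.max_iter):
            # ... one iteration of the solver ...
            # notify the callbacks with the supported keyword arguments
            self._eval_callbacks(n_iter=n_iter)
        return self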

For instance, some callbacks based on this PR are implemented in the
sklearn-callbacks package (see its README for detailed examples):

Progress bars #7574 #78 #10973

from sklearn_callbacks import ProgressBar
from sklearn.compose import make_column_transformer
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_classification(n_samples=500000, n_features=200, random_state=0)

pipe = make_pipeline(
    SimpleImputer(),
    make_column_transformer(
        (StandardScaler(), slice(0, 80)),
        (MinMaxScaler(), slice(80, 120)),
        (StandardScaler(with_mean=False), slice(120, 180)),
    ),
    LogisticRegression(),
)

pipe._set_callbacks(ProgressBar())
pipe.fit(X, y)

[screenshot: SGD progress bar]

Determining which callback originates from which estimator is actually non-trivial (and I haven't even started dealing with parallel computing). Currently I'm re-building a separate, approximate computational graph for pipelines etc. Anyway, once that is done (in a separate package), it could be used to animate model training on a graph (similar to what dask.diagnostics does) or, say, the HTML repr of pipelines by @thomasjpfan, via some Jupyter widget.

Monitoring convergence #14338 #8994 (comment)

Having callbacks is also quite useful for monitoring model convergence, e.g.:

from sklearn_callbacks import ConvergenceMonitor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

conv_mon = ConvergenceMonitor("mean_absolute_error", X_test, y_test)

pipe = make_pipeline(StandardScaler(), Ridge(solver="sag", alpha=1))
pipe._set_callbacks(conv_mon)
_ = pipe.fit(X_train, y_train)
conv_mon.plot()

[figure: convergence monitor]

For now this only works for a small subset of linear models. One reason is that in the iteration loop the solver must provide enough information to reconstruct the model, which is not always the case. For instance, for linear models it would be params + coef + intercept, but even then the definition of coef varies significantly across linear models (and whether we fit_intercept or not, etc.).
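
To make the difficulty concrete, here is a rough sketch of how a convergence monitor might rebuild a linear model at each iteration, assuming the solver passes coef and intercept as in this PR (the class and everything around it is hypothetical, not the sklearn-callbacks implementation):

import copy

from sklearn.metrics import mean_absolute_error


class ToyConvergenceMonitor:
    """Hypothetical monitor: score a partially fitted linear model on held-out data."""

    def __init__(self, X_val, y_val):
        self.X_val, self.y_val = X_val, y_val
        self.scores = []

    def on_fit_begin(self, estimator, X, y):
        # keep a copy of the estimator whose coefficients we can overwrite
        self._model = copy.deepcopy(estimator)

    def on_iter_end(self, n_iter=None, coef=None, intercept=None, **kwargs):
        if coef is None:
            # this solver does not expose enough state to rebuild the model
            return
        # the exact shape and meaning of coef varies across linear models,
        # which is precisely the standardization problem mentioned above
        self._model.coef_ = coef
        self._model.intercept_ = intercept
        y_pred = self._model.predict(self.X_val)
        self.scores.append((n_iter, mean_absolute_error(self.y_val, y_pred)))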

Early stopping #10973

The idea is that callbacks should be able to interrupt training, e.g. because some evaluation metric no longer decreases on a validation set (cf. the figure above), or for other user-defined reasons. This part is not yet included in this PR.

TODO

  • get some preliminary feedback
  • study libraries with callbacks (e.g. dask, keras, pytorch etc)
  • add support for early stopping
  • add callbacks to more estimators and experiment with practical use cases in sklearn-callbacks
  • write SLEP

@rth rth marked this pull request as draft April 23, 2020 16:09
@amueller (Member) commented Jun 2, 2020

this is super awesome. Are you thinking about working further on this? Working on logging and callbacks could be a cool topic for the MLH fellowship thing. As a first pass we could do 'only' logging though?

@rth rth changed the title WIP Callbacks API Callbacks API Jun 2, 2020
@rth (Member Author) commented Jun 2, 2020

Yes, I would still like to work on this but I don't have much availability at the moment.

I think working on callbacks as part of MLH would be nice, and in particular logging would indeed be a good start. Using callbacks with a logging handler would IMO be better than having conditional print or logging.info calls everywhere. The question is what the next steps on this would be.

I marked this as WIP, but it's basically a minimal working implementation, and the API should be sufficient for logging. It would then need to be applied to all estimators and replace our current logging approach.

If I had to change anything here, it might be to make the callback method names a bit closer to Keras callbacks. In the end I'm not sure a SLEP is ideal for this; maybe introducing it as a private feature and incrementally improving it, as we did for estimator tags, would be better, as it's hard to plan all of it ahead. None of the additions should have an impact on users at present.

Even a superficial review would be much appreciated, cc @NicolasHug @thomasjpfan @glemaitre

@NicolasHug (Member) left a comment

Nice stuff!

Made a few comments, but I mostly have questions at this point.

Comment on lines +17 to +18
[tool.black]
line-length = 79
Member

sneaky :p

Member Author

Yeah, I don't see the point in manually formatting code anymore for new files. It shouldn't hurt even if we are not using it everywhere.

}


def _check_callback_params(**kwargs):
Member

Shouldn't we let each callback independently validate its data?

My question might not make sense but I don't see this being used anywhere except in the tests

Member Author

Shouldn't we let each callback independently validate its data?
My question might not make sense but I don't see this being used anywhere except in the tests

Yes, absolutely, each callback validates its own data. But we also need to enforce in tests that callbacks follow the documented API, for instance that no undocumented parameters are passed, which requires this function.

Third-party callbacks could also use this validation function, similarly to how we expose check_array.
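
Something along the following lines, where the set of documented parameter names is only a guess here (the actual list lives in sklearn/_callbacks.py in this PR):

# hypothetical whitelist; the real documented parameters are defined in the PR
SUPPORTED_CALLBACK_PARAMS = {"n_iter", "coef", "intercept", "loss"}


def _check_callback_params(**kwargs):
    """Raise if a kwarg is not part of the documented callback arguments."""
    invalid = set(kwargs) - SUPPORTED_CALLBACK_PARAMS
    if invalid:
        raise ValueError(
            "Unsupported callback parameters: %s; supported parameters are: %s"
            % (sorted(invalid), sorted(SUPPORTED_CALLBACK_PARAMS))
        )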

sklearn/base.py Outdated
Comment on lines 445 to 446
In the case of meta-estimators, callbacks are also set recursively
for all child estimators.
Member

Thoughts on doing this vs letting users set callbacks on sub-estimator instances?

what about e.g. early stopping when we ultimately support this?

Member Author

I added a deep=True option so that recursion into meta-estimators can be disabled (with deep=False), which can certainly be useful in some cases.

In most cases, though, I don't see users manually setting callbacks for each individual estimator in a complex pipeline.

Comment on lines 539 to 542
if callbacks is not None:
with gil:
_eval_callbacks(callbacks, n_iter=n_iter, coef=weights_array,
intercept=intercept_array)
Member

I remember a strange behavior during the Paris sprint last year (I think it was with @pierreglaser and @tomMoral?) where the GIL was acquired in a condition like this, and even when the condition was always False the code was significantly slower.

Might be something to keep in mind.

Member Author

It's clearly an issue with parallel code cython/cython#3554 (and I'm not sure how to handle parallel code with callbacks so far).

However, occasionally acquiring the GIL in long-running loops is beneficial, as users can't interrupt the calculation with Ctrl+C otherwise. So acquiring the GIL at the end of each epoch would actually solve a bug here #9136 (comment)

Will switch to acquiring the GIL at the end of each epoch even if callbacks is None.

@@ -103,6 +104,8 @@ def _mv(x):
# old scipy
coefs[i], info = sp_linalg.cg(C, y_column, maxiter=max_iter,
tol=tol)
if callbacks is not None:
Member

maybe you can remove this since the is not None check is also done in _eval_callbacks


def _eval_callbacks(self, **kwargs):
"""Call callbacks, e.g. in each iteration of an iterative solver"""
from ._callbacks import _eval_callbacks
Member

why lazy import?

Comment on lines +44 to +45
if callbacks is None:
return
Member

Shouldn't we rely on callbacks being an empty list instead of special-casing None?

Or maybe you are anticipating a future where callbacks=None would be a default argument to estimators?

Member Author

Yes, I switched to callbacks=None when no callbacks are set.

assert callback.n_calls == 0
estimator.fit(X, y)
if callback.n_fit_calls == 0:
pytest.skip("callbacks not implemented")
Member

otherwise assert it's equal to 1?



def check_has_callback(est, callback):
assert getattr(est, "_callbacks", None) is not None
Member

is this different from hasattr? Or can the attribute exist and be None for some reason?

Member Author

Yes, reworded more clearly as,

assert hasattr(est, "_callbacks") and est._callbacks is not None

since None could be equivalent to [], just to be sure that this is not happening.

@@ -1335,7 +1345,7 @@ def transform(self, X):
beta_loss=self.beta_loss, tol=self.tol, max_iter=self.max_iter,
alpha=self.alpha, l1_ratio=self.l1_ratio, regularization='both',
random_state=self.random_state, verbose=self.verbose,
shuffle=self.shuffle)
shuffle=self.shuffle, callbacks=getattr(self, '_callbacks', []))
Member

I feel like we should either decide that

  • no callbacks means an empty list
  • no callbacks means None and having callbacks means a non-empty list

but it seems that the code is mixing both right now?

If we ultimately plan on having callbacks=None as a default param then the latter would be more appropriate?

Member Author

Yes, you are right, let's go with callbacks=None everywhere, and just let the eval function handle it.

Member Author

Actually, here we would still need to provide the default option to getattr: getattr(self, '_callbacks', None), and that's not really much more readable than getattr(self, '_callbacks', []), so both would work.
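
Concretely, the convention agreed on here amounts to something like this in the eval helper (a sketch, not the exact code in the diff):

def _eval_callbacks(callbacks, **kwargs):
    # "no callbacks" is represented as None (or an empty list); either way
    # there is nothing to do, so callers don't need to check beforehand
    if not callbacks:
        return
    for callback in callbacks:
        callback.on_iter_end(**kwargs)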

@rth rth mentioned this pull request Jun 4, 2020
@rth rth marked this pull request as ready for review June 4, 2020 14:33
@rth (Member Author) commented Jun 4, 2020

Thanks @NicolasHug ! I think I have addressed your comments, please let me know if you have other questions.

I have updated the required callback method names, inspired by Keras,

class MyCallback(BaseCallback):

    def on_fit_begin(self, estimator, X, y):
        ...

    def on_iter_end(self, **kwargs):
        ...

which I think is more explicit than earlier names.

what about e.g. early stopping when we ultimately support this?

The plan for early stopping so far is:

  • in on_fit_begin (which is also called inside self._validate_data), return X, y, sample_weight with the training data, excluding the validation set. Store the validation set on the callback.
  • in on_iter_end, compute the validation score (this is actually not easy, depending on what information we have when the callback is called), and return a not-None value to interrupt the training loop.

For a complex pipeline with callbacks set recursively, we clearly need to do this only for estimators where it makes sense, but I think it should be doable. In any case this can probably be done in a follow-up PR. For now I put in the documentation that the return value of callback methods is ignored.
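
A rough sketch of that plan, with all names and the return conventions to be read as tentative (in particular, returning a non-None value to request interruption is not implemented in this PR):

from sklearn.model_selection import train_test_split


class ToyEarlyStopping:
    """Hypothetical early-stopping callback following the plan above."""

    def __init__(self, patience=5):
        self.patience = patience

    def on_fit_begin(self, estimator, X, y):
        # carve out a validation set and hand the remaining data back to fit
        X_train, self._X_val, y_train, self._y_val = train_test_split(
            X, y, test_size=0.1, random_state=0
        )
        self._estimator = estimator
        self._best_score, self._n_bad_iter = -float("inf"), 0
        return X_train, y_train, None  # X, y, sample_weight

    def on_iter_end(self, **kwargs):
        # computing this score requires enough state to call score/predict,
        # which, as noted above, is the hard part for most estimators
        score = self._estimator.score(self._X_val, self._y_val)
        if score > self._best_score:
            self._best_score, self._n_bad_iter = score, 0
        else:
            self._n_bad_iter += 1
        if self._n_bad_iter >= self.patience:
            return "stop"  # any non-None value would request interruption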

@rth (Member Author) commented Jun 4, 2020

BTW, regarding logging and #17439 (cc @thomasjpfan, @adrinjalali), there is an example of a logging callback in the sklearn-callbacks repo. For instance, this example

from sklearn.compose import make_column_transformer
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn_callbacks import DebugCallback

X, y = make_classification(n_samples=10000, n_features=100, random_state=0)

pipe = make_pipeline(
    SimpleImputer(),
    make_column_transformer(
        (StandardScaler(), slice(0, 80)), (MinMaxScaler(), slice(80, 90)),
    ),
    SGDClassifier(max_iter=20),
)


pbar = DebugCallback()
pipe._set_callbacks(pbar)

pipe.fit(X, y)

would produce,

python examples/logging-pipeline.py 
2020-06-04 18:10:15,128 INFO     fit_begin Pipeline [...]
2020-06-04 18:10:15,129 INFO     fit_begin SimpleImputer()
2020-06-04 18:10:15,132 INFO     fit_begin SimpleImputer()
2020-06-04 18:10:15,137 INFO     fit_begin ColumnTransformer [...]
2020-06-04 18:10:15,137 INFO     fit_begin StandardScaler()
2020-06-04 18:10:15,143 INFO     fit_begin StandardScaler()
2020-06-04 18:10:15,145 INFO     fit_begin MinMaxScaler()
2020-06-04 18:10:15,148 INFO     fit_begin SGDClassifier(max_iter=20)
2020-06-04 18:10:15,151 INFO     iter_end n_iter=0, loss=246525.58905033383
2020-06-04 18:10:15,152 INFO     iter_end n_iter=1, loss=74070.67406258777
2020-06-04 18:10:15,154 INFO     iter_end n_iter=2, loss=45127.853390181925
2020-06-04 18:10:15,155 INFO     iter_end n_iter=3, loss=33741.79716248915
2020-06-04 18:10:15,156 INFO     iter_end n_iter=4, loss=26982.451195702135
2020-06-04 18:10:15,157 INFO     iter_end n_iter=5, loss=22581.941031038896
2020-06-04 18:10:15,158 INFO     iter_end n_iter=6, loss=20275.28466109527
2020-06-04 18:10:15,160 INFO     iter_end n_iter=7, loss=17646.77498466535
2020-06-04 18:10:15,161 INFO     iter_end n_iter=8, loss=16516.305096143267
2020-06-04 18:10:15,162 INFO     iter_end n_iter=9, loss=15059.036141226175
2020-06-04 18:10:15,163 INFO     iter_end n_iter=10, loss=13906.533547375711
2020-06-04 18:10:15,165 INFO     iter_end n_iter=11, loss=12996.828040061095
2020-06-04 18:10:15,166 INFO     iter_end n_iter=12, loss=12410.182898002984
2020-06-04 18:10:15,167 INFO     iter_end n_iter=13, loss=11715.624170487545
2020-06-04 18:10:15,168 INFO     iter_end n_iter=14, loss=11283.087560475777
2020-06-04 18:10:15,170 INFO     iter_end n_iter=15, loss=10744.568097461684
2020-06-04 18:10:15,171 INFO     iter_end n_iter=16, loss=10361.116776342227
2020-06-04 18:10:15,172 INFO     iter_end n_iter=17, loss=9946.303790908443
2020-06-04 18:10:15,173 INFO     iter_end n_iter=18, loss=9707.741506199667
2020-06-04 18:10:15,175 INFO     iter_end n_iter=19, loss=9390.37295290932

This can clearly be improved, e.g. linked with our verbose flag, and the discussion about a logging API is still very relevant, but I think building a logging solution internally on top of callbacks would help to get a consistent experience across the library, which is not the case currently with our verbose=1 + print approach.
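
For reference, a logging callback of this sort could be as simple as the following sketch (this is not the actual DebugCallback from sklearn-callbacks, just an illustration of routing callback events through the standard logging module):

import logging

logger = logging.getLogger("sklearn.callbacks")


class ToyDebugCallback:
    """Hypothetical logging callback; handlers and formatting are left to the user."""

    def on_fit_begin(self, estimator, X, y):
        logger.info("fit_begin %r", estimator)

    def on_iter_end(self, **kwargs):
        logger.info(
            "iter_end %s",
            ", ".join("%s=%s" % (key, value) for key, value in kwargs.items()),
        )

Because everything goes through a standard handler, verbosity, formatting and output destination would then be controlled via logging configuration rather than per-estimator verbose flags.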

@thomasjpfan (Member)

This can get tricky when we have logging in our Cython or C code.

Having designed callbacks in skorch and reviewed callbacks in fastai, I think it is hard to come up with a complete list of items to pass to the callback. Fastai can pass the caller object directly to the callback and the callback can directly change the state of the model.

For our case, passing metrics should be good enough. For logging, we also commonly log the elapsed time and may have interesting formatting (Pipeline). HistGradientBoosting has some custom metrics that are only related to itself:

Time spent computing histograms: 0.041s
Time spent finding best splits:  0.012s
Time spent applying splits:      0.024s
Time spent predicting:           0.002s

It also logs the number of leaves etc:

[1/10] 1 tree, 31 leaves, max depth = 12, in 0.011s
[2/10] 1 tree, 31 leaves, max depth = 13, in 0.010s

I would not think tree or leaves belong as kwargs passed to the callback.

@rth (Member Author) commented Jun 4, 2020

Thanks for the feedback @thomasjpfan !

Fastai can pass the caller object directly to the callback and the callback can directly change the state of the model

In the current PR we can pass the estimator object to the callback, but we indeed can't do that from C or Cython code since it's not available there. We could take locals(), but it's unclear what to do with them in the callback. I agree that having a complete list of things to pass to the callback is difficult, particularly given that we have much more heterogeneous models than neural net libraries do.

For logging, we also commonly log the elapsed time

For the elapsed time we would also need to add an on_fit_end method. I initially didn't want to do that, as it would mean many more changes all over the code base. We can approximate the elapsed time by taking the difference between two consecutive on_fit_begin calls in a pipeline, but it's clearly not fail-proof (particularly as soon as we start doing things in parallel). I imagine we could also add it only to a few models where it's relevant (Pipeline, ColumnTransformer, etc.).
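
As an illustration of that approximation (purely hypothetical code, and fragile as soon as steps run in parallel):

from time import perf_counter


class ToyTimingCallback:
    """Hypothetical sketch: approximate per-step timings from on_fit_begin alone."""

    def __init__(self):
        self._previous = None  # (name, start time) of the previous fit_begin

    def on_fit_begin(self, estimator, X, y):
        now = perf_counter()
        if self._previous is not None:
            name, start = self._previous
            print("%s ran for roughly %.3fs" % (name, now - start))
        self._previous = (estimator.__class__.__name__, now)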

For HistGradientBoosting, it has some custom metrics that is only related to itself:
I would not think tree or leaves belong to as a kwarg argument in the callback.

Yes, it's one of the challenges: models are really heterogeneous. I guess we could also add an on_log method that just takes a string input, as a workaround for those cases.
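
i.e. roughly the following sketch of the proposal (not something in this PR):

from abc import ABC


class BaseCallback(ABC):
    # ... existing on_fit_begin / on_iter_end hooks ...

    def on_log(self, msg):
        """Hypothetical hook receiving free-form, estimator-specific messages
        (e.g. per-tree summaries or histogram timings) as plain strings."""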

The thing is that if we don't use callbacks for logging, it means that we would have to do very similar things twice, once for logging and once for callbacks.

@jnothman (Member)

We could pass a specified set of objects/data (targeted to logging), as well as mutable locals() for use at one's own risk...?

@amueller (Member) commented Jul 2, 2020

How would this interact with multiprocessing? I'm not sure I understand how you would implement grid-search logging as callbacks with multiprocessing.

@jnothman (Member) commented Jul 2, 2020 via email

@amueller (Member) commented Jul 8, 2020

@jnothman I guess it isn't clear to me how you would write to a shared resource with joblib.

@rth (Member Author) commented Jul 9, 2020

@amueller Yes, it's not that straightforward. One could send callback parameters to a shared queue from different processes, but then one needs an additional thread to process these callback events, and to start/stop this thread at some point. See the example below:

from multiprocessing import Manager
from threading import Thread, Lock
import queue

from joblib import delayed, Parallel
from time import sleep


class Callback():
    def __init__(self, m):
        self.q = m.Queue()
        
    def on_iter_end(self, x):
        res = x**2
        self.q.put(f'callback value {x}')
        return res
    
    @staticmethod
    def process_callback_events(q, should_exit):
        """Process all events in the queue"""
        while True:
            try:
                print(q.get_nowait())
            except queue.Empty:
                print('queue empty')
                if should_exit.locked():
                    return
                else:
                    sleep(1)
    
    def start_processing_callback(self):
        thread_should_exit = Lock()
        thread = Thread(target=self.process_callback_events,
                             args=(self.q, thread_should_exit))
        thread.start()
        return thread_should_exit
    

m = Manager()
        
cbk = Callback(m)

callback_processing_stop = cbk.start_processing_callback()
res = Parallel(n_jobs=2)(
    delayed(cbk.on_iter_end)(x) for x in range(10)
)
callback_processing_stop.acquire(blocking=False)

print(res)

which would produce,

queue empty
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
callback value 0
callback value 1
callback value 2
callback value 3
callback value 4
callback value 5
callback value 6
callback value 7
callback value 8
callback value 9
queue empty

But then I'm sure there are plenty of edge cases with different joblib backends that would need handling (e.g. nested parallel blocks), and this would require re-designing the callbacks API somewhat.
In any case, for progress bars I'm not sure there is a way around starting an extra monitoring thread when joblib is used. Though maybe it's not a big issue; for instance, tqdm does that as far as I understand, and this would only run if the user explicitly activates the callback.

@amueller (Member) commented Jul 15, 2020

Do you think this is worth the complexity? It seems like a can of worms, but it also would open a lot of doors (does that qualify as mixed metaphors?).
I haven't looked into the logging alternative, which I assume would be somewhat simpler but much less powerful.

Is it plausible / feasible to make the actual usage mostly hidden from the user? I assume that could work if joblib maybe gets some special hook? I'd rather not litter the sklearn code with callback locking code.

And is there an easy way for the callback to determine the backend?

We could also go to logging first if it's "easy enough" and then try to go to callbacks later once we figure out how to do it and whether it's worth it?

@rth (Member Author) commented Jul 15, 2020

I haven't looked into the logging alternative, which I assume would be somewhat simpler but much less powerful.

As far as I can tell, the logging approach in the multiprocessing case would work very similarly, with a queue (logging.handlers.QueueHandler) and a monitoring thread (logging.handlers.QueueListener) #78 (comment). It's a bit more abstracted from the user, but with the same issue of needing to start/stop this monitoring thread either around each parallel section or somehow globally, and of needing to account for all the joblib special cases. The implementation of logging.handlers.QueueListener is not very far from the above prototype.
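
For completeness, a minimal sketch of that logging-based variant (illustrative only; the logger name and worker function are made up):

import logging
from logging.handlers import QueueHandler, QueueListener
from multiprocessing import Manager

from joblib import Parallel, delayed


def work(i, log_queue):
    # each worker sends log records to the shared queue instead of
    # writing to its own handlers
    logger = logging.getLogger("sklearn.demo")
    if not logger.handlers:
        logger.addHandler(QueueHandler(log_queue))
        logger.setLevel(logging.INFO)
    logger.info("iteration %d done", i)
    return i ** 2


if __name__ == "__main__":
    manager = Manager()
    log_queue = manager.Queue()
    # the listener runs a background thread in the main process, playing the
    # same role as the monitoring thread in the prototype above
    listener = QueueListener(log_queue, logging.StreamHandler())
    listener.start()
    results = Parallel(n_jobs=2)(delayed(work)(i, log_queue) for i in range(10))
    listener.stop()
    print(results)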

Do you think this is worth the complexity? [...]
Is it plausible / feasible to make the actual usage mostly hidden from the user? I assume that could work if joblib maybe get some special hook? I'd rather not litter the sklearn code with callback locking code.

In terms of complexity, as long as it's isolated in the optional Callback object it might not be so bad. The above code can certainly be improved as well. The issue is indeed more the need to start/stop the callback/log monitoring thread around each parallel section, which is not ideal. We should talk with the joblib devs to see if this could be made more user-friendly (both here and for logging).

@amueller (Member)

You are in closer physical proximity to the joblib people ;)
I have no idea about joblib tbh but I figure delayed could accept a more complex object that has __setup__ and __teardown__ in addition to __call__?

It looks like there's no way around modifying each call to Parallel that contains logging, but I guess that would have been the case anyway.

Thinking about it a bit more, does that happen very often? You could do a bunch in this PR without it. It's not only GridSearchCV, right? I guess OVO and OVR and VotingClassifier are candidates?
How would we log in RandomForestClassifier? I guess we'd need the queue, right?

In other words, does it make sense to do a first solution where we don't handle the parallel case? I guess users will probably be miffed if we introduce an interface but it's not supported in GridSearchCV.

@rth (Member Author) commented Jul 15, 2020

With @thomasjpfan we discussed that maybe it could be simpler to start with implementing just ConvergenceMonitor (to monitor convergence on the train and validation sets) as the first simple callback in scikit-learn. That would only apply to the final classifier/regressor and might not need to deal with all the parallel stuff at first (though RandomForestClassifier, or even linear models with OVR, hmm). There the additional issue is rather how one reconstructs the full model state from the information available in the callback, in order to evaluate it on the validation set. That is related to the question of how to implement early stopping, which would also not be easy to support in all estimators in one go.

In general I'm all for small incremental PRs. Having this in the private API, even without parallel support, could already be quite useful. It's more a question of how confident we are that we will be able to add support for joblib later, if needed, without massively changing the API, and that the chosen API is generally reasonable.

@dbczumar commented Jul 28, 2020

Hi folks, just chiming in here to note that a callbacks API would be very useful for projects that collect / aggregate metadata for model training routines. In particular, the MLflow project would benefit greatly from a callbacks API in the context of its upcoming scikit-learn "autologging feature" (discussed here: mlflow/mlflow#2050): callbacks would enable MLflow to patch in custom hooks that store per-epoch metrics for a given scikit-learn training session in a centralized tracking service.

We would also be very excited to see this capability introduced as a private API in an upcoming scikit-learn release, even if it does not apply to all classifiers or to parallel execution environments (joblib, Dask, etc), as alluded to in #16925 (comment).

@mfeurer (Contributor) commented Oct 6, 2020

Hi folks, I also briefly wanted to propose another use of a future callbacks API that would be very useful for projects that train ML models under a time budget: stopping the training when the time is up. This would be very handy in the Auto-sklearn project. For each evaluation of the ML model we give a time limit and end the process if the time limit is hit. With such a callback we could safely shut down the process ourselves and still use the partially trained model.

I'm not sure if this use case will be considered (it's some form of early stopping, I guess), but I just wanted to bring up another potential use case that wasn't discussed yet.

@MJimitater

Hello folks, I love and strongly support the idea of callbacks in sklearn. Thanks for the progress so far; is this still a WIP for other algorithms, such as GaussianProcessClassifier?

@ogrisel (Member) commented Mar 16, 2021

+1 for moving forward with this PR with a private-only callback registration API for now, possibly without specific support for the parallel case (as long as the callbacks themselves are picklable).

We already override the delayed function in scikit-learn, so @rth feel free to prototype something with it to make it possible to implement additional logic on the loky workers for callbacks that need access to a shared resource. I think we would need to prototype a specific use case to see if we really need to change things in joblib or not.

@rth (Member Author) commented Dec 17, 2021

Superseded by #22000
