
ETA On Compare Models #2428

Closed
rajatscibi opened this issue Apr 18, 2022 · 11 comments
Labels
compare_models enhancement New feature or request

Comments

@rajatscibi

rajatscibi commented Apr 18, 2022

Is your feature request related to a problem? Please describe.
Often, when the dataset is large or has many features, the compare_models() function takes a long time, sometimes hours on a regular laptop with 4 GB RAM, SSD storage, and no GPU. It's often not clear how much time it will take, which makes it difficult to plan, and one may feel stuck.

Describe the solution you'd like
If we could have an ETA (estimated time remaining) on compare_models(), it would be very helpful.

Describe alternatives you've considered

Additional context

rajatscibi added the enhancement (New feature or request) label on Apr 18, 2022
@ngupta23
Collaborator

ngupta23 commented Apr 18, 2022

I think this is not possible right now, since there is no way to probe the underlying models for the amount of time remaining (a limitation of scikit-learn).

@Yard1 - any comments on this request?

@rajatscibi
Author

rajatscibi commented Apr 18, 2022

Found something here

@rajatscibi
Author

There is something called scitime, as mentioned here:

Scitime is a Python package requiring at least Python 3.6, with pandas, scikit-learn, psutil, and joblib as dependencies. You will find the Scitime repo here.

The main function in this package is called “time”. Given a feature matrix X and target vector Y, along with the scikit-learn model of your choice, time will output both the estimated training time and its confidence interval.
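
A minimal sketch of that usage, adapted from the scitime README (the class was named Estimator in earlier releases and RuntimeEstimator later, so the exact names and arguments may differ by version):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scitime import RuntimeEstimator  # `Estimator` in older scitime releases

# Meta-model trained by scitime to predict the fit time of the wrapped estimator
runtime_estimator = RuntimeEstimator(meta_algo='RF', verbose=3)
rf = RandomForestRegressor()

X, y = np.random.rand(100_000, 10), np.random.rand(100_000, 1)
# Returns the estimated fit time (in seconds) plus confidence-interval bounds
estimation, lower_bound, upper_bound = runtime_estimator.time(rf, X, y)
```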

@ngupta23
Collaborator

ngupta23 commented Apr 18, 2022

This seems to be quite an invasive method of computing run time: you are essentially building models around the model that you really want to build. Also, I'm not sure this applies to all sklearn functionality, such as cross_validate, GridSearchCV, etc.

I would prefer a method that is more tightly coupled with sklearn functionality. I would also like to hear the thoughts of the other core developers.

@rajatscibi
Author

Understandable. Let's keep this feature request open for now so someone else can come up with an idea for how to implement it.

@Yard1
Member

Yard1 commented Apr 18, 2022

Yeah, none of the available solutions would work here (or at least not for all of our models).

@rajatscibi
Author

Is it possible to estimate once training has started and some time has elapsed? For example, if 10% completed in 60 seconds, the ETA to reach 100% would be approximately 600 seconds in total, more or less.
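
To make the arithmetic concrete, here is a hypothetical sketch of that linear extrapolation; it assumes the model exposes a progress fraction, which (as noted in the reply below) most sklearn estimators do not:

```python
import time

def naive_eta(start_time: float, progress: float) -> float:
    """Linearly extrapolate the remaining seconds from elapsed time and
    the fraction of work completed (0 < progress <= 1).

    Assumes progress is measurable and roughly linear in time; neither
    assumption generally holds for sklearn model training.
    """
    elapsed = time.monotonic() - start_time
    return elapsed * (1.0 - progress) / progress

# 10% done after 60 s -> naive_eta(...) ~= 540 s remaining, ~600 s total
```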

@rajatscibi
Author

I'll post this on the sklearn repo as well and see what they have to say about it. I'll also share it on Stack Overflow so the computer science community can suggest something.

@ngupta23
Collaborator

ngupta23 commented Apr 18, 2022

> Is it possible to estimate once training has started and some time has elapsed? For example, if 10% completed in 60 seconds, the ETA to reach 100% would be approximately 600 seconds in total, more or less.

This does not hold, since training is not a linear process.

Moreover, what would "training is 10% complete" even mean? This may work for deep neural networks, where you have epochs, but for the most part that is not the case for machine learning models.

@ngupta23
Collaborator

> I'll post this on the sklearn repo as well and see what they have to say about it. I'll also share it on Stack Overflow so the computer science community can suggest something.

This may be the best path forward. This should be handled in the sklearn repo itself rather than as a wrapper around the models outside sklearn. That would be the most sustainable way to do this (if it is even possible).

@rajatscibi
Author

Reply from scikit-learn:

> This is a recurrent feature request that will be resolved by the ongoing work on a callback API (#22000)
>
> I'm closing this issue because it is a duplicate of #10973 and #78.

Link to post

github-actions bot locked this issue as resolved and limited the conversation to collaborators on May 1, 2022