Display progress bar for grid search and exponentiated gradient #517

Closed
wants to merge 1 commit

Conversation

arjsingh

No description provided.

@arjsingh (Author)

The progress bar is shown in the screenshots below, which highlight the difference between what we have now and what we intend to have:

Current grid search:
[screenshot: before_tqdm_grid]

Grid search after the latest changes:
[screenshot: after_tqdm_grid]

Current exponentiated gradient:
[screenshot: before_tqdm_expo]

Exponentiated gradient after the latest changes:
[screenshot: after_tqdm_expo]
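
For context, the change these screenshots illustrate boils down to wrapping the per-iteration fitting loop in a tqdm progress bar, roughly as in the hypothetical sketch below (`grid` and `fit_one_point` are stand-ins, not fairlearn's actual internals):

```python
import time

from tqdm.auto import tqdm  # auto-selects a notebook or terminal renderer

# Stand-in for the grid of points GridSearch iterates over.
grid = range(25)


def fit_one_point(grid_point):
    """Placeholder for fitting the wrapped estimator at one grid point."""
    time.sleep(0.05)


# Wrapping the iterable is the essence of the change: each completed fit
# advances the bar shown in the "after" screenshots.
for grid_point in tqdm(grid, desc="GridSearch"):
    fit_one_point(grid_point)
```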

@romanlutz (Member) left a comment


Looks great. I'll send you some instructions for DCO for the future.

@@ -41,6 +41,7 @@
* Add new constraints and objectives in `ThresholdOptimizer`
* Add class `InterpolatedThresholder` to represent the fitted `ThresholdOptimizer`
* Add `fairlearn.datasets` module.
* Display progress bar for grid search and exponentiated gradient
A reviewer (Member) commented on the added line:

nit: GridSearch and ExponentiatedGradient

Signed-off-by: Arjun Singh <arjsingh@microsoft.com>
@adrinjalali (Member)

I'd be happier if we add an option like verbose and only have the progress bar if it's set. If it were me I'd also make tqdm a soft dependency. By default there should be zero stdout/stderr output when fitting any estimator IMO unless a warning or an error/exception happens.

@romanlutz (Member)

> I'd be happier if we add an option like verbose and only have the progress bar if it's set. If it were me I'd also make tqdm a soft dependency. By default there should be zero stdout/stderr output when fitting any estimator IMO unless a warning or an error/exception happens.

@adrinjalali this is interesting, thanks for sharing! We've had a number of people ask for this because the refitting over so many iterations can take time and it's not necessarily clear whether the notebook died in the meanwhile. Is there a downside to showing that output? If not, I feel tempted to say the default is to show it.

By soft dependency do you mean using it if it's available but not otherwise (try-except), or having an extension of fairlearn, say pip install fairlearn[progress], that installs tqdm? I'm a bit averse to extensions since few people tend to use them (at least in my experience), although the try-except version is even worse in that respect, since there's no particular way to notify users that they should install tqdm to get the progress bar.
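
For reference, the try-except variant mentioned here usually looks something like the generic sketch below (not code from this PR; the no-op fallback is made up for illustration):

```python
# Soft dependency via try-except: use tqdm when it is installed, otherwise
# fall back to a no-op wrapper so the loop still runs, just without a bar.
try:
    from tqdm.auto import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):  # silent fallback; keyword args are ignored
        return iterable

for _ in tqdm(range(10), desc="GridSearch"):
    pass  # placeholder for one refit
```

The drawback is exactly the silent fallback: nothing tells the user that installing tqdm would enable the progress bar.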

@adrinjalali (Member)

> We've had a number of people ask for this because the refitting over so many iterations can take time and it's not necessarily clear whether the notebook died in the meanwhile.

And that's why it's a good idea to have verbose as a feature :D

It's the same with sklearn's *SearchCV, or any other model for that matter, which can take hours to converge or finish its computations. As a user who deploys things on servers or mostly writes Python code outside notebooks, I only want to see things in the output that I need to do something about, like warnings and errors. Very often we feed the outputs of our programs to a log server (like Elasticsearch) to later analyze the output, the warnings, etc. That's why, by default, none of those methods print anything.

Now, in terms of how to show the progress, I would very much prefer using logging. It's very configurable, and the user can decide where to send the logs from each module, or silence them. tqdm is not essential to run any of the algorithms in fairlearn, so it really shouldn't be a hard dependency. Adding hard dependencies makes a minimal container that runs a piece of code in production unnecessarily heavy.

So ideally, I'd have a solution with verbose=0 by default (or a logging setup, or however else you'd like it configured), use logging to show the progress by default, and have a fairlearn-level configuration that lets the user enable tqdm. When the user explicitly sets that configuration variable, we also check whether tqdm is installed and, if it isn't, raise an error telling the user to install it. You could also set up the dependencies so that fairlearn[extended] installs tqdm, for example.

In general, it's totally fine to add dependencies when working on one's own script, but they should not be taken lightly when it comes to library development.
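
A minimal sketch of the arrangement described above, assuming a hypothetical module-level switch and helpers (`enable_progress_bar`, `_progress_iter`); neither exists in fairlearn:

```python
import logging

logger = logging.getLogger("fairlearn")

_USE_TQDM = False  # hypothetical package-level switch, off by default


def enable_progress_bar():
    """Explicitly opt in to tqdm progress bars; raise if tqdm is missing."""
    global _USE_TQDM
    try:
        import tqdm  # noqa: F401  (only checking availability)
    except ImportError as exc:
        raise ImportError(
            "Progress bars require tqdm. Install it directly, or via an "
            "extra such as 'pip install fairlearn[extended]'."
        ) from exc
    _USE_TQDM = True


def _progress_iter(iterable, verbose=0, total=None):
    """Wrap an iterable according to the configuration chosen by the user."""
    if _USE_TQDM:
        from tqdm.auto import tqdm
        return tqdm(iterable, total=total)
    if verbose > 0:
        logger.info("Fitting over %s iterations", total if total else "unknown")
    return iterable  # default: no stdout/stderr output at all
```

With this arrangement, fitting stays silent by default, verbose routes a message through logging, and tqdm is imported only after the user has explicitly asked for it.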

@romanlutz (Member)

@adrinjalali I think all your points are valid. I looked a little bit into how this is done in scikit-learn, and to be honest I was surprised to see print all over the place as opposed to using logging. What am I missing?

https://github.com/scikit-learn/scikit-learn/blob/9acfaab9667c038686ef51881adce72721ede377/sklearn/model_selection/_validation.py

@romanlutz (Member) left a comment


Adrin made some very valid points. Perhaps best to see whether we can get into a similar pattern to scikit-learn without introducing new dependencies.

@adrinjalali (Member)

> @adrinjalali I think all your points are valid. I looked a little bit into how this is done in scikit-learn, and to be honest I was surprised to see print all over the place as opposed to using logging. What am I missing?
>
> https://github.com/scikit-learn/scikit-learn/blob/9acfaab9667c038686ef51881adce72721ede377/sklearn/model_selection/_validation.py

Mostly for historical reasons I guess. And we know it's bad and we're trying to fix it:

Recent proposal:
scikit-learn/scikit-learn#17439
And the issue goes back to 2011:
scikit-learn/scikit-learn#78

@adrinjalali (Member)

Also related to tqdm: scikit-learn/scikit-learn#7574 (comment)

@riedgar-ms (Member)

I agree that the output should be an option, and I can see the sense in making any such dependency 'soft'. That bit about threading and tqdm is scary (a flashback to when I first learned about the GIL). Having just done a quick prod of my current conda environment, I see that the reason we have tqdm available right now is shap and papermill, both of which are on our hit list of dependencies to remove.

@romanlutz mentioned this pull request Aug 15, 2020
Base automatically changed from master to main February 6, 2021 06:05
@romanlutz (Member)

Closing this PR since we decided not to go with tqdm.

@romanlutz closed this Mar 26, 2021