Display progress bar for grid search and exponentiated gradient #517

Closed
wants to merge 1 commit

Conversation

arjsingh

No description provided.

@arjsingh (Author)

The progress bar is shown in the screenshots below, which highlight the difference between what we have now and what we intend to have:

Current grid search:
[screenshot: before_tqdm_grid]

Grid search after the latest changes:
[screenshot: after_tqdm_grid]

Current exponentiated gradient:
[screenshot: before_tqdm_expo]

Exponentiated gradient after the latest changes:
[screenshot: after_tqdm_expo]
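
For context, the change these screenshots illustrate boils down to wrapping the per-iteration fitting loop in a tqdm progress bar, roughly as in the hypothetical sketch below (`grid` and `fit_one_point` are stand-ins, not fairlearn's actual internals):

```python
import time

from tqdm.auto import tqdm  # auto-selects a notebook or terminal renderer

# Stand-in for the grid of points GridSearch iterates over.
grid = range(25)


def fit_one_point(grid_point):
    """Placeholder for fitting the wrapped estimator at one grid point."""
    time.sleep(0.05)


# Wrapping the iterable is the essence of the change: each completed fit
# advances the bar shown in the "after" screenshots.
for grid_point in tqdm(grid, desc="GridSearch"):
    fit_one_point(grid_point)
```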

@romanlutz (Member) left a comment


Looks great. I'll send you some instructions for DCO for the future.

@@ -41,6 +41,7 @@
* Add new constraints and objectives in `ThresholdOptimizer`
* Add class `InterpolatedThresholder` to represent the fitted `ThresholdOptimizer`
* Add `fairlearn.datasets` module.
* Display progress bar for grid search and exponentiated gradient
A reviewer (Member) commented on the added line:

nit: GridSearch and ExponentiatedGradient

Signed-off-by: Arjun Singh <arjsingh@microsoft.com>
@adrinjalali (Member)

I'd be happier if we add an option like verbose and only have the progress bar if it's set. If it were me I'd also make tqdm a soft dependency. By default there should be zero stdout/stderr output when fitting any estimator IMO unless a warning or an error/exception happens.

@romanlutz (Member)

> I'd be happier if we add an option like verbose and only have the progress bar if it's set. If it were me I'd also make tqdm a soft dependency. By default there should be zero stdout/stderr output when fitting any estimator IMO unless a warning or an error/exception happens.

@adrinjalali this is interesting, thanks for sharing! We've had a number of people ask for this because the refitting over so many iterations can take time and it's not necessarily clear whether the notebook died in the meanwhile. Is there a downside to showing that output? If not, I feel tempted to say the default is to show it.

By soft dependency do you mean using it if it's available but not otherwise (try-except), or having an extension of fairlearn, say pip install fairlearn[progress], that installs tqdm? I'm a bit averse to extensions since few people tend to use them (at least in my experience), although the try-except version is even worse in that respect, since there's no particular way to notify users that they should install tqdm to get the progress bar.
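
For reference, the try-except variant mentioned here usually looks something like the generic sketch below (not code from this PR; the no-op fallback is made up for illustration):

```python
# Soft dependency via try-except: use tqdm when it is installed, otherwise
# fall back to a no-op wrapper so the loop still runs, just without a bar.
try:
    from tqdm.auto import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):  # silent fallback; keyword args are ignored
        return iterable

for _ in tqdm(range(10), desc="GridSearch"):
    pass  # placeholder for one refit
```

The drawback is exactly the silent fallback: nothing tells the user that installing tqdm would enable the progress bar.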

@adrinjalali (Member)

> We've had a number of people ask for this because the refitting over so many iterations can take time and it's not necessarily clear whether the notebook died in the meanwhile.

And that's why it's a good idea to have verbose as a feature :D

It's the same with sklearn's *SearchCV, or any other model for that matter, which can take hours to converge or finish its computations. As a user who deploys things on servers or mostly writes Python code outside notebooks, I only want to see things in the output that I need to do something about, like warnings and errors. Very often we feed the outputs of our programs to a log server (like Elasticsearch) to later analyze the output, the warnings, etc. That's why, by default, none of those methods print anything.

Now, in terms of how to show the progress, I would very much prefer using logging. It's very configurable, and the user can decide where to send the logs from each module, or silence them. tqdm is not essential to run any of the algorithms in fairlearn, so it really shouldn't be a hard dependency. Adding hard dependencies makes a minimal container that runs a piece of code in production unnecessarily heavy.

So ideally, I'd have a solution with verbose=0 by default (or a logging setup, or however else you'd like it configured), use logging to show the progress by default, and have a fairlearn-level configuration that lets the user enable tqdm. When the user explicitly sets that configuration variable, we also check whether tqdm is installed and, if it isn't, raise an error telling the user to install it. You could also set up the dependencies so that fairlearn[extended] installs tqdm, for example.

In general, it's totally fine to add dependencies when working on one's own script, but they should not be taken lightly when it comes to library development.
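
A minimal sketch of the arrangement described above, assuming a hypothetical module-level switch and helpers (`enable_progress_bar`, `_progress_iter`); neither exists in fairlearn:

```python
import logging

logger = logging.getLogger("fairlearn")

_USE_TQDM = False  # hypothetical package-level switch, off by default


def enable_progress_bar():
    """Explicitly opt in to tqdm progress bars; raise if tqdm is missing."""
    global _USE_TQDM
    try:
        import tqdm  # noqa: F401  (only checking availability)
    except ImportError as exc:
        raise ImportError(
            "Progress bars require tqdm. Install it directly, or via an "
            "extra such as 'pip install fairlearn[extended]'."
        ) from exc
    _USE_TQDM = True


def _progress_iter(iterable, verbose=0, total=None):
    """Wrap an iterable according to the configuration chosen by the user."""
    if _USE_TQDM:
        from tqdm.auto import tqdm
        return tqdm(iterable, total=total)
    if verbose > 0:
        logger.info("Fitting over %s iterations", total if total else "unknown")
    return iterable  # default: no stdout/stderr output at all
```

With this arrangement, fitting stays silent by default, verbose routes a message through logging, and tqdm is imported only after the user has explicitly asked for it.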

@romanlutz (Member)

@adrinjalali I think all your points are valid. I looked a little bit into how this is done in scikit-learn, and to be honest I was surprised to see print all over the place as opposed to using logging. What am I missing?

https://github.com/scikit-learn/scikit-learn/blob/9acfaab9667c038686ef51881adce72721ede377/sklearn/model_selection/_validation.py

@romanlutz (Member) left a comment


Adrin made some very valid points. Perhaps best to see whether we can get into a similar pattern to scikit-learn without introducing new dependencies.

@adrinjalali (Member)

> @adrinjalali I think all your points are valid. I looked a little bit into how this is done in scikit-learn, and to be honest I was surprised to see print all over the place as opposed to using logging. What am I missing?
>
> https://github.com/scikit-learn/scikit-learn/blob/9acfaab9667c038686ef51881adce72721ede377/sklearn/model_selection/_validation.py

Mostly for historical reasons I guess. And we know it's bad and we're trying to fix it:

Recent proposal:
scikit-learn/scikit-learn#17439
And the issue goes back to 2011:
scikit-learn/scikit-learn#78

@adrinjalali (Member)

Also related to tqdm: scikit-learn/scikit-learn#7574 (comment)

@riedgar-ms (Member)

I agree that the output should be an option, and I can see the sense in making any such dependency 'soft'. That bit about threading and tqdm is scary (a flashback to when I first learned about the GIL). Having just done a quick prod of my current conda environment, I see that the reason we have tqdm available right now is shap and papermill, both of which are on our hit list of dependencies to remove.

@romanlutz mentioned this pull request Aug 15, 2020
Base automatically changed from master to main February 6, 2021 06:05
@romanlutz (Member)

Closing this PR since we decided not to go with tqdm.

@romanlutz closed this Mar 26, 2021