
ENH relaxed fairness constraint fulfillment via postprocessing (currently only supports strict fulfillment) #1246

Open
AndreFCruz opened this issue Jun 20, 2023 · 4 comments · May be fixed by #1248

@AndreFCruz (Contributor)

Hi fairlearn maintainers,

We’ve implemented a relaxation of the common postprocessing (threshold optimization) fairness method, which enables optimizing some loss (e.g., accuracy) subject to a maximum difference between groups’ TPRs or FPRs.

The implementation is self-contained and available at https://github.com/andrefcruz/error-parity

There are also several example notebooks in the “examples” folder.

What would be a road map for getting this included in fairlearn?

I guess the main options would be:

  1. Changing the ThresholdOptimizer class (see the sketch after this list):
    • Add an optional argument tolerance: float = None; whenever that argument is passed, the relaxed implementation is used.
    • If no tolerance is passed, the previous thresholding implementation is used (no need to run the LP solver).
  2. Adding a new RelaxedThresholdOptimizer class:
    • Advantages: makes it clear that the backend and algorithm are different from ThresholdOptimizer's (the relaxation implies solving a linear program).
    • Disadvantages: potentially creates clutter in the API.
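
For concreteness, here is a minimal sketch of what option (1)'s dispatch could look like; the method names and stub bodies are illustrative assumptions, not the actual fairlearn code:

    # Hypothetical sketch of option (1): a single class that dispatches on
    # `tolerance`. Method bodies are stubs; names are illustrative only.
    class ThresholdOptimizer:
        def __init__(self, estimator=None, constraints="equalized_odds",
                     tolerance=None):
            self.estimator = estimator
            self.constraints = constraints
            self.tolerance = tolerance  # None -> strict path, no LP solver needed

        def fit(self, X, y, *, sensitive_features):
            if self.tolerance is None:
                self._fit_strict(X, y, sensitive_features)   # existing thresholding
            else:
                self._fit_relaxed(X, y, sensitive_features)  # cvxpy-based LP
            return self

        def _fit_strict(self, X, y, sensitive_features):
            ...  # current ThresholdOptimizer logic

        def _fit_relaxed(self, X, y, sensitive_features):
            ...  # relaxed logic from error-parity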

Other thoughts/questions:

  • If going with option (1), do you think it'd be a simple if/else branch that falls back to the original implementation whenever tolerance=None (the default)?
  • Currently only a relaxation of equalized odds (equal group-specific TPR and FPR) has been implemented. This is of course much less than the current ThresholdOptimizer class's functionality; the remaining functionality could be added in the near future.
    • Does this change the calculus between using the same ThresholdOptimizer class and a separate RelaxedThresholdOptimizer? Or can we just throw NotImplementedError whenever a specific unimplemented combination of kwargs is requested?

Thanks and all the best,
André


Is your feature request related to a problem? Please describe.

Currently it is only possible to achieve exact equality between group-specific TPRs or FPRs (or both). Relaxed constraint fulfillment enables direct head-to-head comparison between different fair ML methods at the same level of fairness. Additionally, it enables mapping the whole fairness-accuracy Pareto front of a single given classifier.

Describe the solution you'd like

The solution for relaxed equalized odds has been implemented at https://github.com/AndreFCruz/error-parity/blob/main/error_parity/cvxpy_utils.py#L159

It formulates the problem as an LP and relies on cvxpy to solve it. Then, up to three realized ROC points are used to triangulate the target ROC point (which can lie in the interior of the ROC curve, and hence require randomizing predictions).
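
As a rough illustration of the LP structure only (not the actual error-parity code; the group statistics below are made-up, and the per-group ROC convex-hull constraints are omitted):

    import cvxpy as cp
    import numpy as np

    # Toy sketch of the relaxed equalized-odds LP: one (FPR, TPR) variable pair
    # per group, with pairwise differences bounded by `tol`.
    tol = 0.05
    fpr = cp.Variable(2)
    tpr = cp.Variable(2)

    group_weight = np.array([0.6, 0.4])  # illustrative group sizes
    prevalence = np.array([0.3, 0.5])    # illustrative positive rates per group

    # Expected 0/1 error of a ROC point: FPs among negatives + FNs among positives.
    error = cp.sum(cp.multiply(group_weight,
                               cp.multiply(1 - prevalence, fpr)
                               + cp.multiply(prevalence, 1 - tpr)))

    constraints = [
        fpr >= 0, fpr <= 1, tpr >= 0, tpr <= 1,
        tpr >= fpr,                      # stay at or above the chance diagonal
        cp.abs(fpr[0] - fpr[1]) <= tol,  # relaxed equal FPRs
        cp.abs(tpr[0] - tpr[1]) <= tol,  # relaxed equal TPRs
    ]
    # The real problem additionally constrains each group's point to lie under
    # that group's empirical ROC convex hull, which is what makes it non-trivial.
    cp.Problem(cp.Minimize(error), constraints).solve()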

@romanlutz romanlutz added enhancement New feature or request API Anything which touches on the API labels Jun 23, 2023
@romanlutz (Member)

Thank you for elaborating so well on the options!

I'll have a bunch of observations in random order. I'm just numbering them so that it's easier to respond.

  1. Regarding a single ThresholdOptimizer class with a new tolerance arg vs. adding another RelaxedThresholdOptimizer class: I think this is mostly a question of taste, so people may or may not agree. In the "single class" case we'd just have to make sure that the outputs match between relaxed and strict, which would be a little bit of work. IMO we should make sure of that regardless, but if it's a single class it would be weird from a UX point of view if the outputs don't match, since users don't particularly care that there are two different execution paths under the hood.
    a) @MiroDudik also proposed a different way to achieve a relaxed version of the threshold optimizer. If there's any chance we might want to implement that later on, it might make more sense to put these relaxed versions into their own classes to avoid confusion. I suppose @MiroDudik can comment on that.
    b) Re "outputs match": in ThresholdOptimizer we construct an InterpolatedThresholder internally (self.interpolated_thresholder_ = threshold_optimization_method(...)), which seems similar to your RandomizedClassifier. There might (?) be an opportunity to consolidate so that the visible outputs match, but I'm curious what you think.

  2. I don't think it's a problem that the relaxation only supports equalized odds at this point. It is still possible to expand to other constraints, right? If so, we should open issues for the other constraints with a description of how to implement them once the initial PR is merged; that way, other contributors could pick up the work. In the meantime, we should just raise an error saying that with tolerance != None the only supported constraint as of now is equalized odds.

  3. Speaking of which... shouldn't the default tolerance be 0 instead of None? Just a small nitpick :-)

  4. cvxpy is not currently a dependency of fairlearn. Historically, we've been trying to avoid making every dependency mandatory; examples include pytorch, tensorflow, and matplotlib. If someone tries to use such a feature without the package installed, we throw an error telling them to install it. An example for matplotlib (although the [customplots] extra is not the best example... it's best to just say directly which packages to install) is _MATPLOTLIB_IMPORT_ERROR_MESSAGE.
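    A minimal sketch of that soft-dependency pattern applied to cvxpy (the helper name and message text are illustrative):

        def _get_cvxpy():
            # Import lazily so fairlearn works without cvxpy installed;
            # only the relaxed path needs it. Names here are illustrative.
            try:
                import cvxpy
            except ImportError:
                raise RuntimeError(
                    "cvxpy is required for relaxed constraint fulfillment; "
                    "please install cvxpy to use this feature."
                )
            return cvxpy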
  5. There's typically a requirement to write documentation for new additions to Fairlearn. That means

  • API reference (which you have, so that shouldn't be a problem)
  • user guide (e.g., https://fairlearn.org/main/user_guide/mitigation/index.html) which provides a space to explain to users what this method is for, how it relates to other mitigations, caveats, etc.; essentially information that would be great to know that doesn't quite fit into the API reference. You will find that the threshold optimizer itself doesn't have a user guide yet. I have an open PR Add user guide for fairlearn.postprocessing.ThresholdOptimizer #614 that I am very eager to complete (although I don't have a particular target date). For that reason, I'm reluctant to say that this is a fixed requirement for your PR. It would be nice if you're up for writing this perhaps at a later point, of course.
  • example notebooks: we can check if any of the existing ones work for this purpose. We try to examine the sociotechnical context for our example notebooks which tends to be a larger endeavor, so it's probably out of scope for this PR.
  6. In the PR I'm sure our linter will throw a bunch of errors that are easy to resolve, plus I'll have some suggestions on rewording/rewriting comments or variable names, but that'll be super simple. In Fairlearn, groups can be named with strings or arbitrary numbers, so we'll have to make an adjustment for that. For example, we could add a mapping from input names to the group indices you're using (sketched below), which would allow this to work without any modifications to your logic.
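
    A hedged sketch of such a mapping (np.unique already returns the inverse indices):

        import numpy as np

        # Map arbitrary group labels (strings, numbers, ...) to contiguous
        # integer indices, the representation the LP formulation works with.
        sensitive = np.asarray(["a", "b", "a", "c", "b"])
        group_names, group_indices = np.unique(sensitive, return_inverse=True)
        # group_names   -> array(['a', 'b', 'c'])
        # group_indices -> array([0, 1, 0, 2, 1])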

  7. We'll have to look into the allowed input data types. In Fairlearn, we allow pandas DataFrame (and Series where appropriate), numpy ndarray (which you're using), and technically also lists. Since it would be a lot of work to write code for each of them, we have some logic to transform the input into a single format and work with that. In ThresholdOptimizer you can see that here (and in the following lines):

    _, _, sensitive_feature_vector, _ = _validate_and_reformat_input(

    We'll need to do the same for your code to allow pandas DataFrame etc. Since the format we're using internally is ndarray (as far as I can tell from memory and a quick glance), this should not be a huge change, as that's what you're doing already.
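
    For illustration only, the core of such a normalization could be as simple as the following (the real _validate_and_reformat_input does much more validation):

        import numpy as np
        import pandas as pd

        def _to_ndarray(data):
            # Accept pandas DataFrame/Series, numpy arrays, and lists;
            # illustrative helper, not the fairlearn implementation.
            if isinstance(data, (pd.DataFrame, pd.Series)):
                return data.to_numpy()
            return np.asarray(data)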

  8. We also have some plotting functionality for ThresholdOptimizer. Since we can't assume that people have matplotlib (or, in your case, seaborn) installed, we treat that as a soft dependency as well. It should be simple to do something similar.

  9. I see that your classifier classes all implement __call__. Fairlearn complies with the sklearn convention of estimators and predictors, so this would require rewriting them (a possible adapter is sketched below). It's really not too difficult, though, assuming I'm not missing a really good reason for why it was written this way in the first place 😄
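
    For example, a hypothetical adapter along these lines (names are made up):

        from sklearn.base import BaseEstimator, ClassifierMixin

        class CallableClassifierWrapper(BaseEstimator, ClassifierMixin):
            """Illustrative adapter from a __call__-style classifier to the
            sklearn estimator/predictor convention that fairlearn follows."""

            def __init__(self, callable_classifier):
                self.callable_classifier = callable_classifier

            def fit(self, X, y=None):
                return self  # the wrapped classifier is already fit

            def predict(self, X):
                return self.callable_classifier(X)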

  10. Side note: having a random seed set as default (as in RandomizedClassifier) is probably not a great idea. We provide seeds as a means to reproduce results (particularly in tests), but predictions should really be (truly) randomized when users run it; see the sketch below.
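
    An illustrative shape for this, using sklearn's check_random_state (the helper itself is hypothetical):

        import numpy as np
        from sklearn.utils import check_random_state

        def _randomized_predict(probs, random_state=None):
            # Default None means truly randomized; an int or a RandomState
            # instance can still be passed to reproduce results in tests.
            probs = np.asarray(probs)
            rng = check_random_state(random_state)
            return (rng.uniform(size=probs.shape) < probs).astype(int)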

  11. Assertions in the code should probably become if-blocks that raise exceptions, since bare assertions wouldn't be very useful for users who encounter them.
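
    An illustrative before/after (the parameter check is a made-up example):

        def _check_tolerance(tolerance):
            # Before: `assert tolerance is None or 0 <= tolerance <= 1`
            # (stripped under `python -O`, and unhelpful when users hit it).
            # After: an explicit, descriptive exception.
            if tolerance is not None and not (0.0 <= tolerance <= 1.0):
                raise ValueError(
                    f"tolerance must be in [0, 1] or None, got {tolerance!r}"
                )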

  12. I'm impressed with how many comments and type annotations you have. This is some of the highest quality research code I've seen 😄

  13. We should probably figure out what's going on
    a) here:
    https://github.com/AndreFCruz/error-parity/blob/be609050339bf5ab526fdce24c7aa90d5dde911c/error_parity/classifiers.py#L374
    b) and here:
    https://github.com/AndreFCruz/error-parity/blob/be609050339bf5ab526fdce24c7aa90d5dde911c/error_parity/roc_utils.py#L139

  14. It's great that there are already tests. Still, the test coverage likely has considerable potential for improvement. I can make recommendations if you're not sure what I mean.

  15. Looking at the API, ThresholdOptimizer has the following args:

estimator=None,
constraints="demographic_parity",
objective="accuracy_score",
grid_size=1000,
flip=False,
prefit=False,
predict_method="deprecated",

while RelaxedEqualOdds has

predictor=predictor,
tolerance=EPSILON_TOLERANCE,
false_pos_cost=FALSE_POS_COST,
false_neg_cost=FALSE_NEG_COST,
max_roc_ticks=1000,
seed=SEED,

Predictor/estimator are basically the same here, except that we allow passing in an untrained estimator that ThresholdOptimizer will then train. The prefit arg indicates whether it is prefit, although we do checks for that as well. I suppose it makes sense to do the same for the relaxed version. Your constraint is currently always "equalized_odds", so it makes sense that it's not mentioned at this point. We would probably want to keep the same constraints arg in the API, though.
ThresholdOptimizer also has an objective arg that allows specifying objectives other than accuracy, e.g., balanced accuracy, see

objective : str, default='accuracy_score'

We can certainly start by just supporting accuracy_score for now and make other objectives a future item.
I believe max_roc_ticks and grid_size are equivalent.
We don't expose the seed on the ThresholdOptimizer constructor, but rather on the predict method, since that's where the randomness comes in. Ours is called random_state, for consistency with sklearn I believe:
random_state : int or :class:`numpy.random.RandomState` instance, default=None

Our predict_method arg indicates which method to call on the (trained) estimator to get scores. This solves the messy mapping one has to do otherwise. I believe you're familiar with this, and I've seen it in one of your notebooks, too:

predictor = lambda X: rf_clf.predict_proba(X)[:, -1]

I could have sworn we support costs for FPs and FNs, but upon checking the code it seems that's only in our reductions methods. @miro have we considered this for postprocessing? This seems to be the only gap I can identify here. There's a (fairly old) issue asking for just this, too: #473


Summary:

@fairlearn/fairlearn-maintainers: @MiroDudik @riedgar-ms @AndreFCruz and I discussed this yesterday on the community call. We all agreed that it makes sense to add. @hildeweerts suggested the same via email. I will be on point as the responsible maintainer to respond (just to make sure this doesn't collect dust 😄 ), but obviously every perspective is valuable and encouraged.
I don't see any major blockers that would hold up a PR. We just need to make decisions on some of the points I raised (particularly the first and last one) and there's a fair amount of compatibility tasks to work through (e.g., input format, output consistency, linting, etc.) as well as tests.

@romanlutz romanlutz changed the title Suggested addition: relaxed fairness constraint fulfillment via postprocessing (currently only supports strict fulfillment) ENH relaxed fairness constraint fulfillment via postprocessing (currently only supports strict fulfillment) Jun 23, 2023
@AndreFCruz (Contributor, Author) commented Jun 26, 2023

Thanks for the thorough feedback, Roman! All points sound very reasonable. A few specific questions:

  1. I don't see an easy solution for implementing it as a single class (without a bunch of ugly if/else statements in every method).

    Can we implement the strict and relaxed logic in different classes and add them to the MRO of the ThresholdOptimizer object dynamically in the constructor (at runtime)? Or is there a better solution? @MiroDudik mentioned some possible MixIn-based solution (?)

  3. I don't like having equality comparisons with floating-point numbers, but you're probably right! In Python, using 0 is probably fine. Using None could also be more of a signal that this parameter invokes the cvxpy LP solution, while still allowing tolerance=0 to be passed to the cvxpy solver.

  15. I can use this fairlearn mapping for objective functions, but it will reduce functionality: instead of allowing arbitrary FP and FN costs, it only allows 1-to-1 costs (accuracy) or costs normalized by class size (balanced accuracy).

@romanlutz (Member)

Re 1:
To clarify, I think the implementations will be separate regardless. The question is whether the user-facing API is unified via ThresholdOptimizer or whether we have two classes. I don't really care about a few additional if/else statements to be honest. The user experience is more important, and users don't interact with these if/else statements. What you wrote later on "dynamically in the constructor (at runtime)" seems to match what I'm describing. Did you previously assume that single class means the implementation logic needs to be unified? Because that's not something @MiroDudik or I wanted either (and it would be a lot of work).
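
One hedged sketch of such constructor-time routing (class names are placeholders; a plain factory function would work just as well):

    # Placeholder sketch: a single user-facing entry point that routes to a
    # strict or relaxed implementation class when constructed.
    class _StrictThresholdOptimizer:
        def __init__(self, **kwargs):
            ...  # existing thresholding logic

    class _RelaxedThresholdOptimizer:
        def __init__(self, tolerance, **kwargs):
            self.tolerance = tolerance  # cvxpy-based LP logic

    class ThresholdOptimizer:
        def __new__(cls, *, tolerance=None, **kwargs):
            if tolerance is None:
                return _StrictThresholdOptimizer(**kwargs)
            return _RelaxedThresholdOptimizer(tolerance=tolerance, **kwargs)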

Re 3:

> I don't like having equality comparisons with floating point numbers

Fair. Not a huge concern either way, so just ignore this and I'll see what it looks like in the PR 🙂

Re 15:
I would like to hear @MiroDudik's opinion on this before we make a decision since it is quite a restriction as you explained, so we shouldn't do that lightly. I guess that's another advantage of not having a single class: we can expand the API without as many complications!

@MiroDudik (Member)

Re 3: I am actually in favor of keeping tolerance = None as the default, or possibly using a string literal like "exact" or something like that. That way we'll know that if a user overrides the default, they intend to invoke _RelaxedThresholdOptimizer (or however we route the selective constructor).

Re 15: I definitely think that we should allow for different costs. This is something that we discussed previously:

For the API, we should go with option (2) in the above issue, i.e., using objective_costs as an optional argument. This will harmonize with the analogous API here:
https://fairlearn.org/v0.8/api_reference/fairlearn.reductions.html#fairlearn.reductions.ErrorRate
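
For reference, a usage sketch of that reductions-side API (assuming the costs dict accepted by fairlearn.reductions.ErrorRate; the exact postprocessing kwarg name is still open):

    from fairlearn.reductions import ErrorRate

    # Reductions-side objective with asymmetric error costs; a postprocessing
    # `objective_costs` argument would harmonize with this shape.
    objective = ErrorRate(costs={"fp": 0.1, "fn": 0.9})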
