Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add mitigation algorithm "Mechanisms for Fair Classification" by Zafar et al. #1043

Open
wants to merge 92 commits into
base: main
Choose a base branch
from

Conversation

rensoostenbach
Copy link
Contributor

@rensoostenbach rensoostenbach commented Mar 14, 2022

This PR will solve #1025 and implements the Logistric Regression from the paper Fairness Constraints: Mechanisms for Fair Classification by Zafar et al.

Assumptions made so far:

  • We most likely want the API design to be similar to sklearn.linear_model.LogisticRegression. I have also seen the discussion in Proposal: Port EqualOpportunityClassifier and DemographicParityClassifier #466, and have taken it into account regarding the naming.
  • The code should work when the feature data input is either a Numpy Array or a Pandas DataFrame, and for sensitive features we support the former two as well as lists and Pandas Series.
  • In the code of the paper, it is possible to supply a separate constraint threshold per sensitive attribute, or even per category of a sensitive attribute. I have implemented a threshold per sensitive feature, so not per category of a sensitive attribute.
  • I will only implement the SLSQP solver. All solvers implemented in sklearn do not work with constraints, and 'SLSQP' does while also being available in the optimize.minimize function, making it an easy choice.

Progress:

  • technique code in fairlearn.linear_model
  • unit tests in test.unit.linear_model --> Some relatively simple tests are implemented. Open for test ideas if anyone feels like there is missing something.
  • descriptive API reference (directly in the docstring) --> Don't have a code example yet, but docstrings are included.
  • a short user guide in docs.user_guide.mitigation.rst

Once all this is finished, I can look to extend this to a new GLM framework in sklearn, as mentioned here.

rensoostenbach and others added 30 commits March 7, 2022 15:31
Most of this code is copied from the sklearn LogisticRegression
Easy version for now, see comments in code
Returning renamed sensitive features, and not dropping a column in case of binary sensitive feature.
Some minor changes done by me, e.g. intercept on line 705. Still need to figure out how to do this in the right way
Not happy with this way of doing it, probably need to do it more like sklearn and less like the code in the paper.
Now implemented as in sklearn, instead of how the paper does it
@rensoostenbach
Copy link
Contributor Author

Could you please add tests to cover the lines which are not covered? Or let me know if they are but the code-cov config is not catching them.

@adrinjalali
Codecov is giving a warning on a docstring, line 4 in fairlearn/linear_model/init.py, as well as some imports (line 5, 10, 13, 17, 25 in fairlearn/linear_model/_constrained_logistic.py), and I am not sure what should be done with that.

Here is an overview of lines in _constrained_logistic.py that are giving warnings while they are covered in test_constrained_logistic.py:

  • Lines 28-31: Covered by test_wrong_solver (Or should I parametrize this test such that it tests for a wrong and a correct solver?)
  • Lines 36-38: Covered by test_wrong_penalty (Or should I parametrize this test such that it tests for a wrong and a correct penalty?)
  • Lines 46-49: Covered by test_wrong_multi_class (Or should I parametrize this test such that it tests for a wrong and a correct value for multi_class?)
  • Lines 82-83: Covered by test_mismatch_X_sf_rows
  • The rest of the warnings in _sensitive_attr_constraint_cov: I don't know what to do with those
  • Line 108: Covered by test_get_constraint_list_cov
  • The rest of the warnings in _get_constraint_list_cov: I don't know what to do with those
  • The warnings in _logistic_regression_path: This code is taken from the sklearn _logistic_regression_path function for a large part, how should that be handled test-wise?
  • Line 306: Covered by test_ohe_sensitive_features
  • The rest of the warnings in _ohe_sensitive_features: I don't know what to do with those
  • Lines 343-345 and 351-352: Covered by test_cov_bound_dict
  • The rest of the warnings in _process_covariance_bound_dict: I don't know what to do with those
  • Line 366: Covered by test_process_sensitive_features
  • Line 407: Covered by test_sf_wrong_type
  • The rest of the warnings in _process_sensitive_features: I don't know what to do with those
  • Warnings in the ConstrainedLogisticRegression class: A few are listed below that I feel like are covered, other parts of this code are from sklearn again.
  • Line 643-644 and 661: Covered by test_unconstrained_vs_normal_lr
  • Lines 675-676: Covered by test_too_many_cov_bound_values
  • Line 686: Covered by test_too_little_cov_bound_values

There are also some warnings that I am not sure what to do with, e.g. the warning on line 43. For that warning, should I implement a test that specifically tests for the right solver, or does that warning mean something else?

I realize this is quite a lot, perhaps we could chat sometime so I could get a better grasp of how codecov works and how I should tackle these warnings :).

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think codecov is complaining because the tests are failing (on an import) and therefore the coverage file is not actually covering the new code at all. I'll look into it.

test/unit/linear_model/conftest.py Outdated Show resolved Hide resolved
fairlearn/linear_model/__init__.py Show resolved Hide resolved
test/unit/linear_model/test_constrained_logistic.py Outdated Show resolved Hide resolved
@adrinjalali
Copy link
Member

So there's been some re-work of the loss methods in scikit-learn=1.1, and your import from _logistic is not there anymore.

Check this PR for the changes: https://github.com/scikit-learn/scikit-learn/pull/21808/files

To fix this, you probably need a function in utils/fixes.py which would handle the different versions and inside your code you'd import that and use it.

@rensoostenbach
Copy link
Contributor Author

@adrinjalali I've moded all content in conftest.py to the test file now, as well as importing it correctly.

Thanks for notifying about the change in sklearn! I was not aware of this at all. I've pushed a fix for this, however I'm not sure if it should be done like this since this is a first for me.

@hildeweerts
Copy link
Contributor

Hi @rensoostenbach - is it clear for you what remains to be done to get this PR merged? Let us know if you still have any questions :)

@rensoostenbach
Copy link
Contributor Author

rensoostenbach commented Jul 13, 2022

@hildeweerts I believe that I should perhaps write some more tests on the lines that codecov is failing on, but I am unsure for which parts of the code I should do that. On one hand, I feel like I have written some tests for parts of the code that codecov still wants me to write a test on, and I'm not sure if I should write tests for parts of the code that originate from sklearn on the other hand.

Also, I think this PR is still blocked by #1093, as per this comment.

@rensoostenbach
Copy link
Contributor Author

@hildeweerts @adrinjalali

I noticed that when Hilde committed 5c32555, Codecov wasn't complaining anymore, but now it is again. I am still not really sure how to fix that in the best way.

Also, the PR gate is giving a weird error for flake8, but locally I only get the following flake8 errors:

fairlearn/linear_model/_constrained_logistic.py:609:1: RST306 Unknown target name: "classes". fairlearn/linear_model/_constrained_logistic.py:615:1: RST306 Unknown target name: "coef". fairlearn/linear_model/_constrained_logistic.py:620:1: RST306 Unknown target name: "intercept". fairlearn/linear_model/_constrained_logistic.py:623:1: RST306 Unknown target name: "n_features_in". fairlearn/linear_model/_constrained_logistic.py:627:1: RST306 Unknown target name: "feature_names_in". fairlearn/linear_model/_constrained_logistic.py:632:1: RST306 Unknown target name: "n_iter".

So it seems that the docs are still not recognizing attributes properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants