ENH add mitigation algorithm "Mechanisms for Fair Classification" by Zafar et al. #1043

rensoostenbach · 2022-03-14T11:35:18Z

This PR solves #1025 and implements the logistic regression from the paper "Fairness Constraints: Mechanisms for Fair Classification" by Zafar et al.

Assumptions made so far:

We most likely want the API design to be similar to sklearn.linear_model.LogisticRegression. I have also seen the discussion in Proposal: Port EqualOpportunityClassifier and DemographicParityClassifier #466, and have taken it into account regarding the naming.
The code should work when the feature data input is either a NumPy array or a pandas DataFrame, and for sensitive features we support the former two as well as lists and pandas Series.
In the code of the paper, it is possible to supply a separate constraint threshold per sensitive attribute, or even per category of a sensitive attribute. I have implemented a threshold per sensitive feature, so not per category of a sensitive attribute.
I will only implement the SLSQP solver. None of the solvers implemented in sklearn work with constraints, whereas 'SLSQP' does and is also available in the scipy.optimize.minimize function, making it an easy choice.
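As a rough illustration of the constraint setup described in the assumptions above, the sketch below builds one pair of inequality constraints per sensitive feature, in the dict format that scipy.optimize.minimize accepts for method="SLSQP". It uses the decision-boundary covariance from the paper, (1/N) * sum((z_i - mean(z)) * theta^T x_i), bounded in absolute value by a per-feature threshold. The names boundary_covariance, build_constraints, and covariance_bound are hypothetical and not taken from this PR; the sketch is pure Python and leaves out the intercept and the logistic loss.

```python
def boundary_covariance(theta, X, z):
    """Empirical covariance between sensitive feature z and theta^T x."""
    n = len(X)
    z_mean = sum(z) / n
    total = 0.0
    for x_i, z_i in zip(X, z):
        distance = sum(t * v for t, v in zip(theta, x_i))  # theta^T x_i
        total += (z_i - z_mean) * distance
    return total / n


def build_constraints(X, sensitive_features, covariance_bound):
    """Return SLSQP-style constraints enforcing |cov| <= bound per feature."""
    constraints = []
    for name, z in sensitive_features.items():
        c = covariance_bound[name]  # one threshold per sensitive feature

        # SLSQP reads an "ineq" constraint as fun(theta) >= 0.
        def upper(theta, z=z, c=c):
            return c - boundary_covariance(theta, X, z)

        def lower(theta, z=z, c=c):
            return c + boundary_covariance(theta, X, z)

        constraints.append({"type": "ineq", "fun": upper})
        constraints.append({"type": "ineq", "fun": lower})
    return constraints


# Toy data (made up for illustration): two features, one binary sensitive feature.
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
z = [0, 0, 1, 1]
constraints = build_constraints(X, {"sex": z}, {"sex": 0.6})
```

The resulting list could then be passed as constraints=constraints to scipy.optimize.minimize, with the logistic loss as the objective.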

Progress:

technique code in fairlearn.linear_model
unit tests in test.unit.linear_model --> Some relatively simple tests are implemented. Open to test ideas if anyone feels something is missing.
descriptive API reference (directly in the docstring) --> Don't have a code example yet, but docstrings are included.
a short user guide in docs.user_guide.mitigation.rst

Once all this is finished, I can look to extend this to a new GLM framework in sklearn, as mentioned here.

rensoostenbach · 2022-06-03T08:15:56Z

> Could you please add tests to cover the lines which are not covered? Or let me know if they are but the code-cov config is not catching them.

@adrinjalali
Codecov is giving a warning on a docstring, line 4 in fairlearn/linear_model/__init__.py, as well as on some imports (lines 5, 10, 13, 17, 25 in fairlearn/linear_model/_constrained_logistic.py), and I am not sure what should be done about that.

Here is an overview of lines in _constrained_logistic.py that are giving warnings while they are covered in test_constrained_logistic.py:

Lines 28-31: Covered by test_wrong_solver (Or should I parametrize this test such that it tests for a wrong and a correct solver?)
Lines 36-38: Covered by test_wrong_penalty (Or should I parametrize this test such that it tests for a wrong and a correct penalty?)
Lines 46-49: Covered by test_wrong_multi_class (Or should I parametrize this test such that it tests for a wrong and a correct value for multi_class?)
Lines 82-83: Covered by test_mismatch_X_sf_rows
The rest of the warnings in _sensitive_attr_constraint_cov: I don't know what to do with those
Line 108: Covered by test_get_constraint_list_cov
The rest of the warnings in _get_constraint_list_cov: I don't know what to do with those
The warnings in _logistic_regression_path: This code is largely taken from sklearn's _logistic_regression_path function; how should that be handled test-wise?
Line 306: Covered by test_ohe_sensitive_features
The rest of the warnings in _ohe_sensitive_features: I don't know what to do with those
Lines 343-345 and 351-352: Covered by test_cov_bound_dict
The rest of the warnings in _process_covariance_bound_dict: I don't know what to do with those
Line 366: Covered by test_process_sensitive_features
Line 407: Covered by test_sf_wrong_type
The rest of the warnings in _process_sensitive_features: I don't know what to do with those
Warnings in the ConstrainedLogisticRegression class: A few that I feel are covered are listed below; other parts of this code are again from sklearn.
Lines 643-644 and 661: Covered by test_unconstrained_vs_normal_lr
Lines 675-676: Covered by test_too_many_cov_bound_values
Line 686: Covered by test_too_little_cov_bound_values

There are also some warnings that I am not sure what to do with, e.g. the warning on line 43. For that warning, should I implement a test that specifically tests for the right solver, or does that warning mean something else?

I realize this is quite a lot, perhaps we could chat sometime so I could get a better grasp of how codecov works and how I should tackle these warnings :).

adrinjalali

I think codecov is complaining because the tests are failing (on an import) and therefore the coverage file is not actually covering the new code at all. I'll look into it.

test/unit/linear_model/conftest.py

fairlearn/linear_model/__init__.py

test/unit/linear_model/test_constrained_logistic.py

adrinjalali · 2022-06-08T09:35:48Z

So there's been some rework of the loss methods in scikit-learn 1.1, and your import from _logistic is not there anymore.

Check this PR for the changes: https://github.com/scikit-learn/scikit-learn/pull/21808/files

To fix this, you probably need a function in utils/fixes.py which would handle the different versions and inside your code you'd import that and use it.
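A minimal sketch of what such a compatibility helper could look like, assuming the goal is simply to try several import locations in order and return the first one that exists. The function name import_first_available is hypothetical and not part of fairlearn or scikit-learn. The example is demonstrated with standard-library modules so that it runs without scikit-learn installed; in the real fix the candidates would be the old and new locations of the loss code, which should be checked against the linked scikit-learn PR.

```python
import importlib


def import_first_available(candidates):
    """Return the first importable attribute from (module, attribute) pairs."""
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module does not exist in the installed version
        if hasattr(module, attribute):
            return getattr(module, attribute)
    tried = ", ".join(f"{m}.{a}" for m, a in candidates)
    raise ImportError(f"none of the candidates could be imported: {tried}")


# Stand-in candidates: the first module is missing, so the second is used.
sqrt = import_first_available(
    [("module_that_does_not_exist", "sqrt"), ("math", "sqrt")]
)
```

If the old and new loss helpers have different call signatures, the helper in utils/fixes.py may also need a thin wrapper that adapts the arguments, so that the calling code stays the same across versions.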

rensoostenbach · 2022-06-08T13:41:02Z

@adrinjalali I've now moved all content in conftest.py to the test file and fixed the import.

Thanks for letting me know about the change in sklearn! I was not aware of this at all. I've pushed a fix for this, but I'm not sure whether this is the right way to do it, since this is a first for me.

hildeweerts · 2022-07-12T07:51:24Z

Hi @rensoostenbach - is it clear for you what remains to be done to get this PR merged? Let us know if you still have any questions :)

rensoostenbach · 2022-07-13T14:08:28Z

@hildeweerts I believe I should write some more tests for the lines that codecov is failing on, but I am unsure for which parts of the code I should do that. On the one hand, I feel I have already written tests for some parts of the code that codecov still flags; on the other hand, I'm not sure whether I should write tests for the parts of the code that originate from sklearn.

Also, I think this PR is still blocked by #1093, as per this comment.

rensoostenbach · 2022-08-04T10:15:47Z

@hildeweerts @adrinjalali

I noticed that when Hilde committed 5c32555, Codecov wasn't complaining anymore, but now it is again. I am still not really sure how to fix that in the best way.

Also, the PR gate is giving a weird error for flake8, but locally I only get the following flake8 errors:

fairlearn/linear_model/_constrained_logistic.py:609:1: RST306 Unknown target name: "classes".
fairlearn/linear_model/_constrained_logistic.py:615:1: RST306 Unknown target name: "coef".
fairlearn/linear_model/_constrained_logistic.py:620:1: RST306 Unknown target name: "intercept".
fairlearn/linear_model/_constrained_logistic.py:623:1: RST306 Unknown target name: "n_features_in".
fairlearn/linear_model/_constrained_logistic.py:627:1: RST306 Unknown target name: "feature_names_in".
fairlearn/linear_model/_constrained_logistic.py:632:1: RST306 Unknown target name: "n_iter".

So it seems that the docs are still not recognizing attributes properly.
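One likely explanation, assuming the docstring lists attributes such as classes_ without inline-literal markup: reStructuredText reads a bare word ending in a single underscore as a hyperlink reference, which would match the reported target names ("classes" for classes_, "n_iter" for n_iter_). The sketch below is a rough stdlib-only check for such tokens; the regular expression and the function name find_bare_references are illustrative, not the rule flake8 itself uses.

```python
import re

# A bare word ending in "_" (not inside backticks, not escaped) is parsed
# by reStructuredText as a reference to a target named like the word.
BARE_REFERENCE = re.compile(r"(?<![`\w\\])(\w*[A-Za-z0-9])_(?![\w`])")


def find_bare_references(docstring):
    """Return the target names that bare trailing-underscore words refer to."""
    return BARE_REFERENCE.findall(docstring)


flagged = """
Attributes
----------
classes_ : ndarray of shape (n_classes,)
n_features_in_ : int
"""

literal = """
Attributes
----------
``classes_`` : ndarray of shape (n_classes,)
``n_features_in_`` : int
"""

flagged_targets = find_bare_references(flagged)
literal_targets = find_bare_references(literal)
```

Wrapping the names in double backticks is one possible workaround; whether that or a per-file ignore for RST306 fits the project's docstring conventions is a separate decision.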

rensoostenbach and others added 30 commits March 7, 2022 15:31

Initial commit FairLogisticRegression

9eb6ff3

Most of this code is copied from the sklearn LogisticRegression

Code for OHE and splitting the data

7cfe1eb

Adding/removing imports

65a2392

Add functions from paper

0a207b8

Add function for constraints list

bd9b94c

Easy version for now, see comments in code

Update OHE function

90f5a0e

Returning renamed sensitive features, and not dropping a column in case of binary sensitive feature.

Proper indentation for class

e0278c6

Return sensitive feature ids

854be5b

Formatting and change of parameters

9d2b726

Remove getting constraints in _logistic_regression_path

f99d0d9

Update docstring comment

5d97edc

Formatting

a481dbe

Change minimization function and arguments

a0bd0c2

Update fit function with self written code

6a96e4d

Update fit function with code from sklearn

512995a

Some minor changes done by me, e.g. intercept on line 705. Still need to figure out how to do this in the right way

Change solver to SLSQP

2117ab2

Remove unused solver code

fe9cbf5

Positional arguments to keyword arguments

ce0a816

Add intercept based on user input

772e54d

Not happy with this way of doing it, probably need to do it more like sklearn and less like the code in the paper.

Add code for testing/debugging

250f427

Remove unused solver code

777bffd

Add URLs to the code of the paper

2753cf2

Add/edit/remove comments

b59c42d

Merge branch 'main' into mechanisms_fair_classification

d56dfe8

Docstring on one line

4bf3947

Run black

e863b22

Minor solver changes, removing unused code

26d90de

Use sklearn loss functions

3738437

Remove intercept code

3e28874

Now implemented as in sklearn, instead of how the paper does it

Remove more unused solver code

6964767

rensoostenbach and others added 7 commits June 2, 2022 12:05

Merge branch 'main' into mechanisms_fair_classification

fbad700

Merge branch 'mechanisms_fair_classification' of https://github.com/r…

4d6ccb0

…ensoostenbach/fairlearn into mechanisms_fair_classification

Support for dictionary, change type to isinstance

2e6e9a8

Remove whitespace

d1209a4

Add docstring

a3db81a

TypeError instead of ValueError

b68a0ee

Rephrasing documentation

fb63cf6

Remove wrong documentation text

e8a2ec9

adrinjalali reviewed Jun 8, 2022

rensoostenbach added 2 commits June 8, 2022 15:32

Fix for new logistic loss in sklearn 1.1

1ad5c98

Move conftest content, correct impor

3639c89

rensoostenbach and others added 10 commits June 8, 2022 15:44

Black and typo fix

4e40346

Merge branch 'main' into mechanisms_fair_classification

ef11948

Rewrite args for new sklearn version

4c76fc5

Remove old joblib import

628f7c2

Merge branch 'mechanisms_fair_classification' of https://github.com/r…

825add1

…ensoostenbach/fairlearn into mechanisms_fair_classification

First version of the user guide

a8c3370

Typos/URL fixes in documentation

6969903

Fix typo, add skips for doctest

af67d95

Add check optimize result for correct solver

ff76fba

Add sentence about trying multiple values

3c3acbf

hildeweerts and others added 3 commits July 26, 2022 12:06

Merge branch 'main' into mechanisms_fair_classification

5c32555

Flake8, add docstring

c8423ac

Merge branch 'main' into mechanisms_fair_classification

08a1b9c
