ENH Optimized Preprocessing Algorithm (NIPS 2016) #1340
base: main
Conversation
Thanks @rakesh9177 for submitting such an extensive pull request! I personally don't have the time to do an in-depth review at this time, but hopefully someone else from @fairlearn/fairlearn-maintainers does. Left a bunch of higher-level and nitpicky comments that should be fairly easy to address.
from sklearn.base import BaseEstimator, TransformerMixin


class OptimizedPreprocessor:
I get that this name is based on the AIF360 implementation, but I don't like it because it's not at all descriptive. From a glance at the abstract (I haven't read the full paper), perhaps something like discrimination-reducing preprocessor is possible? Not sure if I like that much more 😆 @fairlearn/fairlearn-maintainers please chime in.
I would not go for something like "discrimination-reducing" as that could imply the pre-processing algorithm maps neatly onto legal notions of discrimination.
The technique aims to identify a randomized mapping that minimizes the difference between the original distribution of X and Y and the distribution of the pre-processed data ("utility"), subject to a demographic parity ratio constraint ("discrimination control") and an expected individual "distortion" constraint ("distortion control"). The last component determines which transformations are penalized (e.g. swapping labels, changing particular features, etc.). So perhaps a name related to demographic parity and individual distortions would make sense?
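To make those three quantities concrete, here is a small numpy sketch on invented numbers (a 2-group, 2-outcome toy example; none of the values come from the paper or the PR):

```python
import numpy as np

# Hypothetical example; all numbers are made up for illustration.
p_orig = np.array([0.6, 0.4])   # original P(Y)
p_new = np.array([0.55, 0.45])  # post-mapping P(Yhat)

# "Utility": distance between the original and transformed
# distributions, here measured as total variation.
utility_gap = 0.5 * np.abs(p_orig - p_new).sum()

# "Discrimination control": demographic parity ratio across groups,
# which the optimization constrains to stay above a threshold.
p_pos_by_group = np.array([0.50, 0.45])  # P(Yhat = 1 | D = d)
dp_ratio = p_pos_by_group.min() / p_pos_by_group.max()

# "Distortion control": expected cost of individual transformations,
# with a toy cost of 1.0 per label flip and a 5% flip probability.
expected_distortion = 1.0 * 0.05
```

The actual method searches for the randomized mapping minimizing the first quantity subject to constraints on the other two; this snippet only evaluates the three numbers for a fixed toy mapping.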
self.optim_options = optim_options
self.verbose = verbose

def fit(self, df, X_features, Y_features, D_features):
This isn't quite in line with our usual API for fit (see any other technique in our repo). I'd prefer keeping it consistent.
In particular, have a look at our other preprocessing algorithm, CorrelationRemover. Although I think we could also simply have something like .fit(X, y, sensitive_features) rather than __init__(sensitive_feature_ids) - does anybody recall why we went for sensitive_feature_ids at initialization in CorrelationRemover? Tagging @fairlearn/fairlearn-maintainers
I can use a similar approach and it should not take much time to modify.
There are two ways CorrelationRemover works right now:
1) Using numpy arrays: since these do not carry feature names, indices have to be passed.
2) When a pandas dataframe is passed, it can take advantage of column names.
Both are passed via sensitive_feature_ids.
The signature should be def fit(self, X, y, **kwargs), where **kwargs contains the sensitive features.
We also should accept anything, not just a dataframe.
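A minimal sketch of that signature, assuming the sensitive features arrive as a keyword argument (class and attribute names here are illustrative, not the PR's actual implementation):

```python
import numpy as np


class OptimizedPreprocessorSketch:
    """Skeleton only; the real fitting logic is elided."""

    def fit(self, X, y, **kwargs):
        sensitive_features = kwargs.get("sensitive_features")
        # np.asarray accepts lists, ndarrays, or dataframes alike,
        # so the estimator is not tied to pandas.
        self.X_ = np.asarray(X)
        self.y_ = np.asarray(y)
        self.sensitive_features_ = np.asarray(sensitive_features)
        return self


est = OptimizedPreprocessorSketch().fit(
    [[0.1], [0.2]], [0, 1], sensitive_features=[0, 1]
)
```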
Apologies for the delayed response, I had some interviews!
I changed the code according to the scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html
Please review @adrinjalali
constraints.append(PYhgD[d2, :].T <= PYhgD[d, :].T * (1 + self.epsilon))

# Mean distortion
# Pxy_xhyhJoint = (self.dfMask_Pxyd_to_Pxy.values.T).dot(np.diag(PxydMarginal))*Pmap
Can this commented out code be removed?
@@ -6,3 +6,5 @@ pandas>=2.0.3
pyarrow>=15
scikit-learn>=1.2.1
scipy>=1.9.3
cvxpy
This may be a problem... I believe this has come up in the past, too. I wouldn't mind it as a soft dependency. Paging @fairlearn/fairlearn-maintainers 🙂
test/unit/preprocessing/Discrimation_remover_optimized_preprocess/test_optimPreproc.py
Hi!
I've only been able to skim the code for now. A few high-level comments:
- Could you add more comments and class/function descriptions to indicate what the different pieces of code are doing? That would make it much easier to review! :)
- Please try to make sure the API corresponds to fairlearn/scikit-learn conventions as much as possible.
- The current implementation relies heavily on Pandas dataframes, which might not be ideal.
- The distortion function seems to be a crucial, but also quite tricky, part of this method. From what I understand so far, it needs to be explicitly defined for a particular dataset. It's not immediately clear to me what would be the best way to implement this in a user-friendly manner.
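To illustrate the kind of per-dataset distortion function being discussed, here is a hypothetical example (the column names, costs, and call shape are all invented for illustration):

```python
def distortion(old_row, new_row):
    # Hypothetical per-pair cost for a dataset with an "age_decade"
    # feature and a binary "label": flipping the label is expensive,
    # and larger age changes cost more.
    cost = 0.0
    if old_row["label"] != new_row["label"]:
        cost += 1.0
    cost += 0.5 * abs(old_row["age_decade"] - new_row["age_decade"])
    return cost


# Swapping the label and moving age by one decade:
c = distortion({"age_decade": 3, "label": 0},
               {"age_decade": 4, "label": 1})
```

Because the function encodes domain judgments like these, it probably cannot be given a sensible default, which is the user-friendliness difficulty noted above.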
self.opt.compute_marginals()
"""

def transform(self, df, X_features, Y_features, D_features, transform_Y=False):
Can you explain the purpose of transform_Y?
When features are being transformed, there is an option to transform the labels (Y_features) along with the data. If a user wishes to transform the labels as well as the X features, they can set transform_Y=True.
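As a sketch of that branching (the helper name and shape are hypothetical, not the PR's code):

```python
def output_columns(X_features, Y_features, transform_Y=False):
    # Decide which columns the transform returns: always the remapped
    # X features, plus the remapped labels only when requested.
    cols = list(X_features)
    if transform_Y:
        cols += list(Y_features)
    return cols
```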
I started reviewing the API, and I see it's far from a scikit-learn compatible API. We should fix that first before merging.
Also, please apply black to your new files.
and objectives [3]_.

References:
.. [3] F. P. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, and
Why [3] and not [1]?
from sklearn.base import BaseEstimator, TransformerMixin


class OptimizedPreprocessor:
Should inherit from BaseEstimator and TransformerMixin.
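A minimal sketch of what inheriting from both buys: get_params/set_params (and hence cloning) from BaseEstimator, and fit_transform for free from TransformerMixin. The class name and parameter here are illustrative, and the fitting logic is elided:

```python
from sklearn.base import BaseEstimator, TransformerMixin


class OptimizedPreprocessorSketch(BaseEstimator, TransformerMixin):
    def __init__(self, epsilon=0.05):
        self.epsilon = epsilon

    def fit(self, X, y=None):
        # Real fitting logic elided in this sketch.
        return self

    def transform(self, X):
        # Identity transform, for illustration only.
        return X


est = OptimizedPreprocessorSketch(epsilon=0.1)
params = est.get_params()          # provided by BaseEstimator
out = est.fit_transform([[1.0]])   # provided by TransformerMixin
```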
Adjusted the code according to https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html
self.optim_options = {
    "distortion_fun": distortion_function,
    "epsilon": epsilon,
    "clist": clist,
    "dlist": dlist,
}
__init__ should only store given input values and nothing else.
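In scikit-learn conventions, __init__ stores its parameters verbatim and any derived structure (like the optim_options dict quoted above) is built in fit with a trailing-underscore attribute; a sketch under those conventions (class name is illustrative):

```python
class PreprocessorSketch:
    def __init__(self, distortion_function=None, epsilon=0.05,
                 clist=None, dlist=None):
        # Store inputs untouched so get_params()/set_params() and
        # estimator cloning behave correctly.
        self.distortion_function = distortion_function
        self.epsilon = epsilon
        self.clist = clist
        self.dlist = dlist

    def fit(self, X, y=None):
        # Derived state is computed here, not in __init__.
        self.optim_options_ = {
            "distortion_fun": self.distortion_function,
            "epsilon": self.epsilon,
            "clist": self.clist,
            "dlist": self.dlist,
        }
        return self
```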
Description
closes #1028
Tests
Documentation
Screenshots