Fairmodels produces an issue when the output variable is used as the protected variable #45

Open
Nehagupta90 opened this issue Jan 24, 2022 · 8 comments

@Nehagupta90

Good day everyone

I am using the output variable of my data (isKilled = yes/no) as the protected variable and “yes” as the privileged value. But every time I run the code, only a few metrics are calculated, and I get the output shown at the end.

Metric calculation : 2/12 metrics calculated for all models ( 10 NA created )

I used the following code; could you please suggest where the problem is:

library(farff)              # readARFF()
library(mlr3)               # TaskClassif, lrn()
library(mlr3extralearners)  # classif.randomForest learner
library(DALEXtra)           # explain_mlr3()
library(fairmodels)         # fairness_check(), metric_scores()

data <- readARFF("apns.arff")
data <- na.omit(data)       # na.omit() returns a copy, so the result must be assigned

# 70/30 train/test split
index <- sample(1:nrow(data), 0.7 * nrow(data))
train <- data[index, ]
test  <- data[-index, ]

task    <- TaskClassif$new("data", backend = train, target = "isKilled")
learner <- lrn("classif.randomForest", predict_type = "prob")
model   <- learner$train(task)

explainer <- explain_mlr3(model,
                          data  = test[, -15],    # drop the target column (column 15)
                          y     = as.numeric(test$isKilled) - 1,
                          label = "RF")

# note: the protected variable is derived from the target itself
prot <- ifelse(test$isKilled == "no", 1, 0)
privileged <- "1"
privileged %in% as.factor(prot)    # sanity check: TRUE

fc <- fairness_check(explainer,
                     protected  = prot,
                     privileged = privileged)

plot(fc)
msfc <- metric_scores(fc)
plot(msfc)
msfc$metric_scores_data

The output is as below:

[plot(fc): fairness check plot]

[plot(msfc): metric scores plot]

msfc$metric_scores_data
score subgroup metric model
1 NA 0 TPR RF
2 0.90967742 1 TPR RF
5 0.00000000 0 PPV RF
6 1.00000000 1 PPV RF
11 0.06214689 0 FPR RF
12 NA 1 FPR RF
19 0.06214689 0 STP RF
20 0.90967742 1 STP RF
21 0.93785311 0 ACC RF
22 0.90967742 1 ACC RF

@jakwisn
Member

jakwisn commented Jan 24, 2022

Probably a division by zero occurs, which is why you get NA. To compute the metrics correctly, the confusion matrix has to be somewhat populated. Additionally, if a metric has a score of zero in the metric scores plot, it won't be shown in the fairness check; it produces NaN so as not to show false information.
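
For intuition, here is a minimal R sketch of how an empty confusion-matrix cell turns into NaN. This is just the standard metric definitions, not fairmodels internals, and the counts are made up:

# hypothetical confusion-matrix counts for one subgroup
tp <- 0; fn <- 0    # this subgroup contains no actual positives
fp <- 3; tn <- 97

tpr <- tp / (tp + fn)    # 0 / 0 -> NaN; is.na(NaN) is TRUE, hence the NAs in the table
fpr <- fp / (fp + tn)    # 3 / 100 = 0.03, computed normally
tpr
#> NaN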

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

@jakwisn
Member

jakwisn commented Jan 25, 2022

To get metrics without NA, the predictor must make some mistakes. In this case it seems that for some metrics (like PPV) there are no true positives in one subgroup and no false positives in the other. The fact that you use y as the protected variable might further deepen the problem. Can you show what you get when you run fc$groups_confusion_matrices?
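
To see why deriving the protected variable from y guarantees such empty cells, consider a toy example (assumed data, not the apns.arff set): each subgroup then contains only one true class, so one metric denominator per subgroup is always 0/0. This matches the table above, where TPR is NA in subgroup 0 and FPR is NA in subgroup 1:

# toy labels: y plays the role of isKilled, and the protected
# variable is derived directly from it, as in the original code
y    <- c(1, 1, 1, 0, 0, 0)
pred <- c(1, 0, 1, 0, 1, 0)
prot <- y    # subgroup membership equals the true class

# inside subgroup prot == 1 every observation is an actual positive,
# so FP = TN = 0 and FPR = FP / (FP + TN) = 0 / 0 = NaN
sub <- prot == 1
fp <- sum(pred[sub] == 1 & y[sub] == 0)    # 0
tn <- sum(pred[sub] == 0 & y[sub] == 0)    # 0
fp / (fp + tn)
#> NaN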

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

@jakwisn
Member

jakwisn commented Jan 25, 2022

Yes, this is surely it, but I am surprised that you did not get values for the confusion matrices. This is unexpected behaviour and it might be a bug. Is it possible to link the dataset? Could you also list the packages that you used, and provide the versions of fairmodels and DALEX? Thanks for bringing it up.
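
(For reference, one standard way to collect the requested version information in R:)

packageVersion("fairmodels")
packageVersion("DALEX")
sessionInfo()    # full R session context, including all loaded packages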

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

jakwisn self-assigned this Jan 26, 2022
@jakwisn
Member

jakwisn commented Jan 26, 2022

Hi,
I can't see the dataset; can you link it somehow? Thanks

jakwisn added the "good first issue" (Good for newcomers) and "minor bug 😞" (Something should be fixed but it's not critical.) labels Jan 26, 2022