Fairmodels produces an issue when the output variable is used as the protected variable #45

Open
Nehagupta90 opened this issue Jan 24, 2022 · 8 comments

@Nehagupta90

Good day everyone

I am using the output variable of my data (isKilled = yes/no) as the protected variable and “yes” as the privileged value. But every time I run the code, only a few metrics are calculated, and I get the output shown at the end.

Metric calculation : 2/12 metrics calculated for all models ( 10 NA created )

I used the following code; could you please suggest where the problem is:

library(farff)              # readARFF()
library(mlr3)               # TaskClassif, lrn()
library(mlr3extralearners)  # classif.randomForest learner
library(DALEXtra)           # explain_mlr3()
library(fairmodels)         # fairness_check(), metric_scores()

data <- readARFF("apns.arff")
data <- na.omit(data)       # na.omit() returns a copy, so the result must be assigned

# 70/30 train/test split
index <- sample(1:nrow(data), 0.7 * nrow(data))
train <- data[index, ]
test  <- data[-index, ]

task    <- TaskClassif$new("data", backend = train, target = "isKilled")
learner <- lrn("classif.randomForest", predict_type = "prob")
model   <- learner$train(task)

explainer <- explain_mlr3(model,
                          data  = test[, -15],    # drop the target column (column 15)
                          y     = as.numeric(test$isKilled) - 1,
                          label = "RF")

# note: the protected variable is derived from the target itself
prot <- ifelse(test$isKilled == "no", 1, 0)
privileged <- "1"
privileged %in% as.factor(prot)    # sanity check: TRUE

fc <- fairness_check(explainer,
                     protected  = prot,
                     privileged = privileged)

plot(fc)
msfc <- metric_scores(fc)
plot(msfc)
msfc$metric_scores_data

The output is as below:

[plot(fc): fairness check plot]

[plot(msfc): metric scores plot]

msfc$metric_scores_data
score subgroup metric model
1 NA 0 TPR RF
2 0.90967742 1 TPR RF
5 0.00000000 0 PPV RF
6 1.00000000 1 PPV RF
11 0.06214689 0 FPR RF
12 NA 1 FPR RF
19 0.06214689 0 STP RF
20 0.90967742 1 STP RF
21 0.93785311 0 ACC RF
22 0.90967742 1 ACC RF

@jakwisn
Member

jakwisn commented Jan 24, 2022

Probably a division by zero occurs, which is why you get NA. To compute the metrics correctly, the confusion matrix has to be somewhat populated. Additionally, if a metric has a score of zero in the metric scores plot, it won't be shown in the fairness check; it produces NaN so as not to show false information.
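
For intuition, here is a minimal R sketch of how an empty confusion-matrix cell turns into NaN. This is just the standard metric definitions, not fairmodels internals, and the counts are made up:

# hypothetical confusion-matrix counts for one subgroup
tp <- 0; fn <- 0    # this subgroup contains no actual positives
fp <- 3; tn <- 97

tpr <- tp / (tp + fn)    # 0 / 0 -> NaN; is.na(NaN) is TRUE, hence the NAs in the table
fpr <- fp / (fp + tn)    # 3 / 100 = 0.03, computed normally
tpr
#> NaN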

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

@jakwisn
Member

jakwisn commented Jan 25, 2022

To get metrics without NA, the predictor must make some mistakes. In this case it seems that for some metrics (like PPV) there are no true positives in one subgroup and no false positives in the other. The fact that you use y as the protected variable might further deepen the problem. Can you show what you get when you run fc$groups_confusion_matrices?
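
To see why deriving the protected variable from y guarantees such empty cells, consider a toy example (assumed data, not the apns.arff set): each subgroup then contains only one true class, so one metric denominator per subgroup is always 0/0. This matches the table above, where TPR is NA in subgroup 0 and FPR is NA in subgroup 1:

# toy labels: y plays the role of isKilled, and the protected
# variable is derived directly from it, as in the original code
y    <- c(1, 1, 1, 0, 0, 0)
pred <- c(1, 0, 1, 0, 1, 0)
prot <- y    # subgroup membership equals the true class

# inside subgroup prot == 1 every observation is an actual positive,
# so FP = TN = 0 and FPR = FP / (FP + TN) = 0 / 0 = NaN
sub <- prot == 1
fp <- sum(pred[sub] == 1 & y[sub] == 0)    # 0
tn <- sum(pred[sub] == 0 & y[sub] == 0)    # 0
fp / (fp + tn)
#> NaN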

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

@jakwisn
Member

jakwisn commented Jan 25, 2022

Yes, this is surely it, but I am surprised that you did not get values for the confusion matrices. This is unexpected behaviour and it might be a bug. Is it possible to link the dataset? Could you also list the packages that you used, and provide the versions of fairmodels and DALEX? Thanks for bringing it up.
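
(For reference, one standard way to collect the requested version information in R:)

packageVersion("fairmodels")
packageVersion("DALEX")
sessionInfo()    # full R session context, including all loaded packages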

@Nehagupta90
Author

Nehagupta90 commented Jan 25, 2022 via email

jakwisn self-assigned this Jan 26, 2022
@jakwisn
Member

jakwisn commented Jan 26, 2022

Hi,
I can't see the dataset; can you link it somehow? Thanks

jakwisn added the "good first issue" (Good for newcomers) and "minor bug 😞" (Something should be fixed but it's not critical.) labels Jan 26, 2022