New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH added agg argument to equalized odds difference and ratio to support "average odds" #960
base: main
Are you sure you want to change the base?
ENH added agg argument to equalized odds difference and ratio to support "average odds" #960
Conversation
Hi @IanEisenberg! Do you have a particular application scenario where this metric may be useful? I'm just curious where this may be helpful (vs. for example, equalized odds). |
I don't feel like it is particularly more revealing than equalized odds, but it is a formulation I see discussed (potentially because of its inclusion in AIF360). The use-case is when you don't care about false positive rate or true positive rate more and care just that you are, across both, reaching some level of performance. |
I would be ok to include this. The only thing I don't like about it is that e.g. a very low FPR could 'make up' for a very high FNR, which may be problematic in practice. (In all honesty I would generally advice people to compute the FPR and FNR separately in two metrics because it makes any potential trade-offs much more explicit) @fairlearn/fairlearn-maintainers WDYT? |
Good points! I'm not opposed to it but we really should put some effort into discussing pros and cons of these metrics. We should open an issue for that since it goes beyond just this metric I guess. |
Completely agree. I like |
fairlearn/metrics/_disparities.py
Outdated
@@ -7,6 +7,84 @@ | |||
from ._metric_frame import MetricFrame | |||
|
|||
|
|||
def average_odds_difference( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine including the two metrics--the goal and even behavior is very similar to our existing equalized_odds_difference
and equalized_odds_ratio
, in the sense that average_odds_difference==0
or average_odds_ratio==1
indicate that there is no disparity according to the equalized odds criterion.
But I think that the name is confusing for two reasons:
- the word "odds" is in statistics defined as a ratio of "probability of event / 1-probability of event" and we are definitely not averaging such odds...
- it turns out that AIF360's average_odds_difference is calculating something different than this function! and it's not just the usual distinction due to their use of privileged/unprivileged, it's also the fact that AIF360's definition allows the signs of the two differences to "cancel out" whereas we take the absolute value of the differences before averaging
So my suggestion would be not to create a new function, but instead add a new optional argument to our existing equalized_odds_difference
and equalized_odds_ratio
. For example, we could add an optional parameter called aggregation_method
with its default value being worst_case
, and a new possible value being average
. The aggregation_method
would just specify what's happening with the FPR_difference and FNR_difference -- whether we are taking the worst case or average.
Thoughts?
[tagging @romanlutz @hildeweerts]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too bothered by the name wrt normal usage of the word "odds" - I think we can blame the person who came up with equalized odds for that. I am very surprised that AIF360 would allow for "cancelling out" the differences... isn't that the whole point of considering the FPR and TPR separately instead of looking at overall accuracy?
I like the proposed solution @MiroDudik - in particular the usage of worst_case
would result in less mistakes compared to min
and max
, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if folks are in favor of just adding an optional argument to equalized_odds_{difference,ratio}
, I'd be open to having something less verbose, e.g., we could follow pandas conventions and use agg={worst_case,mean}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the number of thumbs up I think we can assume that we can move forward with @MiroDudik 's suggestion?
@IanEisenberg would you like to implement the suggestion? If you don't have the time/interest that is of course perfectly fine (in that case we will just open a new issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to implement Miro's suggestion. I'll get to it ASAP!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, @IanEisenberg! Feel free to ask questions if anything is unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since @IanEisenberg couldn't continue on this I've picked it up today and continued along the lines discussed in this thread. Let me know if you have any concerns! @fairlearn/fairlearn-maintainers
To merge this we'll also need a few tests, e.g., building on https://github.com/fairlearn/fairlearn/blob/1247b081b5cbdf9373595b7e4627c142ae30ae44/test/unit/metrics/test_disparities.py |
@IanEisenberg are you still planning to get back to this? Please feel free to say so. If not, someone else may be interested in continuing the pull request. I'm going to set 12/20 (two weeks from now) as the cut-off date in case we don't hear back, although it may take much longer to find someone else, of course. Again, we're more than happy to have you continue at a later point if you're still interested! |
Hey Roman, Don't have the time to push this forward now (many months later). If someone else wants to pick it up, they should. Otherwise we should close. |
…n/average_odds_difference
…n/average_odds_difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment, otherwise I think this is ready to be merged!
assert actual == diffs.max() | ||
if agg == "worst_case": | ||
assert actual == diffs.max() | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: for future maintainability it might be nicer to replace else
by an explicit if agg == "mean"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing the options above, so if it's anything else we can't get here. Are you suggesting I don't check at the beginning but only here at the end? I guess I preferred the early return/raise pattern over doing all the calculations just to let users know that we won't in fact allow other inputs for agg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I always like it when tests are very explicit so i don't have to scroll up to see if it makes sense, and it's a bit less error prone in case we ever decide to add another agg
option. But it's truly a nitpick :)
Edit: i see now that i accidentally requested changes instead of approve, which makes it seem like i feel much more strongly about this than i do lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to wait for @MiroDudik's thoughts as well since you both had thoughts on the initial PR and perhaps he feels strongly either way 😎
Small addition to include average odds difference as a metric