
ENH added agg argument to equalized odds difference and ratio to support "average odds" #960

Open
wants to merge 7 commits into base: main
Conversation

IanEisenberg

Small addition to include average odds difference as a metric

@romanlutz
Member

Hi @IanEisenberg! Do you have a particular application scenario where this metric may be useful? I'm just curious where this may be helpful (vs. for example, equalized odds).

@IanEisenberg
Author

I don't feel it is particularly more revealing than equalized odds, but it is a formulation I see discussed (potentially because of its inclusion in AIF360). The use case is when you don't care more about the false positive rate or the true positive rate individually, and only care that, across both, you are reaching some level of performance.

@hildeweerts
Contributor

I would be OK with including this. The only thing I don't like about it is that, e.g., a very low FPR could 'make up' for a very high FNR, which may be problematic in practice. (In all honesty, I would generally advise people to compute the FPR and FNR separately as two metrics, because it makes any potential trade-offs much more explicit.)

@fairlearn/fairlearn-maintainers WDYT?
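The suggestion above, of reporting FPR and FNR separately per group so trade-offs stay visible, can be sketched in plain Python. This is an illustration only; in Fairlearn itself one would typically use MetricFrame with false_positive_rate and false_negative_rate, and the helper name here is hypothetical:

```python
# Illustrative sketch only: compute FPR and FNR per sensitive group,
# so any trade-off between the two rates stays visible.
def group_rates(y_true, y_pred, groups):
    """Return {group: (FPR, FNR)} for binary labels/predictions."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        fnr = fn / (fn + tp) if (fn + tp) else 0.0
        rates[g] = (fpr, fnr)
    return rates
```

Comparing the per-group tuples directly makes it obvious when a low FPR in one group is masking a high FNR, which a single aggregated number can hide.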

@romanlutz
Member

Good points! I'm not opposed to it, but we really should put some effort into discussing the pros and cons of these metrics. We should open an issue for that, since it goes beyond just this metric.

fairlearn/metrics/_disparities.py: two outdated review threads (resolved)
@IanEisenberg
Author

> I would be OK with including this. The only thing I don't like about it is that, e.g., a very low FPR could 'make up' for a very high FNR, which may be problematic in practice. (In all honesty, I would generally advise people to compute the FPR and FNR separately as two metrics, because it makes any potential trade-offs much more explicit.)
>
> @fairlearn/fairlearn-maintainers WDYT?

Completely agree. I like equalized_odds because, while it still obscures FPR and FNR, you don't end up in this "trade-off" space. But average odds may be a better "total" measure if you explicitly look at FPR and FNR at the same time.

@@ -7,6 +7,84 @@
from ._metric_frame import MetricFrame


def average_odds_difference(
Member
@MiroDudik Oct 1, 2021

I'm fine with including the two metrics. The goal, and even the behavior, is very similar to our existing equalized_odds_difference and equalized_odds_ratio, in the sense that average_odds_difference == 0 or average_odds_ratio == 1 indicates that there is no disparity according to the equalized odds criterion.

But I think that the name is confusing for two reasons:

  • in statistics, the word "odds" is defined as the ratio "probability of event / (1 − probability of event)", and we are definitely not averaging such odds;
  • it turns out that AIF360's average_odds_difference calculates something different from this function! It's not just the usual distinction due to their use of privileged/unprivileged groups; AIF360's definition also allows the signs of the two differences to "cancel out", whereas we take the absolute value of the differences before averaging.

So my suggestion would be not to create a new function, but instead to add a new optional argument to our existing equalized_odds_difference and equalized_odds_ratio. For example, we could add an optional parameter called aggregation_method, with the default value worst_case and a new possible value average. The aggregation_method would simply specify what happens with the FPR difference and the FNR difference: whether we take the worst case or the average.

Thoughts?

[tagging @romanlutz @hildeweerts]
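To make the two points above concrete, here is a small sketch of the proposed aggregation semantics. The function name and argument values are assumptions for illustration, not the final Fairlearn API:

```python
# Illustrative sketch: aggregate the between-group FPR and FNR differences.
# "worst_case" keeps the larger absolute difference (the current behavior);
# "average" averages the absolute differences (the proposed addition).
def aggregate_differences(fpr_diff, fnr_diff, aggregation_method="worst_case"):
    diffs = (abs(fpr_diff), abs(fnr_diff))
    if aggregation_method == "worst_case":
        return max(diffs)
    if aggregation_method == "average":
        return sum(diffs) / 2
    raise ValueError(f"unknown aggregation_method: {aggregation_method!r}")

# AIF360-style signed averaging, by contrast, can cancel out:
# (+0.3 + -0.3) / 2 == 0.0, reporting no disparity despite large gaps.
```

Taking absolute values before averaging is exactly what prevents the cancellation described in the second bullet.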

Contributor

I'm not too bothered by the name with respect to the normal usage of the word "odds"; I think we can blame the person who came up with equalized odds for that. I am very surprised that AIF360 would allow the differences to "cancel out"... isn't that the whole point of considering the FPR and TPR separately instead of looking at overall accuracy?

I like the proposed solution, @MiroDudik. In particular, using worst_case would result in fewer mistakes than min and max, I think.

Member

Also, if folks are in favor of just adding an optional argument to equalized_odds_{difference,ratio}, I'd be open to something less verbose; e.g., we could follow pandas conventions and use agg={worst_case,mean}.

Contributor

Given the number of thumbs-up, I think we can assume we can move forward with @MiroDudik's suggestion?

@IanEisenberg would you like to implement the suggestion? If you don't have the time/interest that is of course perfectly fine (in that case we will just open a new issue).

Author

I'm happy to implement Miro's suggestion. I'll get to it ASAP!

Contributor

Awesome, @IanEisenberg! Feel free to ask questions if anything is unclear.

Member

Since @IanEisenberg couldn't continue on this I've picked it up today and continued along the lines discussed in this thread. Let me know if you have any concerns! @fairlearn/fairlearn-maintainers

@romanlutz
Member

@IanEisenberg are you still planning to get back to this? Please feel free to say so. If not, someone else may be interested in continuing the pull request. I'm going to set 12/20 (two weeks from now) as the cut-off date in case we don't hear back, although it may take much longer to find someone else, of course. Again, we're more than happy to have you continue at a later point if you're still interested!

@romanlutz romanlutz changed the title added average odds difference ENH added average odds difference Dec 7, 2021
@romanlutz romanlutz added the enhancement New feature or request label Dec 7, 2021
@IanEisenberg
Author

> @IanEisenberg are you still planning to get back to this? Please feel free to say so. If not, someone else may be interested in continuing the pull request. I'm going to set 12/20 (two weeks from now) as the cut-off date in case we don't hear back, although it may take much longer to find someone else, of course. Again, we're more than happy to have you continue at a later point if you're still interested!

Hey Roman,

Don't have the time to push this forward now (many months later). If someone else wants to pick it up, they should. Otherwise we should close.

@romanlutz romanlutz changed the title ENH added average odds difference ENH added agg argument to equalized odds difference and ratio to support "average odds" Oct 17, 2023
Contributor

@hildeweerts left a comment

One minor comment, otherwise I think this is ready to be merged!

fairlearn/metrics/_fairness_metrics.py: review thread (resolved)
    assert actual == diffs.max()
    if agg == "worst_case":
        assert actual == diffs.max()
    else:
Contributor

Nitpick: for future maintainability, it might be nicer to replace the else with an explicit if agg == "mean".

Member

I'm testing the options above, so if it's anything else we can't get here. Are you suggesting I don't check at the beginning, but only here at the end? I guess I preferred the early return/raise pattern over doing all the calculations just to then tell users that we don't allow other inputs for agg.
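The early-validation pattern being described might look roughly like this. A hedged sketch, not the actual Fairlearn implementation; the function name and inputs are illustrative:

```python
# Sketch: validate `agg` eagerly, before any computation, so an
# unsupported value fails fast instead of after all the metric work.
def eo_difference_sketch(fpr_by_group, fnr_by_group, agg="worst_case"):
    if agg not in ("worst_case", "mean"):
        raise ValueError(f"agg must be 'worst_case' or 'mean', got {agg!r}")
    fpr_diff = max(fpr_by_group.values()) - min(fpr_by_group.values())
    fnr_diff = max(fnr_by_group.values()) - min(fnr_by_group.values())
    if agg == "worst_case":
        return max(fpr_diff, fnr_diff)
    return (fpr_diff + fnr_diff) / 2
```

With validation up front, a bad `agg` value raises immediately, which is the trade-off against the more explicit final `if agg == "mean"` branch discussed here.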

Contributor

Fair enough. I always like it when tests are very explicit so I don't have to scroll up to see if it makes sense, and it's a bit less error-prone in case we ever decide to add another agg option. But it's truly a nitpick :)

Edit: I see now that I accidentally requested changes instead of approving, which makes it seem like I feel much more strongly about this than I do, lol.

Member

I want to wait for @MiroDudik's thoughts as well since you both had thoughts on the initial PR and perhaps he feels strongly either way 😎

Labels
enhancement New feature or request

5 participants