
plot_roc_curves(scores, y_true, sensitive_features) #758

Open
michaelamoako opened this issue Apr 28, 2021 · 40 comments · May be fixed by #869
@michaelamoako
Contributor

Is your feature request related to a problem? Please describe.

In producing Fairness Assessment results, seeing metrics given a fixed threshold is useful. In addition to that, it helps to also see an ROC curve (TPR vs. FPR) across different thresholds.

Describe the solution you'd like

A function like "plot_roc_curves(scores, y_true, sensitive_features)" that takes sensitive_features (a list of sensitive features) as a parameter and plots ROC curves across subgroups.

Describe alternatives you've considered, if relevant

sklearn's plot_roc_curve, though this does not support plotting by sensitive group

Additional context

When creating a Fairness Assessment using Fairlearn, ROC curve plots are missing.

@adrinjalali
Member

I agree this is pretty useful, and for whoever would like to take it up, here's what I've used in my scripts. Feel free to take and generalize it.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import RocCurveDisplay

def plot_auc(df, name, ax=None):
  y_pred = np.asarray(df.y_pred).astype(float)
  y_true = np.asarray(df.FraudLabel == "true").astype(int)

  fpr, tpr, _ = roc_curve(y_true, y_pred, pos_label=1)
  roc_auc = auc(fpr, tpr)
  display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name=name)
  display.plot(ax=ax)
  return display

plt.figure(figsize=(8, 6))
ax = plt.gca()
overall = plot_auc(df, "overall", ax)
women = plot_auc(df_women, "women", ax)
men = plot_auc(df_men, "men", ax)
nans = plot_auc(df_nan, "unspecified", ax)

And the corresponding one for precision recall curves:

import numpy as np
from sklearn.metrics import (precision_recall_curve,
                             PrecisionRecallDisplay,
                             average_precision_score)

def plot_precision_recall(df, name, ax=None):
  y_pred = np.asarray(df.y_pred).astype(float)
  y_true = np.asarray(df.FraudLabel == "true").astype(int)

  precision, recall, _ = precision_recall_curve(y_true, y_pred)
  average_precision = average_precision_score(y_true, y_pred)
  disp = PrecisionRecallDisplay(
    precision=precision, recall=recall,
    average_precision=average_precision,
    estimator_name=name
  )
  disp.plot(ax=ax)
  return disp

plt.figure(figsize=(8, 6))
ax = plt.gca()
overall = plot_precision_recall(df, "overall", ax=ax)
women = plot_precision_recall(df_women, "women", ax)
men = plot_precision_recall(df_men, "men", ax)
nans = plot_precision_recall(df_nan, "unspecified", ax)

@hildeweerts
Contributor

Yes, we should definitely include this!

Perhaps adding a separate roc_curve_grouped(y_true, y_pred, sensitive_features) that returns the fpr, fnr, and thresholds for each group would be nice as well. That makes it easier to plot specific thresholds for each group on top of the curves, which can be quite insightful imo.
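
For illustration, a minimal sketch of what such a helper could look like, built directly on sklearn.metrics.roc_curve (which returns fpr, tpr, thresholds). The name roc_curve_grouped, the single sensitive-feature column, and the dict return format are assumptions, not an existing API:

import numpy as np
from sklearn.metrics import roc_curve

def roc_curve_grouped(y_true, y_pred, sensitive_features):
    """Return {group: (fpr, tpr, thresholds)} for each sensitive group."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(sensitive_features)
    return {
        group: roc_curve(y_true[groups == group], y_pred[groups == group])
        for group in np.unique(groups)
    }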

@michaelamoako
Contributor Author

Thanks! There is just one piece I am missing

Given I have a dataframe (df) with labeled sensitive features, I want to create dataframes that are a subset of this, using sensitive features. Something like:

create_sensitive_dfs(df, sensitive_features)
Input: dataframe, list of sensitive features
Output: Smaller dataframes (i.e: df_women, df_men, df_nans)

Example: sensitive_features = [age, race]
Output: df_kid_SA, df_adult_MENA, etc..

This would then allow me to use your script above

@michaelamoako
Contributor Author

michaelamoako commented Apr 29, 2021

Resolved: combining all of the above, @kstohr here's my workaround solution using the Cartesian product to create the smaller data frames - though how it would be embedded in the MetricFrame, or where this same product happens, is something I could not find.

Where sensitive_features is a list of the sensitive feature column names (defined elsewhere):

from itertools import product

def sensitive_pdfs(pdf, sensitive_features): 
    distinct_features_vals = []
    df_dict = {}
    for feature in sensitive_features:
        distinct_vals = set(pdf[feature].to_list())
        distinct_features_vals.append(distinct_vals)
    distinct_features_combos = list(product(*distinct_features_vals))
    for features in distinct_features_combos:
        query = " & ".join([f"{sensitive_features[i]} == '{features[i]}'" for i in range(len(sensitive_features))])
        filtered_df = pdf.query(query)
        key = tuple(sorted(features))
        df_dict[key] = filtered_df
    return df_dict


def plot_auc(df, y_pred, y_true, name, ax=None):
  y_pred_r = np.asarray(df[y_pred]).astype(float)
  y_true_r = np.asarray(df[y_true]).astype(int)

  fpr, tpr, _ = roc_curve(y_true_r, y_pred_r, pos_label=1)
  roc_auc = auc(fpr, tpr)
  display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name=name)
  display.plot(ax=ax)
  return display


sensitive_pdfs_dict = sensitive_pdfs(pdf, sensitive_features)
plt.figure(figsize=(8, 6))
ax = plt.gca()
for grouping in sensitive_pdfs_dict: 
    group_plot = plot_auc(sensitive_pdfs_dict[grouping], "matching_score", "golden_label", grouping, ax)

@lurosenb

Hi all!

Here is some code that splits pdfs based on a list of columns (each column must contain only the two values listed for it)

Example use:
sensitive_pdfs(df, [('color', ['blue','red']), ('city', ['ny','nj'])], 'pdf')

Example in colab notebook:
Notebook

def sensitive_pdfs(pdf, sensitive_features, title): 
  # Feature we are splitting on
  sensitive_feature, values = sensitive_features[0]

  # Safety check: the column must only contain the two listed values
  if pdf[sensitive_feature].isin(values).all():
    # Remove feature we are about to split by
    sensitive_features.pop(0)
    
    # For title tracking
    v0, v1 = values

    # Apply condition to split pdf 
    tmp1 = pdf[pdf[sensitive_feature] == v0]
    tmp2 = pdf[pdf[sensitive_feature] == v1]

    # Update titles for the resulting pdfs
    title1 = title + '_' + str(v0)
    title2 = title + '_' + str(v1)

    # Base case check - no more features to split by
    if len(sensitive_features) == 0:
      return [(title1, tmp1), (title2, tmp2)]
    else:
      # Recurse, continuing to split by sensitive features
      return sensitive_pdfs(tmp1, sensitive_features.copy(), title1) + \
             sensitive_pdfs(tmp2, sensitive_features.copy(), title2)
  else:
    raise ValueError(f"Column '{sensitive_feature}' contains values outside {values}")

@adrinjalali
Member

Reopening since we need a nice method in fairlearn to do this.

@adrinjalali adrinjalali reopened this Apr 30, 2021
@MiroDudik
Member

I agree that this is a super useful addition. I also like @hildeweerts suggestion to consider extending both sklearn's roc_curve and plot_roc_curve.

Until we find somebody willing to implement this, can we tease out the API we would like to see? We would use the same pattern for other curves that we discussed and that come up in the fairness literature, like calibration_curve and cdf_curve.

For plotting, my preference is to have API of the following form:

  • plot_roc_curves(*, y_true, y_score, sensitive_features, **kwargs) -> matplotlib.axes.Axes

For just getting fpr, tpr, thresholds (but no plotting), we could have

  • roc_curves(*, y_true, y_score, sensitive_features, **kwargs)
    • I am less sure about the return format here. But one idea would be to return three dictionaries fpr, tpr, thresholds that would be indexed by the sensitive feature values (or a tuple of sensitive feature values if we have multiple sensitive features).
    • Another idea would be to do something similar to threshold optimizer, where we return tpr and threshold (a generalized notion of threshold) as a function of fpr, where the range of fpr values would be the same for all sensitive features. This I think is actually a much more useful output format for any further processing.
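
To make the second idea a bit more concrete, here is a rough sketch (the function name and return format are placeholders, not a proposed final API) that evaluates each group's tpr on a shared fpr grid via interpolation:

import numpy as np
from sklearn.metrics import roc_curve

def roc_curves_on_common_grid(y_true, y_score, sensitive_features, n_points=101):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    groups = np.asarray(sensitive_features)
    fpr_grid = np.linspace(0.0, 1.0, n_points)
    tpr_by_group = {}
    for group in np.unique(groups):
        mask = groups == group
        fpr, tpr, _ = roc_curve(y_true[mask], y_score[mask])
        # fpr from roc_curve is non-decreasing, so interpolating onto the shared grid is safe
        tpr_by_group[group] = np.interp(fpr_grid, fpr, tpr)
    return fpr_grid, tpr_by_group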

@hildeweerts
Contributor

I agree, let's get the conversation started (which may be relevant to the other plotting functionality as well? @romanlutz )

  • plot_roc_curve
    • If we want to be consistent with sklearn we should use something like: plot_roc_curve(Estimator, X, y, sensitive_features, *, sample_weight=None, drop_intermediate=True, response_method='auto', name=None, ax=None, pos_label=None, **kwargs) returning a RocCurveDisplay (or similar).
    • I would also be okay with using y_score directly in the API, as it allows for a bit more flexibility wrt the estimator, although that is at the risk of people accidentally using y_pred.
  • roc_curve(y_true, y_score, sensitive_features, *, pos_label=None, sample_weight=None, drop_intermediate=True)
    • I don't see a need for **kwargs here
    • I think returning a dictionary makes sense! Could you elaborate a bit on the other idea? I don't fully understand yet what that would look like.

Is there a specific reason you want to put * at the beginning? I'm always a bit wary about that because it's too easy to mess up the order of y_true and y_score or the difference between y_score and y_pred.

@MiroDudik
Member

Re. compatibility with sklearn, I noticed that they use estimator, X, y signature, but it seems very limiting (or forcing some workarounds with pass-through estimators). So, in this case, I was proposing to intentionally deviate from the sklearn pattern (that's why I would also use a different name, i.e., plot_roc_curves). I would make all of the arguments keyword-only to prevent any confusion with the sklearn format.

Re. * at the beginning: I mostly want to ensure keyword-only arguments, because some common libraries have the opposite convention from sklearn around the order of y_true vs y_pred. I wouldn't mind allowing both y_pred and y_score (as interchangeable), so the signature could be something like:

  • roc_curve(*, y_true, y_score=None, y_pred=None, sensitive_features, ... )
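
For illustration only, a sketch of how such a keyword-only signature with interchangeable y_score / y_pred might validate its inputs (the names and behaviour here are assumptions, not a settled API):

import numpy as np
from sklearn.metrics import roc_curve

def roc_curves(*, y_true, y_score=None, y_pred=None, sensitive_features):
    # Accept either y_score or y_pred, but not both and not neither
    if (y_score is None) == (y_pred is None):
        raise ValueError("Provide exactly one of y_score or y_pred.")
    scores = y_score if y_score is not None else y_pred
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    groups = np.asarray(sensitive_features)
    return {g: roc_curve(y_true[groups == g], scores[groups == g])
            for g in np.unique(groups)}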

@kstohr

kstohr commented May 17, 2021

@romanlutz Ok, I'll pick this up. Not sure how long it will take me, but will dive in and work through it.

@romanlutz
Member

Sounds great! Let us know if you have questions! We're happy to help! I'll also keep checking Discord during the sprint, of course.

@michaelamoako
Contributor Author

Quick note: there are contexts in which there are no negative samples in the ground truth ("No negative samples in y_true"), in which case an ROC curve cannot be created

@kstohr

kstohr commented May 17, 2021

@michaelamoako ok, so you want me to detect those cases and handle them with a try/except? Any other cases that would arise that we should handle?

@kstohr

kstohr commented May 17, 2021

@romanlutz @michaelamoako New to this repo.. Couple quick q's before I build out this class/function.

  • I see you have a dir metrics and a dir postprocessing which has some plotting functionality in it already. Where do you want this feature to live? metrics or postprocessing ?

  • Also this function already exists: _calculate_tradeoff_points which --at first glance-- does appear to compute ROC points. Do we need to just extend this to process the data grouped by the sensitive features?

  • If we're plotting the ROC curve for each subgroup, that's just a line plot with traces. It sounds like Fairlearn is moving away from having a dashboard and instead enabling plotting as a set of utilities. Is that right? In which case do we want to build a base _line_plot and _grouped_line_plot class in Matplotlib or maybe plot.ly (interactive) which would be re-usable? Then inherit that class into other metrics classes that might benefit from plotting?

  • Related: I see this plotting functionality here: https://github.com/fairlearn/fairlearn/blob/main/fairlearn/postprocessing/_plotting.py ... do we want to add to or extend this module? i.e. build out plotting utilities in the same module?

@romanlutz
Member

@kstohr The ROC curves would be more general than the postprocessing functionality in that they could be generated for all models (not just the ones from postprocessing). I would suggest creating a new file for it under the metrics folder.

The postprocessing method has a lot of custom code that we can switch out at a later point. For now, I would completely ignore it since it has somewhat more comprehensive requirements. Same for the plotting capabilities of that module, they're very custom to the mitigation technique used there. Let's ignore it for now.

You're right about the dashboard moving away (in fact #766 deletes it), so this should just be its own standalone utility. We're all in favor of reusing things that already exist. Perhaps some of the code snippets from this thread are useful to have ROC plots with multiple lines (one per group)? I think matplotlib supports this already by just passing in the same axes to subsequent plotting commands.

@kstohr

kstohr commented May 18, 2021

@romanlutz Ok great. So, I'll build a module under metrics: metrics/_roc.py. If we want to refactor later to make things more re-usable, we can.

Yep. Saw the code snippets and incorporating as needed.

In terms of plotting, you can use plt.subplots() to add traces to the figure, but sklearn already has a class RocCurveDisplay and it looks like you can pass an existing ax to the function, which should enable us to add traces to the figure for each of the sensitive values.

@romanlutz
Member

Yes! I was hoping we could reuse what sklearn already has. In many ways, a lot of what Fairlearn does in assessment builds on sklearn by disaggregating metrics so this is yet another way we're doing so. Thanks @kstohr !

@michaelamoako
Contributor Author

Another very closely related but also valuable curve: Threshold vs. [Metric]. For example, to plot Threshold vs. [Recall]:

Assume y_true/y_pred (scores) and a group label name are defined elsewhere:

  _, recall, thresholds = precision_recall_curve(y_true, y_pred)
  plt.plot(thresholds, recall[:-1], label=name)
  plt.xlabel("Threshold")
  plt.ylabel("Recall")

roc_curve similarly returns thresholds, so there is potential to plot those on the x-axis too.
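
For illustration, a minimal sketch along those lines (assuming the same y_true and score-valued y_pred as above), plotting TPR and FPR against the thresholds returned by roc_curve:

  import matplotlib.pyplot as plt
  from sklearn.metrics import roc_curve

  fpr, tpr, thresholds = roc_curve(y_true, y_pred)
  # roc_curve prepends a sentinel threshold above all scores; drop it before plotting
  plt.plot(thresholds[1:], tpr[1:], label="TPR")
  plt.plot(thresholds[1:], fpr[1:], label="FPR")
  plt.xlabel("Threshold")
  plt.ylabel("Rate")
  plt.legend()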

@kstohr

kstohr commented May 18, 2021

@michaelamoako Yep. Will keep precision_recall_curve in mind. Assuming you'd also do that for each sensitive value (i.e. by group.)

@romanlutz
Member

@kstohr feel free to stick with one thing per PR. Smaller PRs are easier to review and typically get merged significantly faster 😄 @michaelamoako 's suggestion can be another PR unless it's just a few extra lines. Keep in mind that adding ROC curves should include at least a small unit test and a short user guide section, so this is already quite a bit.

@michaelamoako
Contributor Author

Agreed with @romanlutz - and yes (per group)

@kstohr

kstohr commented May 18, 2021

@romanlutz @michaelamoako Roger that. I think Michael was just pointing out that the code should ideally be easy to adapt for this other purpose. At least that's how I understood it.

riedgar-ms pushed a commit that referenced this issue May 19, 2021
…766)

With a slight delay (originally targeted for April) I'm finally removing the `FairlearnDashboard` since a newer version already exists in `raiwidgets`. The documentation is updated to instead use the plots @MiroDudik created with a single line directly from the `MetricFrame`. In the future we want to add more kinds of plots, as already mentioned in #758, #666, and #668. Specifically, the model comparison plots do not have a replacement yet.

Note that the "example" added to the `examples` directory is not shown under "Example notebooks" on the webpage, which is intentional since it's technically not a notebook.

This also makes #561 mostly redundant, which I'll close shortly. 
#667 is also directly addressed with this PR as the examples illustrate.

Signed-off-by: Roman Lutz <rolutz@microsoft.com>
@kstohr

kstohr commented May 21, 2021

@michaelamoako @romanlutz Yokee, the ref code snippets are up and running. Quick and dirty plot:

[Screenshot: quick and dirty plot]

I am thinking of testing out using the existing MetricFrame to do the data splitting. The benefit of using MetricFrame is that this would be a standard way of handling splitting the data on sensitive values. You get the 'overall' curve as well as the curves for the sensitive values, and any methods we add to the MetricFrame class would be inherited by the ROC Curve class. The downside is that the _roc_utils.py would have a dependency on MetricFrame. Thoughts before I head down this path?

@riedgar-ms
Member

I don't think that a dependency on MetricFrame would be a bad thing. Or do you mean to inherit from it?

@kstohr

kstohr commented May 21, 2021 via email

@michaelamoako
Contributor Author

@kstohr I just tagged you in a comment above related to the Cartesian product - though where this process happens in the MetricFrame is not clear to me

@MiroDudik
Member

MiroDudik commented May 21, 2021

@kstohr : my suggestions:

  • you could just start with plot_roc_curves
  • it would be good to figure out the invocation format on this issue (either at the same time as PR or before PR)
  • I wouldn't worry about integrating things internally with MetricFrame yet... I expect that we'll need to re-implement and re-factor MetricFrame because it's 10x slower than an equivalent pandas code (I'll create an issue on that hopefully next week), but if it simplifies your life (rather than makes it more complicated), go for it
  • One idea re. MetricFrame would be to just use the following "metric" (maybe this is what you had in mind?):
def metric(y_true, y_pred):
    return (y_true, y_pred)
mf = MetricFrame(metric, y_true, y_pred, sensitive_features=sensitive_features)
mf.by_group  # this will contain the split of the data y_true and y_pred
             # and consider all combinations if multiple sensitive features are provided

@kstohr

kstohr commented May 21, 2021

@MiroDudik Yep that's exactly what I had in mind, but I have to pass a third parameter...looks like that's permitted in the docs and it will split on it and extract the groups. Not sure if it will handle the cartesian split. Still getting up to speed on this codebase.

@MiroDudik
Member

Cartesian split is already handled I believe, you just provide two columns of sensitive features and it'll consider all combinations. See here.

MetricFrame also supports additional column parameters, e.g.:

def metric(y_true, y_pred, y_pred_proba):
    return (y_true, y_pred, y_pred_proba)
mf = MetricFrame(metric, y_true, y_pred, sensitive_features=sensitive_features, 
                 sample_params = {'y_pred_proba': y_pred_proba})
mf.by_group  # this will contain the split of the data y_true, y_pred, and y_pred_proba
             # and consider all combinations if multiple sensitive features are provided
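
For illustration, one way the resulting mf.by_group Series could then be consumed to draw one ROC trace per subgroup (a rough sketch assuming the mf defined just above; not an existing Fairlearn helper):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, RocCurveDisplay

fig, ax = plt.subplots(figsize=(8, 6))
for group, (grp_y_true, grp_y_pred, grp_y_proba) in mf.by_group.items():
    # Each entry holds the tuple returned by the metric: (y_true, y_pred, y_pred_proba)
    fpr, tpr, _ = roc_curve(grp_y_true, grp_y_proba)
    RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=auc(fpr, tpr),
                    estimator_name=str(group)).plot(ax=ax)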

@michaelamoako
Contributor Author

This would be tied to the NaN issue in MetricFrame, right @MiroDudik?

@MiroDudik
Member

@michaelamoako : I think you're thinking about #800? We don't have to worry about that here, because we're not using mf.difference() or other aggregates (so the above code shouldn't raise any errors).

@kstohr

kstohr commented May 21, 2021 via email

@kstohr

kstohr commented Jun 3, 2021

y'all this is coming along:

I think the best course of action is to build on MetricFrame as it will create a common language across the Fairlearn API. So you can split by group and use the series index to plot a subset of the sensitive feature subgroups (remember it is the Cartesian product, so selecting a smaller set to plot will be important).

TODO:

  • Determine how best to instantiate plot figure, axis (right now there's a bit of redundant code; might instantiate with the class to have a default figure, ax)
  • Separate functions to add "Overall" and "Baseline" traces to enable users to toggle these traces on/off
  • logging, error handling
  • Add additional AUC score and related metric functionality by sensitive feature group... if we're generating the AUC score anyway, why not return it
  • unit tests

Overall thoughts on this basic approach?

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, roc_auc_score, RocCurveDisplay
from fairlearn.metrics import MetricFrame


class roc_auc: 
    """
    Provides utilities for generating auc scores, roc curves
    and plotting roc curves grouped by sensitive features. 
    
    Parameters
    ----------
    y_true : List, pandas.Series, numpy.ndarray, pandas.DataFrame
        The ground-truth labels for classification.
        
    y_score : List, pandas.Series, numpy.ndarray, pandas.DataFrame
        The predicted scores for the positive class (e.g. results of clf.predict_proba()).
        
    sensitive_features : List, pandas.Series, dict of 1d arrays, numpy.ndarray, pandas.DataFrame
        The sensitive features which should be used to create the subgroups.
        At least one sensitive feature must be provided.
        All names (whether on pandas objects or dictionary keys) must be strings.
        We also forbid DataFrames with column names of ``None``.
        For cases where no names are provided we generate names ``sensitive_feature_[n]``.
    """
    def __init__(self,
                 y_true, 
                 y_score, 
                 sensitive_features): 
        """
        Initiate class with required parameters to generate metric. 
        TODO: validate input
        """
        self.y_true = y_true
        self.y_score = y_score 
        self.sensitive_features = sensitive_features
        self.ns_probs = [0 for n in range(len(self.y_true))]
        
    @staticmethod
    def splitter(y_true, y_pred): 
        """
        Placeholder function to enable splitting of dataframes using 
        existing MetricFrame class. 
        """
        return (y_true, y_pred)
    
    @staticmethod
    def plot_auc(y_true, y_score, name, ax=None, pos_label=1, **kwargs):
        """
        Plot auc curves. 
        """
        
        # Establish plot figure if not already generated
        if ax is None: 
            plt.figure(figsize=(8, 6))
            ax = plt.gca()
            
        fpr, tpr, _ = roc_curve(y_true, y_score, pos_label=pos_label, **kwargs)
        roc_auc = auc(fpr, tpr)
        display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name=name)
        display.plot(ax=ax)
        return display
    
    def split_by_group(self): 
        """
        Splits data by sensitive feature subgroups. 
        See: Fairlearn.MetricFrame for more detail. 
        
        Note: MetricFrame requires y_pred (clf.predict). However, ROC curves and AUC scores 
        are generated using y_score (clf.predict_proba). 
        Method substitutes y_score (type:float) for y_pred (type:int) to conform to MetricFrame 
        required params. 
        (MetricFrame supports regression and therefore allows values of type float to be 
        passed as y_pred.) 
        Admittedly this is a little weird. Alternately, you could pass `sample_params` to MetricFrame, 
        but not sure that's any cleaner. 
        """
        mf = MetricFrame(
            metric = self.splitter, 
            y_true = self.y_true, 
            y_pred = self.y_score, 
            sensitive_features = self.sensitive_features,
            #sample_params = {'y_score': y_score}
                        )
        self.sensitive_series = mf.by_group
        return self.sensitive_series
    
    def plot_roc_groups(self, 
                        sensitive_index=None, 
                        ax=None): 
        
        
        # Establish plot figure if not already generated
        if ax is None: 
            plt.figure(figsize=(8, 6))
            ax = plt.gca()
        else: 
            self.ax = ax
        
        # Establish which combinations of sensitive features to plot
        # (explicit None check: truth-testing a pandas Index would raise)
        if sensitive_index is None: 
            sensitive_index = self.sensitive_series.index

        # Plot baseline - 'no skill'
        # i.e. performance of classifier is equivalent to random selection
        ns_auc = roc_auc_score(self.y_true, self.ns_probs)
        ns_fpr, ns_tpr, _ = roc_curve(self.y_true, self.ns_probs)
        ax.plot(ns_fpr, ns_tpr, linestyle='--', label='Baseline (AUC = 0.50)')

        # Plot overall model performance
        overall_auc = roc_auc_score(self.y_true, self.y_score)
        overall_fpr, overall_tpr, _ = roc_curve(self.y_true, self.y_score)
        ax.plot(overall_fpr, overall_tpr, label=f'Overall (AUC = {round(overall_auc, 2)})')
        
        # Plot ROC Curves by group
        for name in sensitive_index: 
            grp_y_true, grp_y_score = self.sensitive_series[name]
            group_plot = self.plot_auc(
                y_true=grp_y_true, 
                y_score=grp_y_score, 
                name=name, ax=ax)
        return ax

@riedgar-ms
Member

Tagging @alexquach who has started work on #235 . Both of these are generically looking at 'plots for MetricFrame' so we should try to have a common API pattern (even if the details will be different)

@kstohr

kstohr commented Jun 4, 2021 via email

@riedgar-ms
Member

@alexquach only just started this week, so there's still some playing around with 'how to plot this' before we get too far into the API design. I do like your approach of having a separate class, since there's no need to deal with MetricFrame internals. For the error bar case, it's more likely to be 'bring the MetricFrame precomputed and tell us the mapping of metrics to error bars' though.

We can meet to talk, if you think that would be helpful (and we can figure out timezones).

@kstohr

kstohr commented Jun 4, 2021 via email

@romanlutz
Member

Styling shouldn't be done in this code I think. That's why people can provide their own ax. That's something @adrinjalali pointed out in the past with other plots, too.

Although, now I'm not sure you actually meant "styling" (as in https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html) or just axis labels, legend, etc. Regardless, it shouldn't be preconfigured and people should be allowed to configure it however they like. I don't think that will affect the implementation in any way, other than the fact that we shouldn't set any options internally (e.g., figure size shouldn't be set in our code).

@kstohr

kstohr commented Jun 4, 2021

@romanlutz Here's how scikit learn handles it.

RocCurveDisplay

What they have done is offered the full convenience of a labeled plot. However, those who want to provide their own fig, ax, or overwrite the defaults can. They use the default figure size, font, etc. Notice how they return the figure and the axis as 'self' which enables the user to configure the plot further if they wish.

I am following the above approach. In fact, in the case of Roc Curves, I am actually using RocCurveDisplay.

Because plotting is specific to the metric you are plotting, I think providing labeled plotting functions that add convenience to the relevant modules makes more sense than building a generic shared plotting utility. Basically adapting common metrics to return data by subgroup and the associated plots to handle plotting sensitive features by subgroup.
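
For example, a small illustration of that pattern (assuming fpr, tpr, and roc_auc have already been computed with sklearn); the plot call stores the figure and axes on the display object so users can restyle afterwards:

from sklearn.metrics import RocCurveDisplay

display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name="group")
display.plot()
# display.ax_ and display.figure_ are set by plot(), so further configuration is possible
display.ax_.set_title("ROC curves by sensitive group")
display.figure_.set_size_inches(8, 6)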

@kstohr

kstohr commented Jun 18, 2021

Hi all. I pushed a draft PR for this last night. Take a look at the basic approach and give me feedback when you have a moment.

https://github.com/fairlearn/fairlearn/pull/869/files

It's not passing CI/CD checks yet; it still needs:

  • more functional testing;
  • unit tests;
  • improved example documentation;
  • etc.

@romanlutz romanlutz linked a pull request Feb 4, 2023 that will close this issue