
For some objectives where baseline was 0, "pct better than baseline" is nan #1449

Closed
rpeck opened this issue Nov 20, 2020 · 9 comments · Fixed by #1809

rpeck commented Nov 20, 2020

Here are the percent_better_than_baseline_all_objectives values I'm seeing for a non-baseline pipeline:

{'F1': nan,
 'MCC Binary': nan,
 'Log Loss Binary': 93.29789549298991,
 'AUC': 58.36492736629537,
 'Precision': nan,
 'Balanced Accuracy Binary': 63.46659876071641,
 'Accuracy Binary': 12.876088314169193}

I've created a Jupyter notebook that reproduces this problem in evalml, and attached it and the associated datafile to a thread in Slack.

@rpeck rpeck added the "blocker" label (An issue blocking a release.) Nov 20, 2020
@rpeck rpeck pinned this issue Nov 20, 2020
@rpeck rpeck unpinned this issue Nov 20, 2020
@dsherry dsherry added the "bug" label (Issues tracking problems with existing features.) and removed the "blocker" label (An issue blocking a release.) Nov 20, 2020
@dsherry dsherry changed the title from "we're sometimes getting nans for metrics in percent_better_than_baseline_all_objectives" to "Nans in percent_better_than_baseline_all_objectives" Nov 20, 2020

dsherry commented Nov 20, 2020

Reproducer

import evalml
import pandas as pd
X = pd.read_csv('~/Downloads/fraud_500_data.csv').drop(['id', 'expiration_date'], axis=1)
y = X.pop('fraud')
automl = evalml.automl.AutoMLSearch(problem_type="binary", objective="f1")
automl.search(X, y)
# note that all percent_better_than_baseline values are nan in the rankings table
print(automl.rankings)
# can also check the scores of any pipeline other than the baseline pipeline, which should have id 0
print(automl.results['pipeline_results'][1]['percent_better_than_baseline_all_objectives'])

Dataset is here

freddyaboulton commented Nov 20, 2020

@dsherry @rpeck This is expected behavior: the baseline pipeline scores 0 on the objectives that show NaN (F1, MCCBinary, Precision). We've discussed making division by 0 return either infinity or None in this method, but we've never decided those are better than NaN, because if the baseline scores the worst possible value on an objective, comparing "percent better" on that objective doesn't do much good, and that can be conveyed with None, NaN, or infinity.

That being said, there may be other reasons to pick one of these options over NaN!
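For reference, here is a minimal standalone sketch of the relative "percent better" computation being described. percent_better_relative is a hypothetical helper, not evalml's implementation, and it ignores whether greater or lower is better for the objective:

import math

def percent_better_relative(score, baseline):
    # Relative improvement over the baseline, expressed as a percentage.
    # Current behavior described above: division by zero is reported as NaN.
    if baseline == 0:
        return float("nan")
    return (score - baseline) / abs(baseline) * 100

print(percent_better_relative(0.72, 0.55))             # ~30.9 -- ordinary case
print(percent_better_relative(0.64, 0.0))              # nan   -- baseline F1/MCC/Precision of 0
print(math.isnan(percent_better_relative(0.64, 0.0)))  # True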


rpeck commented Nov 20, 2020

@freddyaboulton Ah, makes sense! I'll change the test to skip over any objective where the baseline is 0. Thanks!

@dsherry dsherry added the "enhancement" label (An improvement to an existing feature.) and removed the "bug" label (Issues tracking problems with existing features.) Nov 20, 2020
@dsherry dsherry changed the title from "Nans in percent_better_than_baseline_all_objectives" to "For some objectives, percent_better_than_baseline_all_objectives is nan if baseline was 0" Nov 20, 2020
@dsherry dsherry changed the title from "For some objectives, percent_better_than_baseline_all_objectives is nan if baseline was 0" to "For some objectives where baseline was 0, "pct better than baseline" is nan" Nov 20, 2020

dsherry commented Nov 20, 2020

Thank you @freddyaboulton! @rpeck sorry I didn't catch this when you were asking me about it yesterday.

Leaving this issue open to discuss: should we change the behavior in this case?

@freddyaboulton so F1, MCCBinary and Precision are all metrics where greater is better, bounded in [-1, 1] (MCC, a correlation) or [0, 1] (F1, Precision). Could we alter the pct-improvement implementation to compute the absolute difference from the baseline and use that as the pct improvement? If that's what we were doing currently, I wouldn't expect a baseline of 0 to produce a nan pct improvement for those metrics.


freddyaboulton commented Nov 20, 2020

@dsherry We proposed computing the absolute difference for objectives bounded by [0, 1] in the design phase, but we decided having two different computations would be confusing. That said, we should maybe reconsider, given that the baseline pipeline is almost designed to score 0 on those objectives. Worth noting that when we first made that decision, we were only computing the percent better for the primary objective (which, except for regression, is not one of these bounded objectives).

Even if we do compute the absolute difference, we may want to consider changing the NaN/None/inf division-by-0 behavior. One interesting case to consider is R2: in most cases it falls in [0, 1], but it's technically (-inf, 1]. So computing the absolute difference may not be mathematically sound for it, but since it's the default objective for regression, we should expect to see lots of baselines scoring 0.

@dsherry dsherry added this to the Sprint 2021 Jan B milestone Jan 21, 2021
@freddyaboulton

So to summarize, there are two independent changes we can make, leading to four possible outcomes:

  1. Do not compute the absolute difference for objectives bounded in [0, 1]; division by 0 is NaN. Current behavior.
  2. Do not compute the absolute difference for objectives bounded in [0, 1]; division by 0 is inf.
  3. Compute the absolute difference for objectives bounded in [0, 1]; division by 0 is NaN.
  4. Compute the absolute difference for objectives bounded in [0, 1]; division by 0 is inf.

Although I prefer returning NaN when we divide by 0, the gut reaction of users when they see NaN has been to suppose something broke in automl. I think returning inf would make it clearer that nothing broke and that the pipeline is in fact better than the baseline.

That leaves options 2 and 4.

I think having two different computations for "percent better" will make it harder to communicate to users what's actually being computed for each pipeline. That being said, our baseline pipelines are designed to score 0 for a lot of objectives (R2, F1, MCC) especially in imbalanced problems (we just predict the mode). That makes the "percent better" feature not very useful for most realistic problems since all pipelines will be "infinitely" better than the baseline.

I think I'm leaning 55% for option 4 and 45% for option 2 but I'd like to hear other viewpoints before making that change!

@freddyaboulton freddyaboulton self-assigned this Feb 2, 2021

dsherry commented Feb 4, 2021

In standup today we decided it's time to update the "pct better than baseline" behavior. We're going with options 2 and 4 above:

  • Use relative difference for objectives without bounds (MSE, log loss, etc)
  • Use absolute difference for objectives with [0, 1] bounds (AUC, R2, etc)
  • We'll have to handle edge cases like Pearson correlation ([-1, 1])
  • Return inf rather than nan if there's a divide-by-0 error

@freddyaboulton does this match what we discussed?
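A rough sketch of the agreed-on behavior follows, assuming a hypothetical percent_better helper (not the evalml implementation) and ignoring whether greater or lower is better for each objective:

def percent_better(score, baseline, bounded_01=False):
    if bounded_01:
        # Absolute difference, in percentage points, for [0, 1]-bounded objectives.
        return (score - baseline) * 100
    if baseline == 0:
        # Divide-by-0: return inf instead of nan -- nothing broke, the pipeline
        # is simply better than a baseline that scored 0.
        return float("inf")
    # Relative difference for unbounded objectives.
    return (score - baseline) / abs(baseline) * 100

print(percent_better(0.75, 0.50, bounded_01=True))  # 25.0 (e.g. an AUC-style objective)
print(percent_better(0.40, 0.0))                    # inf  (unbounded objective, baseline of 0)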


rpeck commented Feb 8, 2021

Like. :-)


rpeck commented Feb 8, 2021

Further: I agree with the decision. IMO, if a metric is [usually, at least] 0..1, then going from 0 to 0.2 feels like a 20% improvement, even though mathematically it isn't. In a way, this reminds me of all of those formulas that take the log of a quantity, but they add 1 first so that they don't take the log of 0. 🙂
