ENH WIP Arbitrary target labels instead of 0/1 #1006

Open · wants to merge 9 commits into main
Conversation

@SeanMcCarren (Contributor) commented Jan 12, 2022

WIP on issue #911 (don't need to call the police)

This PR involves:

  • threshold optimizer
  • exponentiated gradient (somewhat finished)
  • grid search

Issues:

  • If all supplied labels are the same (i.e., only one of the two possible classes is present), then we can't infer self.classes_, so we should now raise an error. But this breaks the test cases test_threshold_optimization_degenerate_labels and test_single_y_class (the latter is for GridSearch). See the sketch after this list.
  • In ExponentiatedGradient, how can we know whether y is binary or continuous? If it is continuous, we should not apply the encoding. Can we do the same as in GridSearch: isinstance(self.constraints, ClassificationMoment)?
  • The ExponentiatedGradient test case depends on 0/1 predictions in the Moments, and I don't know how to fix this. For now, I'll leave the predictions as 0/1.
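
For illustration, a minimal sketch of the inference and error being described in the first issue (the helper _infer_classes is hypothetical, not code from this PR):

```python
import numpy as np

def _infer_classes(y):
    # Hypothetical helper: infer the two target classes from y and
    # return y encoded as 0/1 alongside the original class values.
    y = np.asarray(y)
    classes = np.unique(y)
    if len(classes) == 1:
        # With a single class present, self.classes_ cannot be inferred.
        raise ValueError("y contains only one class; two are required.")
    if len(classes) > 2:
        raise ValueError("y contains more than two classes.")
    return (y == classes[1]).astype(int), classes
```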

@SeanMcCarren changed the title from "WIP Arbitrary targets instead of 0/1" to "ENH WIP Arbitrary target labels instead of 0/1" on Jan 12, 2022
@romanlutz (Member)

@SeanMcCarren

In Exponentiated gradient, how can we know if y is binary or continuous? If it is continuous, we should not apply the encoding. Can we do the same as in gridsearch: isinstance(self.constraints, ClassificationMoment) ?
Yes! The Moment classes are for both GridSearch and ExponentiatedGradient.
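
For illustration, a minimal sketch of that check (assuming ClassificationMoment is importable from fairlearn.reductions; the helper name is hypothetical):

```python
from fairlearn.reductions import ClassificationMoment

def _needs_label_encoding(constraints):
    # Classification moments imply binary targets, so the 0/1 label
    # encoding should apply; loss-based moments leave y untouched.
    return isinstance(constraints, ClassificationMoment)
```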

@romanlutz (Member)

Exponentiated gradient testcase is dependent on 0/1 predictions in Moments, I don't know how to fix this. For now, I'll leave the prediction as 0/1s.

You can adjust those test cases to run multiple times, of course. Just parametrize them to have a variety of labels, e.g. (a pytest sketch of this pattern follows the list):

  • 0 and 1
  • -1 and 1
  • 0 and 2
  • (Are text labels allowed? I'm not sure, feel free to try, but it's not mandatory to have that IMO)
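
A sketch of that parametrization pattern (the test body is hypothetical; it just relabels a 0/1 target and checks the result):

```python
import numpy as np
import pytest

@pytest.mark.parametrize(
    "labels",
    [(0, 1), (-1, 1), (0, 2), ("no", "yes")],
)
def test_arbitrary_labels(labels):
    # Map a 0/1 target onto the parametrized label pair and verify
    # the relabeled target uses exactly those values.
    y = np.array([0, 1, 1, 0, 1])
    y_relabeled = np.where(y == 0, labels[0], labels[1])
    assert set(np.unique(y_relabeled)) == set(labels)
```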

@SeanMcCarren (Contributor, Author)

You can adjust those test cases to run multiple times, of course. Just parametrize it to have a variety of labels, e.g.,

Thanks for the quick response! Yes, I did parametrize, but the problem is that I added the preprocessing in ExponentiatedGradient and not in all the individual load_data functions. In test_argument_types_difference_bound there are operations on y that assume it is 0/1 and don't work for other values or strings, which I don't think should be the case. Additionally, there are operations that assume the prediction outputs are 0/1, which also shouldn't necessarily be true. I could change the test cases, but I'm wondering whether I should instead change load_data more fundamentally (though that seems wrong).
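
For illustration, one hypothetical shape the prediction-decoding step could take (not code from this PR):

```python
import numpy as np

def _decode_predictions(y_pred, classes):
    # Hypothetical helper: map internal 0/1 predictions back to the
    # original labels inferred at fit time, e.g. classes = [-1, 1].
    return np.asarray(classes)[np.asarray(y_pred, dtype=int)]
```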

@SeanMcCarren (Contributor, Author)

@adrinjalali @romanlutz Is it important that this gets done quickly? If so, you should know I'm a bit stuck; see the issues in the top post.

@adrinjalali (Member)

  • If all supplied labels are the same (i.e., only one of the two possible classes is present), then we can't infer self.classes_, so we should now raise an error. But this breaks the test cases test_threshold_optimization_degenerate_labels and test_single_y_class (the latter is for GridSearch).

I'd say that's the expected behavior. We should raise if there is only one class.

  • In ExponentiatedGradient, how can we know whether y is binary or continuous? If it is continuous, we should not apply the encoding. Can we do the same as in GridSearch: isinstance(self.constraints, ClassificationMoment)?

Generally, is_classifier should work; if it doesn't, we should change the implementation so that it does. That means you may end up having two different classes for a regressor and a classifier.
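
For reference, is_classifier comes from sklearn.base and can be checked directly:

```python
from sklearn.base import is_classifier
from sklearn.linear_model import LinearRegression, LogisticRegression

print(is_classifier(LogisticRegression()))  # True
print(is_classifier(LinearRegression()))    # False
```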

Also, this is not in the milestone, so we're not in too much of a rush for this PR.

@SeanMcCarren (Contributor, Author) commented Mar 31, 2022

Sorry, I've been quite inactive on this issue. I might pick it up in the future, but not in the coming couple of months. If you are interested in this, please feel free to contribute!
