I have a situation which I feel must be quite common. I am doing binary classification and have about 10,000 examples from the positive class. However, I can generate an essentially unlimited number of examples from the negative class. What is the best approach, and is there an elegant scikit-learn solution?
One simple idea would be the following. Take all the examples from the positive class and 10,000 examples from the negative class, chosen at random. Build a classifier (call it classifier 1) and store it. Now repeat as many times as you like, storing classifiers 1, 2, 3, ...
When you want to perform prediction, take the median of the predicted probabilities of all the classifiers you have stored.
This is just something I made up, and I can't believe it isn't a studied problem. What would an expert do, and does scikit-learn support it?
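For concreteness, here is a minimal sketch of that procedure in plain scikit-learn. The data and the `sample_negatives` generator are placeholders standing in for the real positive examples and the unlimited negative-example source:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder stand-ins for the real data: X_pos holds the ~10,000
# positive examples; sample_negatives() mimics the effectively
# unlimited negative-class generator.
X_pos = rng.normal(loc=1.0, size=(10_000, 20))

def sample_negatives(n, rng):
    return rng.normal(loc=-1.0, size=(n, 20))

def fit_ensemble(X_pos, n_models=10, rng=rng):
    """Fit one classifier per fresh balanced sample, as proposed."""
    models = []
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_pos))])
    for _ in range(n_models):
        X_neg = sample_negatives(len(X_pos), rng)
        X = np.vstack([X_pos, X_neg])
        models.append(LogisticRegression(max_iter=1_000).fit(X, y))
    return models

def predict_proba_median(models, X):
    """Median across models of the positive-class probability."""
    probas = np.stack([m.predict_proba(X)[:, 1] for m in models])
    return np.median(probas, axis=0)

models = fit_ensemble(X_pos)
X_test = np.vstack([rng.normal(1.0, size=(5, 20)),
                    rng.normal(-1.0, size=(5, 20))])
print(predict_proba_median(models, X_test))
```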
Replies: 2 comments · 2 replies
-
Learning with class imbalance is a well-established area of research. Have you taken a look at imbalanced-learn? https://imbalanced-learn.org/
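The ensemble-of-balanced-resamples idea from the question is essentially what imbalanced-learn ships as `BalancedBaggingClassifier`: each base estimator is fit on a random balanced resample, and predictions are aggregated across the ensemble (by averaging the per-estimator probabilities rather than taking their median). A minimal sketch on synthetic data with a similar class ratio:

```python
# Requires imbalanced-learn (pip install imbalanced-learn).
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in for the real problem: roughly 10,000 positives
# among 110,000 samples.
X, y = make_classification(n_samples=110_000, weights=[10 / 11],
                           random_state=0)

# Defaults to a decision tree per balanced bootstrap; any estimator
# with predict_proba could be passed instead.
clf = BalancedBaggingClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
print(clf.predict_proba(X[:5]))
```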
-
To add to @jnothman's answer: if I were to use something from imbalanced-learn, I would probably favour an ensemble approach along the lines you describe. Another thing to try would be to look at a novelty detection algorithm and train the model solely on the samples of the positive class.
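A minimal sketch of that novelty-detection route, using scikit-learn's `OneClassSVM` fit only on the positive samples (the data here is a synthetic placeholder; `IsolationForest` or `LocalOutlierFactor(novelty=True)` would be drop-in alternatives):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(10_000, 20))   # placeholder positives

# Learn the support of the positive class only; no negatives needed.
detector = OneClassSVM(nu=0.05, gamma="scale").fit(X_pos)

X_new = np.vstack([rng.normal(1.0, size=(3, 20)),    # positive-like
                   rng.normal(-1.0, size=(3, 20))])  # negative-like
print(detector.predict(X_new))   # +1 = looks positive, -1 = does not
```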