Which sampling method is best for very unbalanced data? #162

Open
viandres opened this issue Dec 16, 2022 · 1 comment

Comments

@viandres

Hi!

Which of the implemented sampling strategies handles unbalanced data best?
If I take the 10,000 most uncertain instances and 99% of them belong to the same class, that would not help much in the next training iteration, right?

Thank you in advance!

@TomKingsfordUoA

With unbalanced data, where the estimator has not been trained on the minority classes, the uncertainty measure typically fails to capture epistemic uncertainty, so it won't (necessarily) sample the minority classes. Diversity-based active learning, unlike uncertainty-based AL, handles this well. I've written some diversity-based implementations privately and will look to submit a PR in the near future.
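
As a rough illustration of what diversity-based selection can look like, here is a minimal k-center greedy (farthest-first) sketch in plain NumPy. It is not the private implementation mentioned above and not part of this library's API; the function name `k_center_greedy` and its arguments are hypothetical, and it assumes Euclidean distance on the raw feature vectors is a meaningful notion of similarity for your data.

```python
import numpy as np


def k_center_greedy(X_pool, X_labeled, n_instances):
    """Pick pool indices that are far from labeled data and from each other.

    Hypothetical helper for illustration: farthest-first traversal using
    Euclidean distance in feature space.
    """
    X_pool = np.asarray(X_pool, dtype=float)
    X_labeled = np.asarray(X_labeled, dtype=float)

    # Distance from each pool point to its nearest already-labeled point.
    if len(X_labeled) > 0:
        diffs = X_pool[:, None, :] - X_labeled[None, :, :]
        min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    else:
        min_dist = np.full(len(X_pool), np.inf)

    selected = []
    for _ in range(min(n_instances, len(X_pool))):
        # Take the pool point farthest from everything covered so far.
        idx = int(np.argmax(min_dist))
        selected.append(idx)
        # The new point now "covers" its neighbourhood.
        new_dist = np.linalg.norm(X_pool - X_pool[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
        # Make sure the same index is never picked twice.
        min_dist[idx] = -np.inf
    return np.array(selected, dtype=int)


# Toy pool: four points near the origin (majority) and one far away (minority).
X_pool = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [0.1, 0.1]]
X_labeled = [[0.0, 0.0]]  # only the majority region has been labeled
query_idx = k_center_greedy(X_pool, X_labeled, n_instances=2)
```

Because selection depends only on distances in feature space, the isolated point is chosen first even though the model has never seen its class, which is the case where uncertainty scores tend to be unreliable. The full distance matrix to the labeled set is held in memory here, so a large pool would need batching or an approximate nearest-neighbour search.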
