Which sampling method is best for very unbalanced data? #162

viandres · 2022-12-16T14:50:56Z

Hi!

Which of the implemented sampling strategies handles unbalanced data best?
If I query the 10,000 most uncertain instances but 99% of them belong to the same class, that would not help much in the next training iteration, right?

Thank you in advance!

TomKingsfordUoA · 2023-06-28T08:51:42Z

With unbalanced data, where the estimator has not yet been trained on the minority classes, the uncertainty measure typically fails to capture epistemic uncertainty, so uncertainty-based sampling will not necessarily select minority-class instances. Diversity-based active learning (AL) handles this case well. I've produced some diversity-based implementations privately and will look to submit a PR in the near future.
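
To make the idea concrete, here is a minimal sketch of one common diversity-based strategy, k-center greedy (farthest-first) selection. It is not the private implementation mentioned above and not part of the library's API; the function name `k_center_greedy` is hypothetical, and the sketch assumes Euclidean distance on feature vectors (or embeddings) is a meaningful notion of similarity for your data.

```python
import numpy as np


def k_center_greedy(X_pool, X_labeled, n_queries):
    """Hypothetical helper: pick pool indices by farthest-first traversal.

    Each step selects the pool point whose distance to the nearest
    already-covered point (labeled or previously selected) is largest.
    """
    X_pool = np.asarray(X_pool, dtype=float)
    X_labeled = np.asarray(X_labeled, dtype=float)

    if len(X_labeled) > 0:
        # Distance from every pool point to its nearest labeled point.
        # Note: this builds a (n_pool, n_labeled, n_features) array, so
        # chunk the computation for large pools.
        diffs = X_pool[:, None, :] - X_labeled[None, :, :]
        min_dist = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    else:
        # Nothing labeled yet: every pool point is "infinitely" uncovered.
        min_dist = np.full(len(X_pool), np.inf)

    selected = []
    for _ in range(min(n_queries, len(X_pool))):
        idx = int(np.argmax(min_dist))
        selected.append(idx)
        # The new pick now covers its neighbourhood as well.
        new_dist = np.linalg.norm(X_pool - X_pool[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


# Toy pool: 99 majority points near the origin, 1 minority point far away.
rng = np.random.default_rng(0)
majority = rng.normal(0.0, 0.1, size=(99, 2))
minority = np.array([[10.0, 10.0]])
X_pool = np.vstack([majority, minority])
X_labeled = np.zeros((1, 2))  # only a majority-like point is labeled so far

picked = k_center_greedy(X_pool, X_labeled, n_queries=3)
print(picked)
```

Because selection depends only on geometry and not on the model's confidence, the isolated minority point is chosen even though the estimator has never seen its class. In practice this is often combined with an uncertainty score rather than used alone.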
