HardBalance: Balance strategy based on hard sample mining using semantic similarity #1443
rohitgarud started this conversation in Show and tell
Replies: 2 comments 6 replies
-
Awesome idea! I'm wondering what @qubixes thinks about it. He made these balancers.
5 replies
-
This is an excellent idea! A simulation study comparing the different balancers on the synergy data might make a nice paper!
1 reply
-
Here I am presenting the HardBalance balancing strategy. Because the relevant and irrelevant classes are imbalanced, we use oversampling or undersampling to balance them. HardBalance is an undersampling strategy: the irrelevant class is undersampled to match the size of the relevant class by selecting, for each relevant record, the irrelevant record that is most semantically similar to it. The name HardBalance comes from the fact that these 'hard' irrelevant records are difficult for the classifier to tell apart from relevant records, yet training on them lets it learn more nuanced differences between the relevant and irrelevant classes. This concept is called hard mining.
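A minimal sketch of the selection step described above, not an implementation of the ASReview balancer interface. Several things here are assumptions rather than part of the idea as stated: the function name `hard_balance` is hypothetical, records are assumed to be already embedded as dense vectors, cosine similarity stands in for "semantic similarity", labels are assumed to be 1 for relevant and 0 for irrelevant, and each irrelevant record is picked at most once (greedily, in the order of the relevant records) so that the two classes end up the same size.

```python
import numpy as np


def hard_balance(embeddings, labels):
    """Return sorted indices of all relevant records plus, for each
    relevant record, its most similar not-yet-chosen irrelevant record.

    Hypothetical helper: assumes labels are 1 (relevant) / 0 (irrelevant)
    and that `embeddings` has one row per record.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    rel_idx = np.flatnonzero(labels == 1)
    irr_idx = np.flatnonzero(labels == 0)

    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    # sim[i, j] = cosine similarity of relevant i and irrelevant j.
    sim = unit[rel_idx] @ unit[irr_idx].T

    # Greedy selection without replacement (an assumption, see lead-in).
    taken = np.zeros(len(irr_idx), dtype=bool)
    for row in sim:
        if taken.all():  # fewer irrelevant than relevant records
            break
        best = int(np.argmax(np.where(taken, -np.inf, row)))
        taken[best] = True

    return np.sort(np.concatenate([rel_idx, irr_idx[taken]]))


# Toy example: records 0-1 are relevant, records 2-6 are irrelevant.
toy_embeddings = [[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9],
                  [-1, 0], [0, -1], [-1, -1]]
toy_labels = [1, 1, 0, 0, 0, 0, 0]
selected = hard_balance(toy_embeddings, toy_labels)
print(selected.tolist())  # [0, 1, 2, 3]
```

The greedy pass depends on the order of the relevant records; an optimal one-to-one matching, or allowing the same irrelevant record to be chosen more than once, are alternative design choices that this sketch does not explore.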
This is just an idea and has not yet been tested. I am hoping to get some comments from the ASReview community.