Add support for sample weighting (dataset imbalance) #2738
Motivation
Sample weighting is a common strategy for dealing with imbalanced datasets. The idea is to weight each item's contribution to the loss based on how frequent its class is in the dataset, so that rare items count more.
There are different strategies, such as Inverse of Number of Samples (INS), Inverse of Square Root of Number of Samples (ISNS) or Effective Number of Samples (ENS), to cite a few (see a brief summary here).
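To make the three strategies concrete, here is a minimal sketch of how per-class weights could be derived from label counts. It is not part of this PR: the function name `class_weights`, the `beta` default, and the normalization (weights rescaled to sum to the number of classes) are assumptions chosen for illustration.

```python
from collections import Counter


def class_weights(labels, strategy="ins", beta=0.999):
    """Return a {class: weight} dict for an iterable of class labels.

    Hypothetical helper, not an mmdetection API.
    """
    counts = Counter(labels)
    if strategy == "ins":
        # Inverse of Number of Samples: w_c = 1 / n_c
        raw = {c: 1.0 / n for c, n in counts.items()}
    elif strategy == "isns":
        # Inverse of Square Root of Number of Samples: w_c = 1 / sqrt(n_c)
        raw = {c: 1.0 / n ** 0.5 for c, n in counts.items()}
    elif strategy == "ens":
        # Effective Number of Samples: E_c = (1 - beta^n_c) / (1 - beta),
        # and the weight is its inverse.
        raw = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Assumed convention: rescale so the weights sum to the number of classes.
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


# Example: 9 samples of class "a" and 1 sample of class "b".
example = class_weights(["a"] * 9 + ["b"], strategy="ins")
```

Each sample would then take the weight of its class, which is the scalar the dataset is expected to provide.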
Modification
The loss code already accepts an item-wise weight. However, that weight is not exposed at the level of the head, so it cannot be injected at runtime.
There are mechanisms to weight different classes, or different losses (multi-loss scenario). But as far as I understand, it is not possible to carry out sample weighting.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repos?
No, the changes only extend the functionality without affecting any current scenario.
Use cases (Optional)
This PR adds support for sample weighting. To use it, the dataset just needs to add a float scalar named "weight" to the annotation info.
The loading pipeline has been updated to pick it up, and it is then injected into the loss function.
This should be quite useful when dealing with imbalanced datasets.
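As a rough illustration of the intended flow, the sketch below reads a "weight" scalar from annotation dicts and applies it to per-item losses. It is a simplified stand-in, not the code in this PR: the helper names, the fallback weight of 1.0 when the key is missing, and the plain mean reduction are all assumptions.

```python
def sample_weights(annotations, default=1.0):
    """Collect the "weight" scalar of each annotation.

    Assumption: annotations without a "weight" key fall back to `default`.
    """
    return [float(ann.get("weight", default)) for ann in annotations]


def weighted_mean_loss(losses, weights):
    """Scale each per-item loss by its weight, then average over items."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    if not losses:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


# Hypothetical COCO-style annotations; only "weight" is the new field.
annotations = [
    {"id": 1, "image_id": 10, "category_id": 3, "bbox": [0, 0, 20, 20], "weight": 2.0},
    {"id": 2, "image_id": 10, "category_id": 1, "bbox": [5, 5, 40, 30]},
]
weights = sample_weights(annotations)
loss = weighted_mean_loss([0.5, 1.0], weights)
```

With these inputs the first item's loss is doubled before averaging, while the second item keeps its original contribution.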
Checklist
Issue
open-mmlab/mmdetection#9905