Add support for sample weighting (dataset imbalance) #2738

Open
wants to merge 1 commit into
base: master

InakiRaba91

Motivation

Sample weighting is a common strategy for dealing with imbalanced datasets. The idea is to weight each item's contribution to the loss according to how frequent that item (or its class) is in the dataset, so that rare items count for more.

There are different strategies, such as Inverse of Number of Samples (INS), Inverse of Square Root of Number of Samples (ISNS) or Effective Number of Samples (ENS), to cite a few (see a brief summary here).
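
For reference, the three strategies named above can be sketched as follows. This is an illustration only and not code from this PR: the function name `class_weights`, the default `beta=0.999`, and the normalization (weights rescaled to sum to the number of classes) are all assumptions made for the example.

```python
import math
from typing import Dict


def class_weights(counts: Dict[str, int],
                  strategy: str = "ins",
                  beta: float = 0.999) -> Dict[str, float]:
    """Hypothetical helper: turn per-class sample counts into weights.

    ins  -> 1 / n
    isns -> 1 / sqrt(n)
    ens  -> (1 - beta) / (1 - beta ** n), i.e. the inverse of the
            effective number of samples (1 - beta ** n) / (1 - beta)
    """
    if any(n <= 0 for n in counts.values()):
        raise ValueError("every class needs a positive sample count")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")

    if strategy == "ins":
        raw = {c: 1.0 / n for c, n in counts.items()}
    elif strategy == "isns":
        raw = {c: 1.0 / math.sqrt(n) for c, n in counts.items()}
    elif strategy == "ens":
        raw = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Assumption: rescale so the weights sum to the number of classes.
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


counts = {"road": 900, "sign": 100}
ins = class_weights(counts, "ins")
isns = class_weights(counts, "isns")
ens = class_weights(counts, "ens")
```

With these counts, all three strategies give the rare class ("sign") the larger weight; they differ in how strongly they do so.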

Modification

The code already supports providing an item-wise weight to the loss. However, that weight is not exposed at the level of the head, so it cannot be injected at runtime.

There are mechanisms to weight different classes, or different losses (multi-loss scenario). But as far as I understand, it is not possible to carry out sample weighting.
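
To make the distinction concrete, here is a minimal, framework-free sketch of what an item-wise weight does to a reduced loss. It is not the code in this repository: `weighted_mean_loss` is a hypothetical name, and dividing by the element count (rather than by the sum of weights) is an assumption chosen for the example.

```python
from typing import Optional, Sequence


def weighted_mean_loss(per_pixel_losses: Sequence[Sequence[float]],
                       sample_weights: Optional[Sequence[float]] = None) -> float:
    """Hypothetical sketch of a sample-weighted mean reduction.

    per_pixel_losses[i] holds the unreduced losses of image i; the scalar
    sample_weights[i] is applied to every element of that image.
    """
    if sample_weights is None:
        # No weights given: behave like a plain mean.
        sample_weights = [1.0] * len(per_pixel_losses)
    if len(sample_weights) != len(per_pixel_losses):
        raise ValueError("need exactly one weight per sample")

    total = 0.0
    count = 0
    for losses, weight in zip(per_pixel_losses, sample_weights):
        total += weight * sum(losses)
        count += len(losses)
    if count == 0:
        raise ValueError("no loss elements to reduce")
    # Assumption: normalize by the number of elements, not the weight sum.
    return total / count


batch = [[1.0, 1.0], [2.0, 2.0]]
plain = weighted_mean_loss(batch)
reweighted = weighted_mean_loss(batch, [1.0, 0.0])
```

A class weight depends only on the label of each pixel, whereas the weight here is attached to the whole sample, which is why it has to be passed in alongside the batch at runtime.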

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repos?

No, the changes only extend the functionality without affecting any current scenario.

Use cases (Optional)

This PR adds support for sample weighting. To use it, the dataset just needs to add a float scalar named "weight" to the annotation info.

The loading pipeline has been updated to pick it up, and it will be passed to the loss function.
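
As a rough illustration of that flow (not the actual diff), a loading step could read the optional "weight" entry like this. The function name `attach_sample_weight`, the output key `"sample_weight"`, and the default of 1.0 for samples without a weight are hypothetical choices for this sketch.

```python
from typing import Any, Dict


def attach_sample_weight(results: Dict[str, Any],
                         default: float = 1.0) -> Dict[str, Any]:
    """Hypothetical pipeline step: copy the annotation weight into results.

    Samples without a "weight" entry fall back to `default`, so datasets
    that do not use sample weighting behave as before.
    """
    ann_info = results.get("ann_info", {})
    weight = float(ann_info.get("weight", default))
    if weight < 0:
        raise ValueError("sample weight must be non-negative")
    # "sample_weight" is an illustrative key name, not taken from the PR.
    results["sample_weight"] = weight
    return results


weighted = attach_sample_weight({"ann_info": {"seg_map": "a.png", "weight": 2.5}})
unweighted = attach_sample_weight({"ann_info": {"seg_map": "b.png"}})
```

Defaulting to 1.0 keeps existing datasets unchanged, which matches the claim that the change is not BC-breaking.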

This should be quite useful when dealing with imbalanced datasets.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMDet3D.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

Issue

open-mmlab/mmdetection#9905

@InakiRaba91
Author

Please let me know if you can think of any issues with this approach. If you believe it is on the right track, I'll update the forward_train method in point_head.py and add unit tests for the new functionality.

@MeowZheng
Collaborator

Thanks for your contribution. We are working on reviewing this PR.

aravind-h-v pushed a commit to aravind-h-v/mmsegmentation that referenced this pull request Mar 27, 2023

@jgoodman8

It would be great to add this feature 👏
