DebiasPL: Debiased Pseudo-Labeling

This repository contains the code (in PyTorch) for the model introduced in the following paper:

Debiased Learning from Naturally Imbalanced Pseudo-Labels
Xudong Wang, Zhirong Wu, Long Lian, and Stella X. Yu
UC Berkeley and Microsoft Research
CVPR 2022

Project Page | Paper | Preprint | Citation

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@inproceedings{wang2022debiased,
  title={Debiased Learning from Naturally Imbalanced Pseudo-Labels},
  author={Wang, Xudong and Wu, Zhirong and Lian, Long and Yu, Stella X},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14647--14657},
  year={2022}
}

Updates

[06/2022] Support DebiasPL w/ CLIP for more label-efficient learning. DebiasPL (ResNet50) achieves 69.6% (71.3%) top-1 accuracy on ImageNet using only 0.2% (1%) labels!

[04/2022] Initial Commit. Support zero-shot learning and semi-supervised learning on ImageNet.

Requirements

Packages

Python >= 3.7, < 3.9
PyTorch >= 1.6
torchaudio==0.7.2
tensorboard >= 1.14 (for visualization)
tqdm
faiss-gpu
pandas
apex (optional, unless using mixed precision training)
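
As a quick sanity check before training, the sketch below tests the interpreter version and whether the packages listed above are importable. The import names (for example `faiss` for faiss-gpu) and the helper names `python_version_ok` and `missing_modules` are assumptions for illustration and are not part of this codebase. Version pins such as torchaudio==0.7.2 are not checked.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above (faiss-gpu imports as "faiss").
REQUIRED = ["torch", "torchaudio", "tensorboard", "tqdm", "faiss", "pandas"]
OPTIONAL = ["apex"]  # only needed for mixed precision training


def python_version_ok(version_info=sys.version_info):
    """Return True if the interpreter satisfies Python >= 3.7, < 3.9."""
    return (3, 7) <= tuple(version_info[:2]) < (3, 9)


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Missing required packages:", missing_modules(REQUIRED))
    print("Missing optional packages:", missing_modules(OPTIONAL))
```

An empty list for the required packages means every module was found; it does not confirm that the installed versions match the ones listed above.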

Hardware requirements

8 GPUs with >= 11G GPU RAM or 4 GPUs with >= 16G GPU RAM are recommended.

Dataset and Pre-trained Model Preparation

Please download the pre-trained MoCo-EMAN model, create a new folder called pretrained, and place the checkpoints under it.

Please download the ImageNet dataset from this link. Then move and extract the training and validation images into labeled subfolders using the following shell script.

The indexes for the semi-supervised learning experiments can be found here. The setting with 1% labeled data is the same as in FixMatch. For the setting with 0.2% labeled data, a new list of indexes was created by randomly selecting 0.2% of the instances from each class. Please place all CSV files as shown below:

dataset
└── imagenet
    ├── indexes
    │   ├── train_1p_index.csv
    │   ├── train_99p_index.csv
    │   └── ....
    ├── train
    │   ├── n01440764
    │   │   └── *.jpeg
    │   └── ....
    └── val
        ├── n01440764
        │   └── *.jpeg
        └── ....
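
The per-class random selection used for the 0.2% setting can be sketched as follows. This is a minimal illustration and not the script used to produce the released index files: the function name `sample_labeled_subset`, the `(path, class_id)` input format, the seed, and the rule of keeping at least one instance per class are all assumptions.

```python
import random
from collections import defaultdict


def sample_labeled_subset(samples, fraction=0.002, seed=0):
    """Split samples into (labeled, unlabeled) by sampling a fraction per class.

    samples: list of (path, class_id) pairs (hypothetical format).
    """
    rng = random.Random(seed)

    # Group sample positions by class so every class is sampled separately.
    by_class = defaultdict(list)
    for idx, (_, class_id) in enumerate(samples):
        by_class[class_id].append(idx)

    labeled_idx = set()
    for class_id in sorted(by_class):
        indices = by_class[class_id]
        # Assumption: round to the nearest count and keep at least one per class.
        k = max(1, round(len(indices) * fraction))
        labeled_idx.update(rng.sample(indices, k))

    labeled = [s for i, s in enumerate(samples) if i in labeled_idx]
    unlabeled = [s for i, s in enumerate(samples) if i not in labeled_idx]
    return labeled, unlabeled
```

Sampling within each class, rather than over the whole training set, keeps the labeled subset class-balanced even when the fraction is very small.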

Training and Evaluation Instructions

Semi-supervised learning on ImageNet-1k

0.2% labeled data (50 epochs):

bash scripts/0.2perc-ssl/train_DebiasPL.sh

1% labeled data (50 epochs):

bash scripts/1perc-ssl/train_DebiasPL.sh

1% labeled data (DebiasPL w/ CLIP, 100 epochs):

bash scripts/1perc-ssl/train_DebiasPL_w_CLIP.sh

Method	Backbone	epochs	0.2% labels	1% labels
FixMatch w/ EMAN	RN50	50	43.6%	60.9%
DebiasPL (reported)	RN50	50	51.6%	65.3%
DebiasPL (reproduced)	RN50	50	52.0% [ckpt | log]	65.6% [ckpt | log]
DebiasPL w/ CLIP (reproduced)	RN50	50	69.6% [ckpt | log]	-
DebiasPL w/ CLIP (reproduced)	RN50	100	70.4% [ckpt | log]	71.3% [ckpt | log]

The results reproduced by this codebase are often slightly higher than those reported in the paper (52.0% vs. 51.6%; 65.6% vs. 65.3%). We find it beneficial to apply the cross-level instance-group discrimination (CLD) loss to unlabeled instances to fully leverage their information.

Zero-shot learning

Please download the zero-shot predictions of a pre-trained CLIP model (backbone: RN50) and put them under imagenet/indexes/. Then run experiments on ImageNet-1k with:

bash scripts/zsl/train_DebiasPL.sh

Method	Backbone	epochs	top-1 acc
CLIP	RN50	-	59.6%
CLIP	ViT-Base/32	-	63.2%
DebiasPL (reported)	RN50	100	68.3%
DebiasPL (reproduced)	RN50	50	68.7% [ckpt | log]

How to get support from us?

If you have any general questions, feel free to email us at xdwang at eecs.berkeley.edu. For code or implementation-related questions, you can email us or open an issue in this repository. We recommend opening an issue, because your question may help others.

License

This project is licensed under the MIT License. See LICENSE for more details. The parts of the code listed under Acknowledgements follow their original licenses.

Acknowledgements

Part of the code is based on EMAN, FixMatch, CLIP, CLD, and LA.
