# microsoft/random_quantize

## Introduction

This is a PyTorch implementation of the ICCV 2023 paper Randomized Quantization for Data Agnostic Representation Learning. The paper introduces a self-supervised augmentation for data-agnostic representation learning: each input channel is quantized by a non-uniform quantizer whose bins are randomly generated, and each value is replaced by a sample drawn randomly from within its bin. Applied alongside standard sequential augmentations in self-supervised contrastive models, randomized quantization matches modality-specific augmentations on vision tasks and achieves state-of-the-art results on 3D point clouds as well as on audio. We also show that the method can augment intermediate embeddings of a deep neural network, evaluated on the comprehensive DABS benchmark, which spans a variety of data modalities.
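The quantizer described above can be sketched in a few lines of PyTorch. This is an illustrative re-implementation under our own function name, not the repository's `RandomizedQuantizationAugModule`: per channel, the value range is split at randomly drawn boundaries, and every value is replaced by a uniform sample from its bin.

```python
import torch

def randomized_quantize(x: torch.Tensor, region_num: int = 8) -> torch.Tensor:
    """Randomized quantization sketch for a (C, H, W) tensor.

    Each channel's [min, max] range is split into `region_num` bins at
    randomly drawn boundaries; each value is then replaced by a sample
    drawn uniformly from its bin.
    """
    out = torch.empty_like(x)
    for c in range(x.shape[0]):
        ch = x[c]
        lo, hi = ch.min(), ch.max()
        # randomly drawn, sorted interior bin boundaries
        inner = torch.sort(torch.rand(region_num - 1) * (hi - lo) + lo).values
        edges = torch.cat([lo.view(1), inner, hi.view(1)])
        # bin index of every element, in [0, region_num - 1]
        idx = torch.bucketize(ch, edges[1:-1])
        left, right = edges[idx], edges[idx + 1]
        # replace each value with a uniform sample from its bin
        out[c] = left + torch.rand_like(ch) * (right - left)
    return out
```

Applying it twice to the same image yields two different views, which is what a contrastive objective needs.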

## Pretrained checkpoints on ImageNet under moco-v3

| Augmentations | Pre-trained checkpoints | Linear probe |
| --- | --- | --- |
| Randomized Quantization (100 epochs) | model | 42.9 |
| RRC + Randomized Quantization (100 epochs) | model | 67.9 |
| RRC + Randomized Quantization (300 epochs) | model | 71.6 |
| RRC + Randomized Quantization (800 epochs) | model | 72.1 |

## Pretrained checkpoints on AudioSet under byol-a

We largely follow the experimental settings of BYOL-A and treat it as our baseline, replacing the Mixup augmentation used in BYOL-A with our randomized quantization. The network is trained on AudioSet for 100 epochs. Linear-probing results are reported below on six downstream audio classification datasets: NSynth (NS), UrbanSound8K (US8K), VoxCeleb1 (VC1), VoxForge (VF), Speech Commands V2 restricted to 12 classes (SPCV2/12), and the full Speech Commands V2 (SPCV2).

| Method | Augmentations | NS | US8K | VC1 | VF | SPCV2/12 | SPCV2 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BYOL-A | RRC + Mixup | 74.1 | 79.1 | 40.1 | 90.2 | 91.0 | 92.2 | 77.8 |
| Our model | RRC + Randomized Quantization | 74.2 | 78.0 | 45.7 | 92.6 | 95.1 | 92.1 | 79.6 |

## Usage

The code has been tested with PyTorch 1.10.0, CUDA 11.3, and cuDNN 8.2.0. We recommend working with this Docker image. Below are minimal-effort use cases, based on moco-v3, that let interested users inject our augmentation directly into their own projects.

  1. Call the augmentation as one of the `torchvision.transforms` modules.

```python
region_num = 8
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262-L285
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
```
  2. Randomly apply our augmentation with a given probability.

```python
region_num = 8
p_random_apply1, p_random_apply2 = 0.5, 0.5
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply1),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply2),
    transforms.ToTensor()
]
```
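For orientation, `p_random_apply_rand_quant` above gates the augmentation with a coin flip. A minimal stand-alone wrapper with that behavior (our own class, not the repository's module) could look like:

```python
import torch
import torch.nn as nn

class RandomApplyStub(nn.Module):
    """Apply `transform` with probability `p`; otherwise pass the input
    through unchanged. Illustrative stand-in, not the repo's module."""

    def __init__(self, transform, p: float = 0.5):
        super().__init__()
        self.transform = transform
        self.p = p

    def forward(self, x):
        if torch.rand(()).item() < self.p:
            return self.transform(x)
        return x
```

With `p = 0.5`, roughly half of the generated views pass through unquantized, which keeps some clean views in the contrastive pairs.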
  3. Call the augmentation in `forward()`. This is faster than the two usages above because the augmentation runs on the GPU.

```python
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L35
region_num = 8
self.rand_quant_layer = RandomizedQuantizationAugModule(region_num)

# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L86-L94
q1 = self.predictor(self.base_encoder(self.rand_quant_layer(x1)))
q2 = self.predictor(self.base_encoder(self.rand_quant_layer(x2)))

with torch.no_grad():  # no gradient flows to the momentum encoder
    self._update_momentum_encoder(m)  # update the momentum encoder

    # compute momentum features as targets
    k1 = self.momentum_encoder(self.rand_quant_layer(x1))
    k2 = self.momentum_encoder(self.rand_quant_layer(x2))
```
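Since the layer is an ordinary `nn.Module` operating on batched tensors, it runs on whatever device the batch already lives on. A self-contained sketch with a simplified stand-in quantizer (uniform bins per sample; the real module randomizes bin boundaries and representatives) and a toy encoder:

```python
import torch
import torch.nn as nn

class BatchQuantStub(nn.Module):
    """Stand-in for RandomizedQuantizationAugModule: snaps each channel of a
    (B, C, H, W) batch to `region_num` uniform levels. The real module
    randomizes both bin boundaries and the values sampled inside each bin."""

    def __init__(self, region_num: int = 8):
        super().__init__()
        self.region_num = region_num

    def forward(self, x):
        lo = x.amin(dim=(2, 3), keepdim=True)
        hi = x.amax(dim=(2, 3), keepdim=True)
        scale = (hi - lo).clamp_min(1e-8) / self.region_num
        levels = ((x - lo) / scale).floor().clamp(max=self.region_num - 1)
        return levels * scale + lo

# toy encoder; in moco-v3 this would be self.base_encoder
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))
quant = BatchQuantStub(8)
feats = encoder(quant(torch.rand(4, 3, 8, 8)))  # augmentation shares the batch's device
```

Because no CPU-side image decoding or per-sample Python loop is involved, the quantization amortizes over the whole batch on GPU.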

## Citation

```bibtex
@inproceedings{wu2023randomized,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16305--16316},
  year={2023}
}

@article{wu2023randomized_arxiv,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  journal={arXiv:2212.08663},
  year={2023}
}
```

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
