# microsoft/random_quantize

## Introduction

This is a PyTorch implementation of the ICCV 2023 paper Randomized Quantization for Data Agnostic Representation Learning. The paper introduces a self-supervised augmentation for data-agnostic representation learning: each input channel is quantized by a non-uniform quantizer whose bins are randomly generated, and each value is replaced by a sample drawn randomly from within its bin. Applied alongside standard sequential augmentations in self-supervised contrastive models, randomized quantization matches modality-specific augmentations on vision tasks and achieves state-of-the-art results on 3D point clouds as well as on audio. We also show that the method can augment intermediate embeddings of a deep neural network, evaluated on the comprehensive DABS benchmark, which spans a variety of data modalities.
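The quantizer described above can be sketched in a few lines of PyTorch. This is an illustrative re-implementation under our own function name, not the repository's `RandomizedQuantizationAugModule`: per channel, the value range is split at randomly drawn boundaries, and every value is replaced by a uniform sample from its bin.

```python
import torch

def randomized_quantize(x: torch.Tensor, region_num: int = 8) -> torch.Tensor:
    """Randomized quantization sketch for a (C, H, W) tensor.

    Each channel's [min, max] range is split into `region_num` bins at
    randomly drawn boundaries; each value is then replaced by a sample
    drawn uniformly from its bin.
    """
    out = torch.empty_like(x)
    for c in range(x.shape[0]):
        ch = x[c]
        lo, hi = ch.min(), ch.max()
        # randomly drawn, sorted interior bin boundaries
        inner = torch.sort(torch.rand(region_num - 1) * (hi - lo) + lo).values
        edges = torch.cat([lo.view(1), inner, hi.view(1)])
        # bin index of every element, in [0, region_num - 1]
        idx = torch.bucketize(ch, edges[1:-1])
        left, right = edges[idx], edges[idx + 1]
        # replace each value with a uniform sample from its bin
        out[c] = left + torch.rand_like(ch) * (right - left)
    return out
```

Applying it twice to the same image yields two different views, which is what a contrastive objective needs.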

## Pretrained checkpoints on ImageNet under moco-v3

| Augmentations | Pre-trained checkpoints | Linear probe |
| --- | --- | --- |
| Randomized Quantization (100 epochs) | model | 42.9 |
| RRC + Randomized Quantization (100 epochs) | model | 67.9 |
| RRC + Randomized Quantization (300 epochs) | model | 71.6 |
| RRC + Randomized Quantization (800 epochs) | model | 72.1 |

## Pretrained checkpoints on AudioSet under byol-a

We largely follow the experimental settings of BYOL-A and treat it as our baseline, replacing the Mixup augmentation used in BYOL-A with our randomized quantization. The network is trained on AudioSet for 100 epochs. Linear-probing results are reported below on six downstream audio classification datasets: NSynth (NS), UrbanSound8K (US8K), VoxCeleb1 (VC1), VoxForge (VF), Speech Commands V2 restricted to 12 classes (SPCV2/12), and the full Speech Commands V2 (SPCV2).

| Method | Augmentations | NS | US8K | VC1 | VF | SPCV2/12 | SPCV2 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BYOL-A | RRC + Mixup | 74.1 | 79.1 | 40.1 | 90.2 | 91.0 | 92.2 | 77.8 |
| Our model | RRC + Randomized Quantization | 74.2 | 78.0 | 45.7 | 92.6 | 95.1 | 92.1 | 79.6 |

## Usage

The code has been tested with PyTorch 1.10.0, CUDA 11.3, and cuDNN 8.2.0. We recommend working with this Docker image. Below are minimal-effort use cases, based on moco-v3, that let interested users inject our augmentation directly into their own projects.

  1. Call the augmentation as one of the `torchvision.transforms` modules.

```python
region_num = 8
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262-L285
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
```
  2. Randomly apply our augmentation with a given probability.

```python
region_num = 8
p_random_apply1, p_random_apply2 = 0.5, 0.5
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply1),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply2),
    transforms.ToTensor()
]
```
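For orientation, `p_random_apply_rand_quant` above gates the augmentation with a coin flip. A minimal stand-alone wrapper with that behavior (our own class, not the repository's module) could look like:

```python
import torch
import torch.nn as nn

class RandomApplyStub(nn.Module):
    """Apply `transform` with probability `p`; otherwise pass the input
    through unchanged. Illustrative stand-in, not the repo's module."""

    def __init__(self, transform, p: float = 0.5):
        super().__init__()
        self.transform = transform
        self.p = p

    def forward(self, x):
        if torch.rand(()).item() < self.p:
            return self.transform(x)
        return x
```

With `p = 0.5`, roughly half of the generated views pass through unquantized, which keeps some clean views in the contrastive pairs.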
  3. Call the augmentation in `forward()`. This is faster than the two usages above because the augmentation runs on the GPU.

```python
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L35
region_num = 8
self.rand_quant_layer = RandomizedQuantizationAugModule(region_num)

# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L86-L94
q1 = self.predictor(self.base_encoder(self.rand_quant_layer(x1)))
q2 = self.predictor(self.base_encoder(self.rand_quant_layer(x2)))

with torch.no_grad():  # no gradient flows to the momentum encoder
    self._update_momentum_encoder(m)  # update the momentum encoder

    # compute momentum features as targets
    k1 = self.momentum_encoder(self.rand_quant_layer(x1))
    k2 = self.momentum_encoder(self.rand_quant_layer(x2))
```
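Since the layer is an ordinary `nn.Module` operating on batched tensors, it runs on whatever device the batch already lives on. A self-contained sketch with a simplified stand-in quantizer (uniform bins per sample; the real module randomizes bin boundaries and representatives) and a toy encoder:

```python
import torch
import torch.nn as nn

class BatchQuantStub(nn.Module):
    """Stand-in for RandomizedQuantizationAugModule: snaps each channel of a
    (B, C, H, W) batch to `region_num` uniform levels. The real module
    randomizes both bin boundaries and the values sampled inside each bin."""

    def __init__(self, region_num: int = 8):
        super().__init__()
        self.region_num = region_num

    def forward(self, x):
        lo = x.amin(dim=(2, 3), keepdim=True)
        hi = x.amax(dim=(2, 3), keepdim=True)
        scale = (hi - lo).clamp_min(1e-8) / self.region_num
        levels = ((x - lo) / scale).floor().clamp(max=self.region_num - 1)
        return levels * scale + lo

# toy encoder; in moco-v3 this would be self.base_encoder
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))
quant = BatchQuantStub(8)
feats = encoder(quant(torch.rand(4, 3, 8, 8)))  # augmentation shares the batch's device
```

Because no CPU-side image decoding or per-sample Python loop is involved, the quantization amortizes over the whole batch on GPU.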

## Citation

```bibtex
@inproceedings{wu2023randomized,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16305--16316},
  year={2023}
}

@article{wu2023randomized_arxiv,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  journal={arXiv:2212.08663},
  year={2023}
}
```

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
