Maximum Variation Averaging

This repository contains the implementation of Maximum Variation Averaging (MaxVA), proposed in our paper Adaptive Learning Rates with Maximum Variation Averaging. MaxVA stabilizes the adaptive step size of Adam-like optimizers by replacing the fixed exponential moving average of squared gradients with an adaptive weighted average, where the coordinate-wise weights are chosen to maximize the estimated gradient variance. We provide PyTorch implementations for the synthetic datasets, image classification, Neural Machine Translation, and Natural Language Understanding tasks described in the experiment section of the paper.
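The core idea can be sketched as follows. This is only a simplified illustration under the usual exponential-moving-average formulation of Adam: it searches a small grid of candidate averaging weights per coordinate instead of using the closed-form maximizer derived in the paper, and all names here are illustrative rather than the repository's actual API.

import torch

def maxva_moment_update(m, v, grad, betas=(0.5, 0.9, 0.99, 0.999)):
    # Simplified MaxVA-style update: for each coordinate, keep the candidate
    # beta whose weighted averages maximize the estimated gradient variance
    # v_new - m_new**2, then use those averages as the Adam-style moments.
    best_var, best_m, best_v = None, m, v
    for beta in betas:
        m_new = beta * m + (1 - beta) * grad
        v_new = beta * v + (1 - beta) * grad ** 2
        var = v_new - m_new ** 2  # estimated gradient variance
        if best_var is None:
            best_var, best_m, best_v = var, m_new, v_new
        else:
            mask = var > best_var  # coordinates where this beta gives larger variance
            best_var = torch.where(mask, var, best_var)
            best_m = torch.where(mask, m_new, best_m)
            best_v = torch.where(mask, v_new, best_v)
    return best_m, best_v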

Usage

We used PyTorch v1.4.0 for the experiments, which are divided into four folders:

synthetic_data: Run nonconvex.py or nqm.py to reproduce the experiments on the nonconvex function or the Noisy Quadratic Model, respectively.

image_classification: Please refer to launch.sh to launch the experiments on CIFAR10 and CIFAR100. For ImageNet, we provide our implementation for large-batch training, which achieves performance similar to that reported for LAMB. You can also plug the same optimizers into the official PyTorch ImageNet example code and use the hyper-parameters given in the paper (see the drop-in sketch after this list).

nmt_nlu: Please first enter the nmt_nlu directory and run pip install --editable . For Neural Machine Translation, first follow the steps to download and preprocess the data, then refer to run-iwslt-lamadam-tristage.sh to train a Transformer from scratch with our optimizers. For the GLUE benchmark, likewise first follow the steps to prepare the data, download a RoBERTa-base model and put it under nmt_nlu/roberta-pretrained, then use run-glue-base.sh to fine-tune the RoBERTa-base model on the GLUE tasks.

bert_pt: We provide the implementation of MAdam for large-batch pretraining of BERT; it integrates gradient clipping by default and is compatible with Nvidia's BERT pretraining code (an illustration of the clipping step follows below).
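
For the image classification item above, the sketch below shows the intended drop-in usage in a standard PyTorch training loop. The import path and constructor signature of MAdam are assumptions (check the optimizer source in image_classification); the loop uses torch.optim.Adam as a stand-in so the snippet runs on its own.

import torch
import torch.nn as nn

# Assumed module/class names; see the repository's optimizer files for the real ones:
# from madam import MAdam

model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
# optimizer = MAdam(model.parameters(), lr=1e-3)  # intended drop-in replacement for Adam
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # stand-in so this sketch runs

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()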

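For bert_pt, the snippet below only illustrates the generic pattern of folding global-norm gradient clipping into the optimizer step; it is not the repository's actual MAdam implementation, and the threshold value is an arbitrary example.

import torch

def clipped_step(optimizer, parameters, max_grad_norm=1.0):
    # Clip the global gradient norm before applying the update, as an
    # optimizer with built-in clipping would do internally on every step.
    torch.nn.utils.clip_grad_norm_(parameters, max_grad_norm)
    optimizer.step()
    optimizer.zero_grad()
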
Citation

Please cite as

@article{zhu2020maxva,
  title   = {Adaptive Learning Rates with Maximum Variation Averaging},
  author  = {Zhu, Chen and Cheng, Yu and Gan, Zhe and Huang, Furong and Liu, Jingjing and Goldstein, Tom},
  journal = {arXiv preprint arXiv:2006.11918},
  year    = {2020},
}
