
Positive-Negative-Momentum

The official PyTorch implementations of the Positive-Negative Momentum optimizers.

The algorithms are proposed in our paper: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization, accepted by ICML 2021. In the updated arXiv version, we fixed several notation typos that appeared in the ICML version due to notation conflicts.

Why Positive-Negative Momentum?

It is well known that stochastic gradient noise matters a lot to generalization. The Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional Momentum in classic optimizers, can manipulate stochastic gradient noise by adjusting an extra hyperparameter.
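
For intuition, here is a paraphrased sketch of the stochastic PNM update in LaTeX (see Algorithm 1 in the paper for the exact formulation; the notation below is our shorthand, with beta_0 denoting the extra hyperparameter):

\begin{aligned}
m_t      &= \beta_1^2\, m_{t-2} + (1-\beta_1^2)\, g_t, \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{(1+\beta_0)^2+\beta_0^2}}
            \big((1+\beta_0)\, m_t - \beta_0\, m_{t-1}\big),
\end{aligned}

where m_t and m_{t-1} are two momentum buffers maintained over alternating steps. Combining a positive weight on one buffer with a negative weight on the other amplifies the gradient-noise component as beta_0 grows, while the normalization factor keeps the effective update magnitude controlled.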

The environment is as below:

Python 3.7.3

PyTorch >= 1.4.0

Usage

# You may use it as a standard PyTorch optimizer.

from pnm_optim import PNM, AdaPNM

# In both optimizers, the first entries of betas are the conventional momentum
# factors; the final entry controls the positive-negative noise amplification
# (the extra hyperparameter discussed above).
PNM_optimizer = PNM(net.parameters(), lr=lr, betas=(0.9, 1.), weight_decay=weight_decay)
AdaPNM_optimizer = AdaPNM(net.parameters(), lr=lr, betas=(0.9, 0.999, 1.), eps=1e-08, weight_decay=weight_decay)
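
A minimal end-to-end sketch, assuming the repository's pnm_optim module is on the path. The model, dataset, and hyperparameter values below are illustrative choices (ResNet18 on CIFAR-10, roughly matching the experiments reported next), not part of the repository API:

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from pnm_optim import PNM

# Illustrative setup: ResNet18 on CIFAR-10.
net = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = PNM(net.parameters(), lr=0.1, betas=(0.9, 1.), weight_decay=5e-4)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(200):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(net(inputs), targets)
        loss.backward()
        # PNM acts as a drop-in replacement for torch.optim.SGD here.
        optimizer.step()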

Test performance

PNM versus conventional Momentum. We report the mean and the standard deviation (shown after the ± sign) of the optimal test errors computed over three runs of each experiment. The proposed PNM-based methods show significantly better generalization than conventional momentum-based methods. In particular, as the theoretical analysis indicates, Stochastic PNM indeed consistently outperforms the conventional baseline, SGD with Momentum.

Dataset    Model        PNM         AdaPNM      SGDM        Adam        AMSGrad     AdamW       AdaBound    Padam       Yogi        RAdam
CIFAR-10   ResNet18     4.48±0.09   4.94±0.05   5.01±0.03   6.53±0.03   6.16±0.18   5.08±0.07   5.65±0.08   5.12±0.04   5.87±0.12   6.01±0.10
CIFAR-10   VGG16        6.26±0.05   5.99±0.11   6.42±0.02   7.31±0.25   7.14±0.14   6.48±0.13   6.76±0.12   6.15±0.06   6.90±0.22   6.56±0.04
CIFAR-100  ResNet34     20.59±0.29  20.41±0.18  21.52±0.37  27.16±0.55  25.53±0.19  22.99±0.40  22.87±0.13  22.72±0.10  23.57±0.12  24.41±0.40
CIFAR-100  DenseNet121  19.76±0.28  20.68±0.11  19.81±0.33  25.11±0.15  24.43±0.09  21.55±0.14  22.69±0.15  21.10±0.23  22.15±0.36  22.27±0.22
CIFAR-100  GoogLeNet    20.38±0.31  20.26±0.21  21.21±0.29  26.12±0.33  25.53±0.17  21.29±0.17  23.18±0.31  21.82±0.17  24.24±0.16  22.23±0.15

Citing

If you use Positive-Negative Momentum in your work, please cite

@InProceedings{xie2021positive,
  title     = {Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author    = {Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11448--11458},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
}
