rationalspark/NAMSG

ARSG is an efficient method for training neural networks. The acronym is derived from "adaptive remote stochastic gradient" method. ARSG yields an $O(1/\sqrt{T})$ convergence rate in non-convex settings, which can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that, on the tested problems, ARSG achieves both faster convergence and better generalization than popular adaptive methods such as ADAM, NADAM, AMSGRAD, and RANGER. When training logistic regression on MNIST and ResNet-20 on CIFAR-10 with fixed optimal hyper-parameters obtained by grid search, ARSG roughly halves the computation required by ADAM. When training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed while surpassing SGD in generalization.

The paper is available at https://arxiv.org/abs/1905.01422.
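To illustrate the family of methods ARSG belongs to, below is a minimal NumPy sketch of an Adam-style adaptive update whose gradient is evaluated at a Nesterov-like look-ahead ("remote") point. This is a rough sketch only, not the authors' exact update rule (see the paper for the precise algorithm), and all names such as `remote_coef` and `grad_fn` are hypothetical.

    # Rough sketch: adaptive moment estimates with a look-ahead gradient.
    # NOT the exact ARSG/NAMSG update; see https://arxiv.org/abs/1905.01422.
    import numpy as np

    def train(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999,
              remote_coef=0.9, eps=1e-8, steps=2000):
        x = x0.copy()
        m = np.zeros_like(x)   # first-moment (momentum) estimate
        v = np.zeros_like(x)   # second-moment (scaling) estimate
        for t in range(1, steps + 1):
            # Evaluate the gradient at a look-ahead ("remote") point
            # instead of at the current iterate, Nesterov style.
            g = grad_fn(x - lr * remote_coef * m / (np.sqrt(v) + eps))
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            # Bias-corrected adaptive step, as in Adam.
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            x -= lr * m_hat / (np.sqrt(v_hat) + eps)
        return x

    # Usage example: minimize the quadratic f(x) = ||x||^2 / 2,
    # whose gradient is simply x; the result should be close to zero.
    x_opt = train(lambda x: x, x0=np.ones(5))
    print(x_opt)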

NAMSG is the former name of ARSG.

The file "supplementary materials.pdf" may not be downloaded or previewed since the platform is instable. It can be obtained by downloading or cloning the repository.

About

ARSG: an efficient first-order adaptive method for training neural networks
