rationalspark/NAMSG

ARSG is an efficient method for training neural networks. The acronym is derived from "adaptive remote stochastic gradient" method. ARSG yields an $O(1/\sqrt{T})$ convergence rate in non-convex settings, which can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that, on the tested problems, ARSG achieves both faster convergence and better generalization than popular adaptive methods such as ADAM, NADAM, AMSGRAD, and RANGER. When training logistic regression on MNIST and ResNet-20 on CIFAR-10 with fixed optimal hyper-parameters obtained by grid search, ARSG roughly halves the computation required by ADAM. When training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed while surpassing SGD in generalization.

The paper is available at https://arxiv.org/abs/1905.01422.
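To illustrate the family of methods ARSG belongs to, below is a minimal NumPy sketch of an Adam-style adaptive update whose gradient is evaluated at a Nesterov-like look-ahead ("remote") point. This is a rough sketch only, not the authors' exact update rule (see the paper for the precise algorithm), and all names such as `remote_coef` and `grad_fn` are hypothetical.

    # Rough sketch: adaptive moment estimates with a look-ahead gradient.
    # NOT the exact ARSG/NAMSG update; see https://arxiv.org/abs/1905.01422.
    import numpy as np

    def train(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999,
              remote_coef=0.9, eps=1e-8, steps=2000):
        x = x0.copy()
        m = np.zeros_like(x)   # first-moment (momentum) estimate
        v = np.zeros_like(x)   # second-moment (scaling) estimate
        for t in range(1, steps + 1):
            # Evaluate the gradient at a look-ahead ("remote") point
            # instead of at the current iterate, Nesterov style.
            g = grad_fn(x - lr * remote_coef * m / (np.sqrt(v) + eps))
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            # Bias-corrected adaptive step, as in Adam.
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            x -= lr * m_hat / (np.sqrt(v_hat) + eps)
        return x

    # Usage example: minimize the quadratic f(x) = ||x||^2 / 2,
    # whose gradient is simply x; the result should be close to zero.
    x_opt = train(lambda x: x, x0=np.ones(5))
    print(x_opt)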

NAMSG is the former name of ARSG.

The file "supplementary materials.pdf" may not be downloaded or previewed since the platform is instable. It can be obtained by downloading or cloning the repository.

About

ARSG: an efficient first-order adaptive method for training neural networks
