
# K-FAC_pytorch

PyTorch implementation of K-FAC and E-KFAC. (Only single-GPU training is supported; multi-GPU training would require modifications.)
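
For background: K-FAC approximates each layer's Fisher information matrix as a Kronecker product of two small covariance matrices, which makes the natural-gradient update cheap to apply; E-KFAC keeps the same Kronecker eigenbasis but re-estimates the scaling in that basis. A standard statement of the approximation for a fully connected layer $\ell$ (the notation follows the papers cited below, not this repo's code):

$$
F_\ell \;\approx\; A_{\ell-1} \otimes G_\ell, \qquad
A_{\ell-1} = \mathbb{E}\big[a_{\ell-1} a_{\ell-1}^\top\big], \qquad
G_\ell = \mathbb{E}\big[g_\ell g_\ell^\top\big],
$$

where $a_{\ell-1}$ are the layer's input activations and $g_\ell$ are the gradients of the loss with respect to its pre-activations. The preconditioned gradient then only requires the two small inverses:

$$
\big(A_{\ell-1} \otimes G_\ell\big)^{-1} \mathrm{vec}\big(\nabla_{W_\ell} L\big)
= \mathrm{vec}\big(G_\ell^{-1}\,(\nabla_{W_\ell} L)\,A_{\ell-1}^{-1}\big).
$$

In practice a damping term $\lambda$ is added to each factor before inversion; the `--damping` flag below plays that role.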

## Requirements

- pytorch 0.4.0
- torchvision
- python 3.6.0
- tqdm
- tensorboardX
- tensorflow

## How to run

```bash
python main.py --dataset cifar10 --optimizer kfac --network vgg16_bn --epoch 100 --milestone 40,80 --learning_rate 0.01 --damping 0.03 --weight_decay 0.003
```
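
If you want to drive the optimizer from your own training loop instead of `main.py`, the outline below is a minimal, unverified sketch. It assumes a `KFACOptimizer` class in `optimizers/kfac.py` with a `torch.optim`-style interface plus an `acc_stats` toggle and `steps`/`TCov` attributes for periodic curvature updates; check the actual source before relying on these names.

```python
# Minimal sketch only -- KFACOptimizer, acc_stats, steps and TCov are
# assumed names based on this repo's layout; verify in optimizers/kfac.py.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as T

from optimizers.kfac import KFACOptimizer  # assumed import path

# Tiny stand-in model; the repo's models/ package provides the real ones.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 10),
).cuda()

train_loader = DataLoader(
    torchvision.datasets.CIFAR10('data', train=True, download=True,
                                 transform=T.ToTensor()),
    batch_size=128, shuffle=True)

optimizer = KFACOptimizer(model, lr=0.01, damping=0.03,
                          weight_decay=0.003)  # values from the command above

for inputs, targets in train_loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = F.cross_entropy(outputs, targets)

    # K-FAC keeps running estimates of the Kronecker factors A and G.
    # A common pattern (used by this repo's main.py) is to refresh them
    # periodically via an extra backward pass on labels sampled from the
    # model's own predictive distribution.
    if optimizer.steps % optimizer.TCov == 0:  # assumed attributes
        optimizer.acc_stats = True
        probs = F.softmax(outputs.detach(), dim=1)
        sampled = torch.multinomial(probs, 1).squeeze(1)
        F.cross_entropy(outputs, sampled).backward(retain_graph=True)
        optimizer.acc_stats = False
        optimizer.zero_grad()  # discard the sampling gradients

    loss.backward()
    optimizer.step()
```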

## Performance

Note: for better K-FAC hyperparameters, please refer to the weight_decay repo. (The hyperparameters below are not well tuned; in particular, the weight decay is too small!)

For K-FAC and E-KFAC, the search ranges for the learning rate, weight decay, and damping were (see the sweep sketch after these lists):
(1) learning rate = [3e-2, 1e-2, 3e-3]
(2) weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
(3) damping = [3e-2, 1e-3, 3e-3]

For SGD:
(1) learning rate = [3e-1, 1e-1, 3e-2]
(2) weight decay = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
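
A simple way to run such a sweep is to loop over the Cartesian product of the grids and launch one run per setting. This is a minimal sketch, assuming `main.py` accepts the flags shown in "How to run" above; the dataset and network choices here are placeholders.

```python
# Hypothetical hyperparameter sweep over the K-FAC grids listed above.
import itertools
import subprocess

lrs = [3e-2, 1e-2, 3e-3]
wds = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
dampings = [3e-2, 1e-3, 3e-3]

for lr, wd, damping in itertools.product(lrs, wds, dampings):
    subprocess.run([
        "python", "main.py",
        "--dataset", "cifar10",
        "--optimizer", "kfac",
        "--network", "vgg16_bn",
        "--epoch", "100",
        "--milestone", "40,80",
        "--learning_rate", str(lr),
        "--damping", str(damping),
        "--weight_decay", str(wd),
    ], check=True)
```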

### CIFAR10

| Optimizer | Model | Acc. | Learning rate | Weight decay | Damping |
|-----------|-------|------|---------------|--------------|---------|
| KFAC | VGG16_BN | 93.86% | 0.01 | 0.003 | 0.03 |
| E-KFAC | VGG16_BN | 94.00% | 0.003 | 0.01 | 0.03 |
| SGD | VGG16_BN | 94.03% | 0.03 | 0.001 | - |
| KFAC | ResNet110 | 93.59% | 0.01 | 0.003 | 0.03 |
| E-KFAC | ResNet110 | 93.37% | 0.003 | 0.01 | 0.03 |
| SGD | ResNet110 | 94.14% | 0.03 | 0.001 | - |

### CIFAR100

| Optimizer | Model | Acc. | Learning rate | Weight decay | Damping |
|-----------|-------|------|---------------|--------------|---------|
| KFAC | VGG16_BN | 74.09% | 0.003 | 0.01 | 0.03 |
| E-KFAC | VGG16_BN | 73.20% | 0.01 | 0.01 | 0.03 |
| SGD | VGG16_BN | 74.56% | 0.03 | 0.003 | - |
| KFAC | ResNet110 | 72.71% | 0.003 | 0.01 | 0.003 |
| E-KFAC | ResNet110 | 72.32% | 0.03 | 0.001 | 0.03 |
| SGD | ResNet110 | 72.60% | 0.1 | 0.0003 | - |

## Others

Please consider citing the following papers for K-FAC:

```bibtex
@inproceedings{martens2015optimizing,
  title={Optimizing neural networks with Kronecker-factored approximate curvature},
  author={Martens, James and Grosse, Roger},
  booktitle={International Conference on Machine Learning},
  pages={2408--2417},
  year={2015}
}

@inproceedings{grosse2016kronecker,
  title={A Kronecker-factored approximate Fisher matrix for convolution layers},
  author={Grosse, Roger and Martens, James},
  booktitle={International Conference on Machine Learning},
  pages={573--582},
  year={2016}
}
```

and for E-KFAC:

```bibtex
@inproceedings{george2018fast,
  title={Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis},
  author={George, Thomas and Laurent, C{\'e}sar and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal},
  booktitle={Advances in Neural Information Processing Systems},
  pages={9550--9560},
  year={2018}
}
```

If you have any questions or suggestions, please feel free to contact me at alecwangcq at gmail dot com!
