
This is a quick evaluation of the performance of different architectures on ImageNet-2012.

The setup is similar to common ImageNet ones, but with two differences:

  1. Images are resized so that the smaller side = 128 px, for speed reasons.
  2. Networks are initialized with LSUV-init.

ResNet attempts have been moved to ResNets.md.

## Architectures

### CaffeNet only

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| CaffeNet256 | 0.565 | 1.87 | Reference BVLC model, LSUV init |
| CaffeNet128 | 0.471 | 2.36 | Pool5 = 3x3 |
| CaffeNet128_4096 | 0.497 | 2.24 | Pool5 = 3x3, fc6-fc7 = 4096 |
| CaffeNet128All | 0.530 | 2.05 | All improvements without CaffeNet arch change: ELU + SPP + color_trans3-10-3 + Nesterov + (AVE+MAX) pool + linear lr_policy (see the sketch below the table). +0.06 gain over vanilla CaffeNet128; "sum of gains" = 0.018 + 0.013 + 0.015 + 0.003 + 0.013 + 0.023 = 0.085 |
| SqueezeNet128 | 0.530 | 2.08 | Reference solver, but linear lr_policy and batch_size = 256 (320K iters). WITHOUT tricks like ELU, SPP, AVE+MAX, etc. |
| SqueezeNet128 | 0.547 | 2.08 | New SqueezeNet solver. WITHOUT tricks like ELU, SPP, AVE+MAX, etc. |
| SqueezeNet224 | 0.592 | 1.80 | New SqueezeNet solver. WITHOUT tricks like ELU, SPP, AVE+MAX, etc. 2 GPUs |
| SqueezeNet128+ELU | 0.555 | 1.95 | Reference solver, but linear lr_policy and batch_size = 256 (320K iters). ELU |
| CaffeNet256All | 0.613 | 1.64 | All improvements without CaffeNet arch change: ELU + SPP + color_trans3-10-3 + Nesterov + (AVE+MAX) pool + linear lr_policy |
| CaffeNet128, no pad | 0.411 | 2.70 | No padding, but conv1 stride = 2 instead of 4 to keep pool5 the same size |
| CaffeNet128, dropout in conv | 0.426 | 2.60 | Dropout before pool2 = 0.1, after conv3 = 0.1, after conv4 = 0.2 |
| CaffeNet128SPP | 0.483 | 2.30 | SPP = 3x3 + 2x2 + 1x1 |
| DarkNet128BN | 0.502 | 2.25 | 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF. BN + PReLU + base_lr = 0.035, exp lr_policy, 160K iters |
| CaffeNet128, no group conv | 0.487 | 2.26 | Plain convolution instead of grouped one |
| NiN128 | 0.519 | 2.15 | Step lr_policy. Be careful not to use dropout on max-pool in-place |
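
Several of the "All" tricks are network/solver tweaks rather than architecture changes. The fragments below are a minimal, hypothetical sketch of how they could look in Caffe prototxt; layer names and hyperparameters are illustrative, not copied from the actual files in this repo:

```
# Hypothetical net fragment: ELU activation and (AVE+MAX) pool5.
layer {
  name: "relu5"            # ELU used in place of ReLU
  type: "ELU"
  bottom: "conv5"
  top: "conv5"
  elu_param { alpha: 1.0 }
}
layer {
  name: "pool5_max"        # max-pooled branch
  type: "Pooling"
  bottom: "conv5"
  top: "pool5_max"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "pool5_ave"        # average-pooled branch
  type: "Pooling"
  bottom: "conv5"
  top: "pool5_ave"
  pooling_param { pool: AVE kernel_size: 3 stride: 2 }
}
layer {
  name: "pool5"            # (AVE+MAX): mean of the two pooled branches
  type: "Eltwise"
  bottom: "pool5_max"
  bottom: "pool5_ave"
  top: "pool5"
  eltwise_param { operation: SUM coeff: 0.5 coeff: 0.5 }
}
```

The solver-side tricks, Nesterov momentum and the linear lr_policy, can be expressed as below; "poly" with power = 1 decays the learning rate linearly to zero at max_iter:

```
# Hypothetical solver fragment: Nesterov momentum + linear lr decay.
# (Older Caffe versions use the enum solver_type: NESTEROV instead.)
type: "Nesterov"
base_lr: 0.01
momentum: 0.9
lr_policy: "poly"
power: 1.0          # power = 1 makes the "poly" decay linear
max_iter: 320000
```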

### Others

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| DarkNetBN | 0.502 | 2.25 | 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF. BN |
| HeNet2x2 | 0.561 | 1.88 | No SPP, Pool5 = 3x3, VLReLU, J' from paper |
| HeNet3x1 | 0.560 | 1.88 | No SPP, Pool5 = 3x3, VLReLU, J' from paper, 2x2 -> 3x1 |
| GoogLeNet128 | 0.619 | 1.61 | linear lr_policy, batch_size = 256. Obviously slower than CaffeNet |
| GoogLeNet128Res | 0.634 | 1.56 | linear lr_policy, batch_size = 256. Residual connections between inception blocks. No BN |
| GoogLeNet128Res_color | 0.638 | 1.52 | linear lr_policy, batch_size = 256. Residual connections between inception blocks. No BN. + color_trans3-10-3 |
| googlenet_loss2_clf | 0.571 | 1.80 | From the net above, aux classifier after inception_4d |
| googlenet_loss1_clf | 0.520 | 2.06 | From the net above, aux classifier after inception_4a |
| GoogLeNet128_BN_after | 0.596 | 1.70 | BN after ReLU (see the fragment below the table) |
| [GoogLeNet128_BN_lim0606](https://github.com/lim0606/caffe-googlenet-bn) | 0.645 | 1.54 | BN before ReLU + scale bias, linear LR, batch_size = 128, base_lr = 0.005, 640K iters, LSUV init. !!! 5x5 replaced by two 3x3 |
| fitnet1_elu | 0.333 | 3.21 | |
| VGGNet16_128 | 0.651 | 1.46 | Surprisingly, much better than GoogLeNet128, even when the latter is trained with a step-based solver |
| VGGNet16_128_All | 0.682 | 1.47 | ELU (alpha = 0.5; alpha = 1 leads to divergence :( ), AVE+MAX pool, color conversion, linear lr_policy |
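
The two BN placements differ only in layer ordering. A hypothetical prototxt fragment for the "BN before ReLU + scale bias" variant is sketched below; layer names are illustrative:

```
# Hypothetical "BN before ReLU" ordering. In Caffe the affine part of batch
# norm is a separate Scale layer; bias_term: true gives the "scale bias".
layer { name: "conv1_bn"   type: "BatchNorm" bottom: "conv1" top: "conv1" }
layer {
  name: "conv1_scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }
}
layer { name: "conv1_relu" type: "ReLU"      bottom: "conv1" top: "conv1" }
```

For the "BN after ReLU" variant (GoogLeNet128_BN_after), the ReLU layer is placed before the BatchNorm/Scale pair instead.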

Prototxt, logs

Architectures tested:

  1. CaffeNet (pool5 size = 3x3)
  2. HeNet, from "Convolutional Neural Networks at Constrained Time Cost". The differences from the paper are VLReLU (converges faster at the start) and no SPP pooling; a "classical" pool5 is used instead
  3. CaffeNetSPP, single-scale training (SPP pool5 = 3x3 + 2x2 + 1x1), from "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition" (see the fragment after this list)
  4. GoogLeNet, from "Going Deeper with Convolutions"
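
For reference, stock Caffe ships an SPP layer; a minimal, hypothetical fragment is sketched below. Caveat: Caffe's SPP with pyramid_height: 3 produces 4x4 + 2x2 + 1x1 bins (powers of two), so the exact 3x3 + 2x2 + 1x1 pyramid used here would need a custom layer or three concatenated Pooling layers.

```
# Hypothetical SPP-style pool5 using Caffe's built-in SPP layer.
# Note: this yields 4x4 + 2x2 + 1x1 bins, not the 3x3 + 2x2 + 1x1
# combination reported in the tables above.
layer {
  name: "pool5_spp"
  type: "SPP"
  bottom: "conv5"
  top: "pool5_spp"
  spp_param {
    pyramid_height: 3
    pool: MAX
  }
}
```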

*Plots: CaffeNet128 test accuracy, test loss, and train loss.*

Architectures are selected so that their theoretical and/or practical computational complexity is roughly the same as CaffeNet's. Currently this holds for all except HeNet, which is slower in practice.


### From contributors

The base net here is CaffeNet + BN + PReLU + dropout = 0.2. The differences are in the filter decompositions (mainly 5x5 -> 3x3 + 3x3, or 1x5 + 5x1) and in the solver.

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| Base | 0.527 | 2.09 | |
| Base_dereyly_lr, noBN, ReLU | 0.441 | 2.53 | max_iter = 160K, stepsize = 2K, gamma = 0.915, but otherwise default CaffeNet |
| Base_dereyly 5x1, noBN, ReLU | 0.474 | 2.31 | 5x5 -> 1x5 + 5x1 (see the sketch below the table) |
| Base_dereyly_PReLU | 0.550 | 1.93 | BN, PReLU + base_lr = 0.035, exp lr_policy, 160K iters, 5x5 -> 3x3 + 3x3 |
| Base_dereyly 3x1 | 0.553 | 1.92 | PReLU + base_lr = 0.035, exp lr_policy, 160K iters, 5x5 -> 1x3 + 1x3 + 3x1 + 1x3 |
| Base_dereyly 3x1 scale aug | 0.530 | 2.04 | Same as previous; train img: 128 crop from a (128...300) px image; test: resize to 144, crop 128 |
| Base_dereyly 3x1 scale aug | 0.512 | 2.17 | Same as previous; train img: 128 crop from a (128...300) px image; test: resize to (128+300)/2, crop 128 |
| Base_dereyly 3x1->5x1 | 0.546 | 1.97* | PReLU + base_lr = 0.035, exp lr_policy, 160K iters, 5x5 -> 1x5 + 1x5 + 5x1 + 1x5 |
| Base_dereyly 3x1, halfBN | 0.544 | 1.95 | PReLU + base_lr = 0.035, exp lr_policy, 160K iters, 5x5 -> 1x3 + 1x3 + 3x1 + 1x3, BN only for pool and fc6 |
| Base_dereyly 5x1 | 0.540 | 2.00 | PReLU + base_lr = 0.035, exp lr_policy, 160K iters, 5x5 -> 1x5 + 5x1 |
| DarkNetBN | 0.502 | 2.25 | 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF. BN + PReLU + base_lr = 0.035, exp lr_policy, 160K iters |
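
The filter factorizations replace one square convolution with a stack of asymmetric ones. Below is a hypothetical prototxt sketch of the 5x5 -> 1x5 + 5x1 decomposition; layer names and the channel count are illustrative, not taken from the actual contributed files:

```
# Hypothetical 5x5 -> 1x5 + 5x1 factorization of conv2.
layer {
  name: "conv2_1x5"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1x5"
  convolution_param {
    num_output: 256          # illustrative channel count
    kernel_h: 1 kernel_w: 5
    pad_h: 0 pad_w: 2        # keeps the spatial size unchanged
  }
}
layer {
  name: "conv2_5x1"
  type: "Convolution"
  bottom: "conv2_1x5"
  top: "conv2"
  convolution_param {
    num_output: 256
    kernel_h: 5 kernel_w: 1
    pad_h: 2 pad_w: 0
  }
}
```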

*Plots: CaffeNet128 test accuracy, test loss, learning rate, and train loss.*

Prototxt, logs

| Name | Accuracy | LogLoss | Comments |
|------|----------|---------|----------|
| VGG-Like | 0.521 | 2.14 | 1st layer = 7x7 stride 2, unlike VGG. All other layers = 1/2 VGG width |
| VGG-LikeRes | 0.576 | 1.83 | With residual connections (see the fragment below the table), no BN |
| VGG-LikeResDrop | 0.568 | 1.91 | With residual connections, no BN, dropout in conv |
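
In these nets a residual connection is just an element-wise sum of a block's input and output, with no batch norm. A hypothetical prototxt fragment (layer names are illustrative):

```
# Hypothetical residual (shortcut) connection around a conv block, no BN:
# the block input is added element-wise to the block output.
layer {
  name: "res3"
  type: "Eltwise"
  bottom: "pool2"      # block input (identity shortcut)
  bottom: "conv3_3"    # block output; must match pool2 in shape
  top: "res3"
  eltwise_param { operation: SUM }
}
```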

*Plots: CaffeNet128 test accuracy, test loss, and train loss.*

Prototxt, logs

PRs with tests are welcome.

P.S. The logs are merged from many "save-resume" sessions, because the nets were trained at night, so plotting "accuracy vs. seconds" will give weird results.