add Kaiming He initialization, fixed Xavier initialization #311

Open · wants to merge 4 commits into master
Conversation

@alper111 commented Jun 6, 2018

Xavier initialization is x ~ U(-sqrt(6.0 / (fan_in + fan_out)), +sqrt(6.0 / (fan_in + fan_out))),
or x ~ N(mean = 0, std = sqrt(2.0 / (fan_in + fan_out))).

Kaiming initialization is x ~ U(-sqrt(3.0 / fan_in), +sqrt(3.0 / fan_in)),
or x ~ N(mean = 0, std = sqrt(1.0 / fan_in)).
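For concreteness, a minimal Julia sketch of the two uniform variants exactly as stated above (the *_sketch function names and the rand-based sampling are illustrative assumptions, not Knet's API):

# Draw weights from U(-bound, +bound) with the bounds stated above.
function xavier_uniform_sketch(fanin, fanout)
    bound = sqrt(6.0 / (fanin + fanout))
    return rand(fanout, fanin) .* (2 * bound) .- bound
end

function kaiming_uniform_sketch(fanin, fanout)
    bound = sqrt(3.0 / fanin)
    return rand(fanout, fanin) .* (2 * bound) .- bound
end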

@@ -37,7 +37,7 @@ function xavier(a...)
         fanout = size(w, ndims(w))
         fanin = div(length(w), fanout)
     end
-    s = convert(eltype(w), sqrt(2 / (fanin + fanout)))
+    s = convert(eltype(w), sqrt(6 / (fanin + fanout)))
Collaborator
I think our version is specialized for conv layers with relu activation. The part you changed is called the gain. You may want to update your PR to allow the xavier function to accept a gain parameter, and its default value can be 6.

Author
To be honest, I barely know the theoretical background. I guess you are referring to the "Delving Deep into Rectifiers" paper when you say it is specialized for conv layers with ReLU activation. The paper states that n_l * Var(w_l) = 2 should hold, where n_l is the average number of units per layer. You can check that:
x = xavier(200, 300)
(200 + 300) / 2 * var(x) ~= 0.33, whereas this value should be 1.0 for Xavier and 2.0 for ReLU activations. I also compared xavier with TensorFlow's equivalent initializers: the variance of TF's xavier is consistently ~3 times that of ours, and the variance of TF's kaiming (the ReLU-specialized variant) is ~6 times ours.
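A hedged reproduction of that check, assuming Knet's exported xavier and the standard-library Statistics package:

using Knet, Statistics
x = xavier(200, 300)
(200 + 300) / 2 * var(x)   # ~0.33 with the current sqrt(2 / (fanin + fanout)) scale; Glorot's analysis expects 1.0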

As for your suggestion, I am very new to Julia and I could not find a way to change the arguments while staying compatible with pre-existing models. However, there could be another initializer that takes both gain and n as arguments (as in TF).

Collaborator
You can use keyword arguments for options.

xavier(a...; gain = 6)
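A hedged sketch of what that could look like; the fan computation mirrors the diff above, while the 1-D branch and the final rescaling line are assumptions about the rest of the function body:

function xavier(a...; gain = 6)
    w = rand(a...)                       # assumed: start from U(0, 1)
    if ndims(w) == 1                     # assumed handling of the 1-D (bias) case
        fanout = 1
        fanin = length(w)
    else
        fanout = size(w, ndims(w))
        fanin = div(length(w), fanout)
    end
    s = convert(eltype(w), sqrt(gain / (fanin + fanout)))
    return 2s .* w .- s                  # rescale to U(-s, +s)
end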

@alper111 (Author) commented Dec 6, 2018

In the original version of xavier, the variance is calculated correctly: it is the variance the weights should have so that the variance of the activations (and of the gradients) remains the same across layers. This analysis also assumes linear activations. However, the calculated variance is that of a Gaussian distribution, whereas in the original xavier the weights are drawn from a uniform distribution. To scale the uniform bound correctly, we should multiply it by sqrt(3); this is purely a consequence of drawing from a uniform distribution and has nothing to do with activation functions.

If we also want to take activation functions into account, we can change the default gain value (which is 1).
Gain values for different activation functions: https://pytorch.org/docs/stable/_modules/torch/nn/init.html

I changed the default Kaiming gain value to sqrt(2) (the value for ReLU units), since that is how it is done in the original description. With these definitions, the Xavier and Kaiming initializations give the same variances as in PyTorch.
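A small sketch of the convention being described, following the PyTorch-style gain parameterization (illustrative helper names, nothing Knet-specific):

# Target standard deviation from the fan analysis, then convert it to a
# uniform bound: Var(U(-b, b)) = b^2 / 3, so b = sqrt(3) * std.
xavier_std(fanin, fanout; gain = 1) = gain * sqrt(2.0 / (fanin + fanout))
kaiming_std(fanin; gain = sqrt(2))  = gain * sqrt(1.0 / fanin)
uniform_bound(std) = sqrt(3) * std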

@ekinakyurek (Collaborator)
@ozanarkancan do you think there is a problem with this PR?

@ozanarkancan (Collaborator)
@ekinakyurek @denizyuret The branch can be merged; however, changing the initialization method will possibly break the replicability of experiments that use the current implementation. This should be stated somewhere...

@denizyuret (Owner) commented Dec 18, 2018 via email
