
SELU weight init #11

Open
aman-tiwari opened this issue Jul 24, 2017 · 3 comments

Comments

@aman-tiwari commented Jul 24, 2017

Shouldn't the weight initialization for SELU be something like:

import math

def selu_weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.5 / math.sqrt(m.weight.numel()))

    elif classname.find('BatchNorm') != -1:
        # BatchNorm weights are 1-D (one scale per channel),
        # so fan_in is simply the number of features
        fan_in = m.weight.numel()

        m.weight.data.normal_(0.0, 1.0 / math.sqrt(fan_in))
        # Estimated mean must be around 0
        m.bias.data.fill_(0)

(The 0.5 factor for the conv layers comes from a PyTorch forum post where it reportedly worked for someone; elsewhere 1.0 is used.)

@AlexiaJM (Owner)
Thanks for raising this point! You're right: the paper says on page 6 (https://arxiv.org/pdf/1706.02515.pdf) that one should use var = 1/n_weights. Since the fixed point is still attracting, an incorrect initialization can still work, but yes, this should be modified.

If someone can try it and confirm that things still work well (using 1/n_weights first, then .50/n_weights if that doesn't work well), I'll accept a pull request. Otherwise, I'll try it and change it myself when I have the time.

@aman-tiwari (Author) commented Jul 24, 2017

I'm about to try it now. You're right that the paper says Var = 1/n_weights, but in the official TF implementation they released, they use Var = 1/n_neurons_in: https://github.com/bioinf-jku/SNNs/blob/master/selu.py#L31. Not sure which one will work better, but I hope to have some results soon 🤔
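For reference, the two variances divide by different counts. A minimal sketch (my own helper, not from either codebase) of fan_in vs. n_weights for a conv layer:

```python
import math

def fan_in_of(weight_shape):
    """fan_in = number of inputs feeding each neuron.

    Conv weight (out_ch, in_ch, kh, kw) -> in_ch * kh * kw
    Linear weight (out_features, in_features) -> in_features
    """
    _out_dim, *rest = weight_shape
    fan_in = 1
    for d in rest:
        fan_in *= d
    return fan_in

# A 3x3 conv with 3 input channels and 64 output channels:
shape = (64, 3, 3, 3)
n_weights = 64 * 3 * 3 * 3     # 1728: what Var = 1/n_weights divides by
fan_in = fan_in_of(shape)      # 27: what Var = 1/fan_in divides by
std = math.sqrt(1.0 / fan_in)  # std to pass to normal_() under 1/fan_in
```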

@AlexiaJM (Owner)
Just saw this earlier today: http://cs231n.github.io/neural-networks-2/#init. It explains why fan_in is the right quantity.
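The intuition in that note can be checked numerically: with unit-variance inputs and Var(w) = 1/fan_in, a neuron's pre-activation keeps roughly unit variance (a quick simulation, not tied to either codebase):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, n_samples = 1000, 10000

x = rng.standard_normal((fan_in, n_samples))       # unit-variance inputs
w = rng.standard_normal(fan_in) / np.sqrt(fan_in)  # Var(w_i) = 1/fan_in

pre_act = w @ x  # one neuron's pre-activation across n_samples inputs
print(pre_act.var())  # stays near 1.0, as the 1/fan_in scaling predicts
```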
