
'nan' loss function when using layer normalization #13

Open
McKracken opened this issue Dec 5, 2018 · 1 comment
McKracken commented Dec 5, 2018

Hi,

I was using only the LayerNormalization layer from your code in my own project. I didn't change anything apart from overriding the compute_mask function, since my input comes from an Embedding layer with mask_zero=True.

Code

from keras import backend as K
from keras.layers import Layer
from keras.initializers import Ones, Zeros


class LayerNormalization(Layer):

    def __init__(self, eps=1e-6, **kwargs):
        self.eps = eps
        super(LayerNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        # One scale (gamma) and one shift (beta) parameter per feature.
        self.gamma = self.add_weight(name='gamma', shape=input_shape[-1:],
                                     initializer=Ones(), trainable=True)
        self.beta = self.add_weight(name='beta', shape=input_shape[-1:],
                                    initializer=Zeros(), trainable=True)
        super(LayerNormalization, self).build(input_shape)

    def call(self, x):
        # Normalize over the last (feature) axis.
        mean = K.mean(x, axis=-1, keepdims=True)
        std = K.std(x, axis=-1, keepdims=True)
        return self.gamma * (x - mean) / (std + self.eps) + self.beta

    def compute_output_shape(self, input_shape):
        return input_shape

    def compute_mask(self, inputs, input_mask=None):
        # Pass the incoming mask through unchanged (the input comes from
        # an Embedding layer with mask_zero=True).
        return input_mask

but strangely I get nan for all the metrics I monitor while training and tuning (the loss function and others). I tried other implementations of the LayerNormalization layer (e.g. https://github.com/CyberZHG/keras-layer-normalization), and everything works without problems. I was wondering whether you have any clue about that.
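For context, here is roughly how the layer is wired into my model; the vocabulary size, embedding dimension, and variable names below are just placeholders:

from keras.layers import Input, Embedding
from keras.models import Model

# Integer token ids, padded with 0 so the Embedding produces a mask.
inp = Input(shape=(None,), dtype='int32')
emb = Embedding(input_dim=10000, output_dim=64, mask_zero=True)(inp)
# compute_mask in LayerNormalization passes the Embedding mask through.
out = LayerNormalization()(emb)
model = Model(inp, out)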

lsdefine (Owner) commented

CyberZHG's implementation:

variance = K.mean(K.square(inputs - mean), axis=-1, keepdims=True)
std = K.sqrt(variance + self.epsilon)

Mine:

std = K.std(x, axis=-1, keepdims=True)

I think there may be input sequences of length 0, where the whole sequence is masked.
But you can safely use his LayerNormalization.
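For reference, a minimal sketch of a call method that follows the quoted computation, with the epsilon moved inside the square root (variable names match the class above). A likely explanation for the nan loss: when a whole sequence is masked the variance is exactly zero, and the gradient of K.std blows up there, whereas K.sqrt(variance + eps) stays well-behaved:

    def call(self, x):
        # Mean and variance over the feature axis.
        mean = K.mean(x, axis=-1, keepdims=True)
        variance = K.mean(K.square(x - mean), axis=-1, keepdims=True)
        # Adding eps inside the sqrt keeps the forward pass and its
        # gradient finite even when the variance is exactly zero
        # (e.g. a fully padded, fully masked sequence).
        std = K.sqrt(variance + self.eps)
        return self.gamma * (x - mean) / std + self.beta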
