
'nan' loss function when using layer normalization #13

Open
McKracken opened this issue Dec 5, 2018 · 1 comment
McKracken commented Dec 5, 2018

Hi,

I was using only the LayerNormalization layer from your code in my own project. I didn't change anything apart from overriding the compute_mask function, since my input comes from an Embedding layer with mask_zero=True.

Code

from keras import backend as K
from keras.layers import Layer
from keras.initializers import Ones, Zeros


class LayerNormalization(Layer):

    def __init__(self, eps=1e-6, **kwargs):
        self.eps = eps
        super(LayerNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        # One scale (gamma) and one shift (beta) parameter per feature.
        self.gamma = self.add_weight(name='gamma', shape=input_shape[-1:],
                                     initializer=Ones(), trainable=True)
        self.beta = self.add_weight(name='beta', shape=input_shape[-1:],
                                    initializer=Zeros(), trainable=True)
        super(LayerNormalization, self).build(input_shape)

    def call(self, x):
        # Normalize over the last (feature) axis.
        mean = K.mean(x, axis=-1, keepdims=True)
        std = K.std(x, axis=-1, keepdims=True)
        return self.gamma * (x - mean) / (std + self.eps) + self.beta

    def compute_output_shape(self, input_shape):
        return input_shape

    def compute_mask(self, inputs, input_mask=None):
        # Pass the incoming mask through unchanged (the input comes from
        # an Embedding layer with mask_zero=True).
        return input_mask

but strangely I get nan for all the metrics I monitor while training and tuning (the loss function and others). I tried other implementations of the LayerNormalization layer (e.g. https://github.com/CyberZHG/keras-layer-normalization), and everything works without problems. I was wondering whether you have any clue about that.
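For context, here is roughly how the layer is wired into my model; the vocabulary size, embedding dimension, and variable names below are just placeholders:

from keras.layers import Input, Embedding
from keras.models import Model

# Integer token ids, padded with 0 so the Embedding produces a mask.
inp = Input(shape=(None,), dtype='int32')
emb = Embedding(input_dim=10000, output_dim=64, mask_zero=True)(inp)
# compute_mask in LayerNormalization passes the Embedding mask through.
out = LayerNormalization()(emb)
model = Model(inp, out)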

lsdefine (Owner) commented

CyberZHG's implementation:

variance = K.mean(K.square(inputs - mean), axis=-1, keepdims=True)
std = K.sqrt(variance + self.epsilon)

Mine:

std = K.std(x, axis=-1, keepdims=True)

I think there may be input sequences of length 0, where the whole sequence is masked.
But you can safely use his LayerNormalization.
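For reference, a minimal sketch of a call method that follows the quoted computation, with the epsilon moved inside the square root (variable names match the class above). A likely explanation for the nan loss: when a whole sequence is masked the variance is exactly zero, and the gradient of K.std blows up there, whereas K.sqrt(variance + eps) stays well-behaved:

    def call(self, x):
        # Mean and variance over the feature axis.
        mean = K.mean(x, axis=-1, keepdims=True)
        variance = K.mean(K.square(x - mean), axis=-1, keepdims=True)
        # Adding eps inside the sqrt keeps the forward pass and its
        # gradient finite even when the variance is exactly zero
        # (e.g. a fully padded, fully masked sequence).
        std = K.sqrt(variance + self.eps)
        return self.gamma * (x - mean) / std + self.beta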
