
mask zero and activation in HATT #28

Open
zhangsh950618 opened this issue Aug 20, 2018 · 6 comments

@zhangsh950618 (Contributor) commented Aug 20, 2018

We used this code to build our project, but we found that the accuracy dropped. So we reviewed the code and found the following issues:

  1. The code did not implement masking in the `AttLayer` class.
  2. We believe the Dense layer should be implemented inside the `AttLayer` class, instead of applying a separate `Dense` outside of it.
  3. The activation function was missing from that Dense layer.

After making the above changes, accuracy increased by 4-5 percentage points over the baseline in our task (text classification).

Here is our `AttLayer` class; its input is the output of the GRU directly, without an additional `Dense` layer (a short wiring sketch follows the class):

```python
from keras import backend as K
from keras import initializers
from keras.engine.topology import Layer


class AttLayer(Layer):
    def __init__(self, attention_dim):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__()

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
        self.b = K.variable(self.init((self.attention_dim,)))
        self.u = K.variable(self.init((self.attention_dim, 1)))
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        # Pass the incoming mask through unchanged.
        return mask

    def call(self, x, mask=None):
        # x:   [batch_size, seq_len, gru_dim]
        # uit: [batch_size, seq_len, attention_dim]
        # uit = tanh(xW + b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)

        ait = K.exp(ait)

        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in Theano,
            # and zero out the attention scores of padded timesteps.
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)

        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```
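
For reference, a minimal wiring sketch for the sentence encoder, assuming a Keras 2.x setup similar to the original HATT code; the layer sizes are illustrative, and `mask_zero=True` on the `Embedding` is what produces the mask that `compute_mask`/`call` consume:

```python
from keras.layers import Input, Embedding, GRU, Bidirectional
from keras.models import Model

# Illustrative sizes, not taken from the original code
MAX_SENT_LENGTH = 100
MAX_NB_WORDS = 20000
EMBEDDING_DIM = 100

# mask_zero=True makes padded (zero) word indices produce a mask that the
# GRU and AttLayer propagate and respect.
embedding_layer = Embedding(MAX_NB_WORDS, EMBEDDING_DIM, mask_zero=True)

sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_gru = Bidirectional(GRU(100, return_sequences=True))(embedded_sequences)
# The GRU's full sequence output feeds the attention layer directly,
# with no extra Dense layer in between.
l_att = AttLayer(100)(l_gru)
sentEncoder = Model(sentence_input, l_att)
```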
@richliao (Owner) commented:

Thanks for the implementation!

The issues (2 & 3) you mentioned are also covered in issue #24.

Can you open a pull request so that everyone can benefit?

@alejandrods commented:

Hi, I have implemented the new attention layer, but I get an error:

```
File "D:/Hierachical_2_imbd.py", line 227, in call
    uit = K.tanh(K.bias_add(K.dot(x, self.w), self.b))

AttributeError: module 'keras.backend' has no attribute 'bias_add'
```

Can someone help me?
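
The traceback suggests an older Keras release whose `keras.backend` does not yet provide `bias_add`. One option is to upgrade Keras; a minimal workaround sketch for the `call` method above, assuming the same `self.W`/`self.b` variables, is to add the bias by plain broadcasting:

```python
# Workaround sketch: replace K.bias_add with ordinary broadcasting.
# K.dot(x, self.W) has shape (batch_size, seq_len, attention_dim) and
# self.b has shape (attention_dim,), so the addition broadcasts over the last axis.
uit = K.tanh(K.dot(x, self.W) + self.b)
```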

@zhangsh950618 (Contributor, Author) commented:

I have pushed a new version of this implementation; you can review the full code in my repo.

@alejandrods commented:

Thanks! I will check it

@alejandrods commented:

How can I derive the attention weights and identify the important words for the classification? I have read the last update in your post, but I still don't understand your approach.

@richliao (Owner) commented:

It's not a fixed weight. Don't confuse it with the context vector or the weights learned in the attention layer. You need to do a forward pass to derive the importance of sentences and words; different sentences and words will give different results. Please read the paper.
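
A minimal sketch of that forward pass for word-level importance, assuming a trained `AttLayer` instance `att_layer` and the GRU output that feeds it; the function and variable names are illustrative:

```python
import numpy as np
from keras import backend as K

def word_attention_weights(gru_output, att_layer):
    """Recompute the normalized attention weights (ait) for one batch.

    gru_output: numpy array of shape (batch_size, seq_len, gru_dim), obtained
                by running the model up to (and including) the GRU on some input.
    att_layer:  the trained AttLayer instance; its W, b, u hold the learned values.
    """
    W, b, u = [K.get_value(w) for w in (att_layer.W, att_layer.b, att_layer.u)]
    uit = np.tanh(np.dot(gru_output, W) + b)        # (batch_size, seq_len, attention_dim)
    ait = np.exp(np.dot(uit, u).squeeze(-1))        # (batch_size, seq_len)
    ait /= ait.sum(axis=1, keepdims=True) + 1e-7    # normalize per sequence
    return ait                                      # higher weight = more important word
```

For padded inputs, the mask handling from `call` would also need to be mirrored here (zeroing masked positions before normalizing), and the same recipe applies one level up for sentence-level importance.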
