pure language model #7

Open
XiaoLiuAI opened this issue Aug 22, 2018 · 1 comment
Hello, inspired by openai/finetune-transformer-lm, I am trying to build a language model based on your code, and I ran into a question during the implementation.

self.model = Model([src_seq_input, tgt_seq_input], loss)
self.model.add_loss([loss])
self.model.compile(optimizer, None)

Why don't you add the loss function through the compile API? I am not quite sure what the add_loss API actually does.
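
For context, my current understanding of the two styles, as a toy sketch (the tiny model and MSE loss here are placeholders of my own, not code from this repo):

import numpy as np
import keras.backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

x_in = Input(shape=(4,))
y_in = Input(shape=(1,))
pred = Dense(1)(x_in)

# (a) usual style: the loss function goes through compile(),
#     and fit() pairs the model output with an external target y.
m1 = Model(x_in, pred)
m1.compile('adam', loss='mse')
m1.fit(np.zeros((8, 4)), np.zeros((8, 1)), verbose=0)

# (b) add_loss style: the target is itself an Input, the loss is a tensor
#     built inside the graph, compile() gets loss=None, and fit() gets no y.
loss = Lambda(lambda t: K.mean(K.square(t[0] - t[1]), axis=-1, keepdims=True))([pred, y_in])
m2 = Model([x_in, y_in], loss)
m2.add_loss(K.mean(loss))
m2.compile('adam', loss=None)
m2.fit([np.zeros((8, 4)), np.zeros((8, 1))], None, verbose=0)

As far as I can tell, add_loss is the natural route when the loss depends on tensors other than a single (y_true, y_pred) pair, e.g. masks or several inputs, but please correct me if I am wrong.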

By the way, I built a language-model encoder based on your Encoder, but I added GetSubMask the way you do in the Decoder. Now I would like to add a CRF layer after the encoder (for sequence labelling, whereas OpenAI's model does text classification) and train the model on the language-model loss plus the CRF loss. Do you have any implementation suggestions, especially any ideas for verifying the correctness of the code? A sketch of what I have in mind follows right below.
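
To make the idea concrete, here is a toy sketch of the combination I have in mind; the LSTM is just a stand-in for the transformer encoder, the CRF is keras-contrib's, and all sizes are made up:

import tensorflow as tf
import keras.backend as K
from keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense, Lambda
from keras.models import Model
from keras_contrib.layers import CRF

tokens = Input(shape=(None,), dtype='int32')
emb = Embedding(1000, 64)(tokens)
enc = LSTM(64, return_sequences=True)(emb)  # stand-in for the transformer encoder

# LM head: next-token cross-entropy, masked on padding, built as a graph tensor.
lm_logits = TimeDistributed(Dense(1000))(enc)

def lm_loss_fn(args):
    logits, y = args
    y = K.cast(y, 'int32')
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y[:, 1:], logits=logits[:, :-1])
    mask = K.cast(K.not_equal(y[:, 1:], 0), 'float32')
    return K.sum(ce * mask) / K.sum(mask)

lm_loss = Lambda(lm_loss_fn)([lm_logits, tokens])

# CRF head: its loss function goes through compile() against the external tags.
crf = CRF(10, sparse_target=True)
tags = crf(TimeDistributed(Dense(32, activation='tanh'))(enc))

model = Model(tokens, tags)
model.add_loss(lm_loss)  # total objective = CRF loss + LM loss
model.compile('adam', loss=crf.loss_function, metrics=[crf.accuracy])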

I saw your example data about pinyin and Chinese; are you Chinese?


XiaoLiuAI commented Aug 22, 2018

My current implementation is:

import tensorflow as tf
import keras.backend as K
from keras.layers import Input, Embedding, Lambda, TimeDistributed, Dense, Wrapper
from keras.models import Model
from keras_contrib.layers import CRF

# FullSeqEncoder, TiedEmbeddingsTransposed and GetPosEncodingMatrix are from
# this repo / my own code.

class TransformerEncoderCrf(Wrapper):
    def __init__(self, config):
        # NOTE: inherits Wrapper but never calls super().__init__();
        # the class is only used as a plain container here.
        self.len_limit = config.len_limit
        self.load_model(config)

    def get_pos_seq(self, x):
        mask = K.cast(K.not_equal(x, 0), 'int32')
        pos = K.cumsum(K.ones_like(x, 'int32'), 1)
        return pos * mask # TODO add length limit

    @staticmethod
    def get_loss(args):
        y_pred, y_true = args
        y_true = K.cast(y_true, 'int32')
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_true[:, 1:], logits=y_pred[:, :-1])
        mask = K.cast(tf.not_equal(y_true[:, 1:], 0), 'float32')  # slice mask to match the shifted targets
        loss = tf.reduce_sum(loss * mask, -1) / tf.reduce_sum(mask, -1)  # per-example mean over non-pad steps
        loss = K.mean(loss) 
        return loss

    def load_model(self, config):
        nvocab = config.nvocab
        len_limit = config.len_limit
        d_embed = config.d_embed
        share_word_emb = config.share_word_emb
        n_head = config.n_head
        d_k = config.d_k
        d_v = config.d_v
        d_inner_hid = config.d_inner_hid
        n_layers = config.n_layers
        dropout = config.dropout

        if config.pos_trainable:
            pos_emb = Embedding(len_limit, d_embed, trainable=True)
        else:
            pos_emb = Embedding(len_limit, d_embed, trainable=False, weights=[GetPosEncodingMatrix(len_limit, d_embed)])

        word_emb = Embedding(nvocab, d_embed)

        self.encoder = FullSeqEncoder(d_embed, d_inner_hid, n_head, d_k, d_v, n_layers, dropout,
                                     word_emb=word_emb, pos_emb=pos_emb)

        self.tok_input = Input(shape=(None,), dtype='int32')
        self.tok_output = Input(shape=(None,), dtype='int32')

        self.position_input = Lambda(self.get_pos_seq)(self.tok_input)

        enc_output = self.encoder(self.tok_input, self.position_input)
        self.encoder_model = Model(inputs=self.tok_input, outputs=enc_output)  # for possible pre-training

        lm_output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=word_emb))(enc_output)
        lm_loss = Lambda(self.get_loss)([lm_output, self.tok_input])

        if config.use_crf:
            fully_connected_layer = TimeDistributed(Dense(config.num_fully_connect, activation='tanh'))
            crf_layer = CRF(config.ntags, sparse_target=False)
            ner_output = crf_layer(fully_connected_layer(enc_output))  # apply to the tensor, not the Model

            # lm_loss is a graph tensor, not a (y_true, y_pred) function, so it
            # cannot be passed to compile(); register it with add_loss instead.
            self.loss = crf_layer.loss_function
            self.metrics = crf_layer.accuracy

            self.model = Model(inputs=self.tok_input, outputs=ner_output)
            self.model.add_loss(lm_loss)
        else:
            # TO BE DONE
            output_layer = TimeDistributed(Dense(config.ntags, activation='softmax'))
            self.model = Model(inputs=self.tok_input, outputs=output_layer(enc_output))  # apply the layer
            self.loss = 'categorical_crossentropy'
            self.metrics = 'accuracy'

        self.model.summary()  # summary() already prints and returns None

    def compile(self, *args, **kwargs):
        # TO BE DONE
        if 'metrics' in kwargs:
            kwargs['metrics'].append(self.metrics)
        else:
            kwargs['metrics'] = [self.metrics]
        self.model.compile(*args, loss=self.loss, **kwargs)
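
For what it's worth, I drive it roughly like this (the config fields simply mirror what load_model reads; the values are made up):

from types import SimpleNamespace

config = SimpleNamespace(
    nvocab=8000, len_limit=256, d_embed=256, share_word_emb=True,
    n_head=8, d_k=32, d_v=32, d_inner_hid=512, n_layers=4, dropout=0.1,
    pos_trainable=False, use_crf=True, num_fully_connect=128, ntags=10)

model = TransformerEncoderCrf(config)
model.compile('adam')  # CRF loss via compile(), LM loss via add_loss()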
