Incorporating Copying Mechanism in Sequence-to-Sequence Learning

File metadata and controls

56 lines (41 loc) · 4.36 KB

Chatbots have been an active research topic recently. Walking through chatbot papers, I think this paper is one of the most interesting works on the topic. The idea can also be applied to other NLP tasks such as Machine Translation and Abstractive Summarization.

The standard chatbot framework is built around a sequence-to-sequence model (seq2seq). This is straightforward and easy to implement, but of course there is plenty of room for improvement. One such improvement concerns copying: in conversation, we often repeat text from the other speaker, as in the example below

Hello, my name is chatbot

Nice to meet you, chatbot.

In general, seq2seq apparently does not handle repeated text well. This is actually not surprising, and it reminds me of the very cool Learning to Execute work (https://arxiv.org/abs/1410.4615). The idea there is to write simple programs and ask a seq2seq to execute them (a toy problem, but a very interesting one). One thing we learn from that paper is that, in general, seq2seq is not good at executing a copy command. For example:

command: print(123456789)

output: 123565756 (I made up the number, but you get the idea).

Also, the longer the input is, the less accurate the seq2seq output becomes.

How to address that problem? In general, having an external memory (https://arxiv.org/abs/1410.3916) is a very nice solution, I think, but that was shown only for learning to execute. Conversation is a much harder problem than the simple sample programs we ask a seq2seq to execute there.

The paper proposes a new model, CopyNet, to address the problem. The model is basically the same as seq2seq, but with several important differences:

  1. The generation of target words is more refined: a word can be generated in one of two modes, generative-mode or copy-mode. The generative-mode is what a seq2seq normally does (previous state, previous output, fixed-length encoding vector, attention-based context vector). The copy-mode is new: the model simply picks a word from the source sentence. Each mode can be modeled by any neural network, though in the paper the networks are quite simple (Eq. 7 and 8). In the end, the probability of generating any target word is given by a mixture of the probabilities from the two modes (Eq. 4); a minimal sketch of this mixture is given right after the list. I get the idea, and I quite like it. I am also aware of similar solutions to related problems. Specifically, in NMT, a content word should be generated based on the source inputs, while a common word should be generated based on the context of target words. There is a work (Context Gates for Neural Machine Translation, https://arxiv.org/abs/1608.06043) proposing a gated network that integrates this information into seq2seq in a nice way. The latter framework (the gated network) is a somewhat cleaner solution, I think.

  2. CopyNet also uses information about how the previous word was generated (whether or not it came from the copy-mode). This makes sense, and should help the model copy a chunk of consecutive words. Here is how the information is integrated into CopyNet: the previous output is represented as a concatenation of two vectors, the embedding of the word itself plus a weighted sum of all hidden states on the source side (Eq. 9). The second part is novel, but Eq. 9 may seem a bit complicated. Actually, the weighted sum is straightforward and has two ingredients: each state vector represents a specific source word, and its weight denotes how likely the previous word is the result of a copy operation from that specific source position (see the second sketch after this list). I think Eq. 9 is a bit ad hoc, and the fancy name of "selective read" is a bit overselling. But I get the idea, and I think it is a nice idea!
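To make the mixture in Eq. 4 concrete, here is a minimal numpy sketch of how I read it: the scores of both modes are normalized by one shared softmax, and the copy-mode mass of each source position is added to the corresponding word's entry in the output distribution. The function and argument names (`copynet_output_distribution`, `gen_scores`, `copy_scores`) are mine, not the paper's notation, and the score vectors stand in for whatever Eq. 7 and 8 compute.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def copynet_output_distribution(gen_scores, copy_scores, source_ids, vocab_size):
    """Mix generate-mode and copy-mode scores into one distribution over
    target words (my reading of Eq. 4; names are mine, not the paper's)."""
    # One shared softmax over both score vectors, so probability mass is
    # split between generating from the vocabulary and copying from the source.
    joint = softmax(np.concatenate([gen_scores, copy_scores]))
    probs = joint[:vocab_size].copy()   # generate-mode mass per vocabulary word
    # Add copy-mode mass; a word repeated in the source accumulates mass
    # from every position where it occurs.
    for pos, word_id in enumerate(source_ids):
        probs[word_id] += joint[vocab_size + pos]
    return probs

# Tiny example: a 6-word vocabulary and the 3-token source sentence [2, 4, 2].
p = copynet_output_distribution(
    gen_scores=np.random.randn(6),
    copy_scores=np.random.randn(3),
    source_ids=[2, 4, 2],
    vocab_size=6,
)
assert np.isclose(p.sum(), 1.0)
```

A nice side effect of the shared softmax, as I read it, is that the two modes compete for probability mass, so a word that only appears in the source sentence can still receive all of its probability from the copy-mode.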
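And here is a similarly minimal sketch of the "selective read" in Eq. 9, with the same caveat that the names are mine: the previous output word is represented by its embedding concatenated with a weighted sum of encoder states, where the weights come from the copy probabilities of the source positions that actually contain that word.

```python
import numpy as np

def selective_read(prev_word_id, enc_states, source_ids, copy_probs):
    """Weighted sum of encoder states for the previous output word
    (my reading of Eq. 9; argument names are mine, not the paper's).

    enc_states : (src_len, hidden) encoder hidden states
    source_ids : (src_len,) word ids of the source sentence
    copy_probs : (src_len,) copy-mode probabilities from the previous step
    """
    # Only positions that actually hold the previous word can have copied it;
    # weight them by the copy probability they received, then renormalize.
    weights = np.where(np.asarray(source_ids) == prev_word_id,
                       np.asarray(copy_probs), 0.0)
    total = weights.sum()
    if total > 0:
        weights = weights / total
    # The result is concatenated with the previous word's embedding and fed
    # to the decoder as its representation of the previous output.
    return weights @ enc_states
```

If the previous word does not appear in the source at all, the weights are zero and the selective read contributes nothing, which matches my understanding of how the mechanism falls back to plain seq2seq behavior.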

The experiments in the paper are solid and convincing that CopyNet is a very good architecture.

In general, I think this is a nice work, albeit a bit hard to walk through. There is room for improvement, to name a few points:

  • Having a cleaner model. The idea is very nice, but the implementation (e.g., the equations) can be improved.
  • I also think CopyNet needs to be improved to handle longer chunks. Specifically, CopyNet uses information about how the previous word was generated by the copy-mode. Instead, I think it could be improved by also using information about the first word of a consecutive span of words that we believe were generated by the copy-mode.