
Would you please add the attention or pointer mechanism based on your current model? #13

Open
Imorton-zd opened this issue Mar 20, 2018 · 2 comments

Comments

@Imorton-zd

Imorton-zd commented Mar 20, 2018

Thanks for your repository, which has given me a lot of inspiration. To the best of my knowledge, attention and pointer mechanisms are popular in sequence-to-sequence tasks such as chatbots. I have read about the attention mechanisms of Luong et al. (2015) and Bahdanau et al. (2015), and about pointer networks in some summarization tasks, but I am still confused by the formulas. Would you please add some attention or pointer mechanism examples based on your current model?

@oswaldoludwig
Owner

oswaldoludwig commented Mar 27, 2018

This space is meant for issues, which is not the case here. However, I will keep this text because it touches on an important point. Attention is important in seq2seq modeling because it relaxes the constraint of encoding sentences of different lengths into a fixed-dimension thought vector. However, this feature usually only improves performance when you have a long-span context, as in text summarization or the translation of a set of sentences. That is not the case for seq2seq chatbots. In fact, I developed this seq2seq model in Keras because I had tried to create a chatbot using the seq2seq models available in TensorFlow for machine translation (with attention), and the result was below my expectations. I can guarantee that you cannot get a better result with this small dataset using any other model (I am referring to our model that uses the discriminator of our GAN training method to choose the best answer, the second option in this repository, i.e. conversation_discriminator.py).
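For readers who want to see concretely what attention adds on top of a plain encoder-decoder, here is a minimal sketch of Luong-style dot-product attention in the Keras functional API. This is an illustration only, not the model used in this repository; the vocabulary size, hidden dimension, and one-hot inputs are hypothetical, and Keras 2 with the TensorFlow backend is assumed.

```python
# Minimal sketch of Luong-style dot-product attention on a Keras seq2seq
# model (Keras 2, TensorFlow backend assumed). VOCAB and HIDDEN are
# hypothetical values; this is not the model from this repository.
from keras.layers import Input, LSTM, Dense, Dot, Activation, Concatenate
from keras.models import Model

VOCAB = 7000    # hypothetical vocabulary size (one-hot inputs)
HIDDEN = 300    # hypothetical hidden dimension

# Encoder: keep the full sequence of hidden states instead of only the
# final "thought vector".
enc_in = Input(shape=(None, VOCAB))
enc_seq, enc_h, enc_c = LSTM(HIDDEN, return_sequences=True,
                             return_state=True)(enc_in)

# Decoder: initialised with the encoder's final state, as in plain seq2seq.
dec_in = Input(shape=(None, VOCAB))
dec_seq = LSTM(HIDDEN, return_sequences=True)(dec_in,
                                              initial_state=[enc_h, enc_c])

# Dot-product scores between every decoder step and every encoder step,
# normalised over the encoder steps: shape (batch, dec_steps, enc_steps).
scores = Dot(axes=[2, 2])([dec_seq, enc_seq])
weights = Activation('softmax')(scores)

# Context vectors: attention-weighted sum of encoder states per decoder step.
context = Dot(axes=[2, 1])([weights, enc_seq])

# The output projection sees the context plus the decoder state, so the
# model no longer has to squeeze the whole input into one fixed vector.
out = Dense(VOCAB, activation='softmax')(Concatenate()([context, dec_seq]))

model = Model([enc_in, dec_in], out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```

Whether this weighting actually helps depends on the task: for short chatbot utterances the fixed thought vector is usually sufficient, which is the point made above.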

@Imorton-zd
Author

Thanks for your suggestions. In fact, I have tried the attention mechanism for question generation with about 50K Q&A pairs (the average answer length is about 50 tokens and the average question length is about 20). However, the approach with attention performs worse than the simple seq2seq approach. At first, I thought my Keras implementation was at fault; after all, implementing attention in Keras is rather troublesome unless you write a custom layer. For these reasons, I think that if I could refer to someone else's implementation, I could judge whether the problem lies with attention itself or with my Keras implementation. Anyway, thank you very much. A minimal custom-layer sketch is given below for reference.
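For what it's worth, a self-contained custom layer does not require much code. The sketch below is my own illustration (Keras 2 with the TensorFlow backend assumed, not code from this repository) of Bahdanau-style additive attention that pools a recurrent layer's output sequence into a single vector; the class name and dimensions are hypothetical.

```python
# Minimal sketch of a custom Keras layer for Bahdanau-style additive
# attention pooling (Keras 2 assumed; not code from this repository).
from keras import backend as K
from keras.layers import Layer

class AdditiveAttention(Layer):
    """Collapses a sequence (batch, timesteps, hidden) into a single
    vector (batch, hidden) using scores score_t = v^T tanh(W h_t)."""

    def build(self, input_shape):
        hidden = int(input_shape[-1])
        self.W = self.add_weight(name='att_W', shape=(hidden, hidden),
                                 initializer='glorot_uniform', trainable=True)
        self.v = self.add_weight(name='att_v', shape=(hidden, 1),
                                 initializer='glorot_uniform', trainable=True)
        super(AdditiveAttention, self).build(input_shape)

    def call(self, inputs):
        # inputs: (batch, timesteps, hidden)
        scores = K.dot(K.tanh(K.dot(inputs, self.W)), self.v)   # (batch, T, 1)
        weights = K.softmax(K.squeeze(scores, axis=-1))         # (batch, T)
        # Attention-weighted sum over the timestep axis.
        return K.sum(K.expand_dims(weights, axis=-1) * inputs, axis=1)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```

It can be dropped after any recurrent layer that returns sequences, e.g. `pooled = AdditiveAttention()(LSTM(300, return_sequences=True)(x))`, which makes it easy to compare an attention-pooled variant against the plain seq2seq baseline.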
