
Masked attention #141

Open
lethienhoa opened this issue May 9, 2018 · 4 comments

@lethienhoa

Hi,
I see that this implementation is missing masked attention over the encoder outputs. `input_lengths` should be passed to the decoder (not just the encoder) in order to compute it. OpenNMT already provides this in its `sequence_mask` function.
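For reference, a minimal sketch of what such a mask could look like in PyTorch (this mirrors the semantics of OpenNMT's `sequence_mask`; the exact name and shapes here are illustrative, not this repo's API):

```python
import torch

def sequence_mask(lengths, max_len=None):
    """Return a [batch, max_len] boolean mask, True at valid (non-padded) steps."""
    if max_len is None:
        max_len = int(lengths.max())
    positions = torch.arange(max_len, device=lengths.device)  # [max_len]
    # Broadcast compare: row i is True where position < lengths[i]
    return positions.unsqueeze(0) < lengths.unsqueeze(1)
```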
Best,


erogol commented Jul 10, 2018

@lethienhoa why do you need masked attention if you mask the loss?

@valtsblukis

I just noticed the same thing and landed here. The attention mechanism should only include in the weighted sum those encoder outputs that correspond to valid tokens in the input sequences. For example, if the input lengths in a batch are 23, 12, and 7, then for the third element the attention should compute the weighted sum over only its 7 encoder outputs, rather than all 23 (the padded length).

Normally your attention would learn to ignore the extra encoder outputs anyway, but this can become a problem if you train and test with different maximum sentence lengths. A sketch of the fix follows below.
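To make this concrete, here is a minimal sketch of applying such a mask to the attention energies before the softmax, assuming PyTorch and a `[batch, max_enc_len]` score tensor (the function and variable names are illustrative, not this repo's code):

```python
import torch
import torch.nn.functional as F

def masked_softmax(scores, input_lengths):
    """scores: [batch, max_enc_len] raw alignment energies."""
    max_len = scores.size(1)
    positions = torch.arange(max_len, device=scores.device)
    valid = positions.unsqueeze(0) < input_lengths.unsqueeze(1)  # True at real tokens
    # Padded positions get -inf so their softmax weight is exactly zero.
    scores = scores.masked_fill(~valid, float('-inf'))
    return F.softmax(scores, dim=1)

# With the lengths from the example above (23, 12, 7), the third row's
# attention weights beyond step 7 come out as zero:
scores = torch.randn(3, 23)
weights = masked_softmax(scores, torch.tensor([23, 12, 7]))
assert weights[2, 7:].sum().item() == 0.0
```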


erogol commented Jul 12, 2018

@valtsblukis thanks for explaining it. Yes, that was my understanding too, but I'd also assume the model would learn it anyway. I am running an experiment with my model with and without masking to see the difference.

@pskrunner14

@lethienhoa I'll see to it. Thanks for pointing this out.
