
Attention mask for computation of replace and append operation #22

Open
rivaldinho123 opened this issue Sep 21, 2020 · 0 comments
@rivaldinho123

Hi, you mentioned in the paper that r_{i}^{l} is computed over h_{j}^{l} for all j except i, while a_{i}^{l} is computed over h_{j}^{l} for all j, including i.
Why is there such a difference? Why can't we use information about the current token x_{i} when computing the replace operation, but do have access to the current token for the append operation?
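For concreteness, here is a minimal sketch of the two masking schemes as I understand them from the paper (hypothetical PyTorch code, not the authors' implementation): the replace mask zeros out the diagonal j = i, while the append mask leaves all positions visible.

```python
import torch

seq_len = 5  # toy sequence length

# Replace: r_i^l is computed over h_j^l for all j except i,
# so the diagonal (j == i) is masked out.
replace_mask = ~torch.eye(seq_len, dtype=torch.bool)

# Append: a_i^l is computed over h_j^l for all j, including i,
# so nothing is masked.
append_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

# Applying the masks to a matrix of raw attention scores:
# masked positions are set to -inf so they receive zero weight
# after the softmax.
scores = torch.randn(seq_len, seq_len)
replace_attn = torch.softmax(
    scores.masked_fill(~replace_mask, float("-inf")), dim=-1)  # weights for r_i^l
append_attn = torch.softmax(scores, dim=-1)                    # weights for a_i^l
```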
