Thanks for sharing the good work. I have a couple of questions about the constant TOKEN_SELF_ATTN_VALUE and how it is used.
TOKEN_SELF_ATTN_VALUE is first defined here in reformer_pytorch.py with a comment saying "carefully set for half precision to work". Later, it's used in LSHAttention and FullQKAttention to mask out attention to self except when no other targets are available.
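For context, here is a minimal sketch of why a finite mask value like -5e4 matters in half precision (illustrative only, simplified from the repo's actual LSHAttention code; `masked_softmax` and `mask_value` are hypothetical names):

```python
import numpy as np

# fp16 can only represent magnitudes up to ~65504 (np.finfo(np.float16).max).
# A common mask value like -1e9 overflows to -inf in half precision, which can
# produce NaNs downstream; -5e4 stays finite yet still drives the softmax
# weight for the masked position to ~0.

def masked_softmax(scores, self_mask, mask_value=-5e4):
    """Illustrative: suppress the diagonal (attention to self) before softmax.
    `mask_value` plays the role of TOKEN_SELF_ATTN_VALUE (assumption: this is
    a simplified stand-in, not the library's exact implementation)."""
    scores = np.where(self_mask, mask_value, scores).astype(np.float16)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores.astype(np.float32))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4), dtype=np.float16)   # dummy uniform attention scores
mask = np.eye(4, dtype=bool)                  # mask each token's attention to itself

attn = masked_softmax(scores, mask)

print(np.float16(-1e9))   # overflows: -inf
print(np.float16(-5e4))   # finite, representable in fp16
print(attn[0, 0])         # ~0: self-attention suppressed
```

So the constant appears chosen to be large enough in magnitude to zero out the masked positions after softmax, while staying safely inside the fp16 representable range.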
How is the value of -5e4 decided?
Why does the QK attention require that tokens not attend to themselves?
Thank you!