Thanks for sharing the good work. I have a couple of questions about the constant TOKEN_SELF_ATTN_VALUE and how it is used.
TOKEN_SELF_ATTN_VALUE is first defined here in reformer_pytorch.py with a comment saying "carefully set for half precision to work". Later, it's used in LSHAttention and FullQKAttention to mask out attention to self except when no other targets are available.
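For context, here is a minimal sketch of why a finite mask value like -5e4 matters in half precision (illustrative only, simplified from the repo's actual LSHAttention code; `masked_softmax` and `mask_value` are hypothetical names):

```python
import numpy as np

# fp16 can only represent magnitudes up to ~65504 (np.finfo(np.float16).max).
# A common mask value like -1e9 overflows to -inf in half precision, which can
# produce NaNs downstream; -5e4 stays finite yet still drives the softmax
# weight for the masked position to ~0.

def masked_softmax(scores, self_mask, mask_value=-5e4):
    """Illustrative: suppress the diagonal (attention to self) before softmax.
    `mask_value` plays the role of TOKEN_SELF_ATTN_VALUE (assumption: this is
    a simplified stand-in, not the library's exact implementation)."""
    scores = np.where(self_mask, mask_value, scores).astype(np.float16)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores.astype(np.float32))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4), dtype=np.float16)   # dummy uniform attention scores
mask = np.eye(4, dtype=bool)                  # mask each token's attention to itself

attn = masked_softmax(scores, mask)

print(np.float16(-1e9))   # overflows: -inf
print(np.float16(-5e4))   # finite, representable in fp16
print(attn[0, 0])         # ~0: self-attention suppressed
```

So the constant appears chosen to be large enough in magnitude to zero out the masked positions after softmax, while staying safely inside the fp16 representable range.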
How is the value of -5e4 decided?
Why does the QK attention require that tokens not attend to themselves?
Thank you!