TOKEN_SELF_ATTN_VALUE and QK attention #134

Open
lcmeng opened this issue Mar 12, 2021 · 0 comments

lcmeng commented Mar 12, 2021

Thanks for sharing the good work. I have a couple of questions about the constant TOKEN_SELF_ATTN_VALUE and how it is used.

TOKEN_SELF_ATTN_VALUE is first defined here in reformer_pytorch.py with a comment saying it is "carefully set for half precision to work". Later, it is used in LSHAttention and FullQKAttention to mask out attention to self, except when a token has no other positions to attend to. A rough sketch of how I read the masking is included after the questions below.

  • How is the value of -5e4 decided?
  • Why does the QK attention require that tokens not attend to themselves?
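
For reference, here is a rough sketch of how I understand the masking being applied before the softmax. This is not the library's actual code; the function name, shapes, and toy scores are just illustrative:

```python
import torch
import torch.nn.functional as F

# Illustrative only: float16's largest finite magnitude is about 65504, so a
# mask value of -5e4 stays representable in half precision, whereas a much
# larger constant would overflow to -inf.
TOKEN_SELF_ATTN_VALUE = -5e4

def qk_self_mask(dots):
    # dots: (batch, seq, seq) raw attention scores where queries and keys
    # come from the same projection (shared QK attention)
    n = dots.shape[-1]
    eye = torch.eye(n, dtype=torch.bool, device=dots.device)
    # Push the diagonal to a huge negative score so a token only attends to
    # itself when every other position is also masked out.
    dots = dots.masked_fill(eye, TOKEN_SELF_ATTN_VALUE)
    return F.softmax(dots, dim=-1)

scores = torch.randn(2, 8, 8)           # toy attention scores
attn = qk_self_mask(scores)
print(attn.diagonal(dim1=-2, dim2=-1))  # self-attention weights are ~0
```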

Thank you!
