Question regarding the configuration of decoder_retention_heads #84

Open
Kratos-Wen opened this issue Nov 30, 2023 · 2 comments

@Kratos-Wen

Thank you for your great work!

I've noticed that your decoder_retention_heads is set to 3 by default, and the mask is also expanded to three dimensions to match. Have you experimented with the performance differences under different numbers of heads? Is this configuration sufficient in terms of attention performance? Since your model primarily targets sequence modeling in language processing, and I'm looking to extend it to image processing, I'm unsure whether I should modify this setting.
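For context, here is a rough sketch of how the head count relates to the per-head dimension and the shape of the decay mask, under the assumption that each head gets decoder_embed_dim / decoder_retention_heads channels and its own decay rate; the variable names and decay values below are illustrative, not the exact torchscale internals:

```python
import torch

decoder_embed_dim = 768          # example value, not necessarily the repo default
decoder_retention_heads = 3      # the default mentioned in this issue
seq_len = 16

head_dim = decoder_embed_dim // decoder_retention_heads    # 256 channels per head

# One decay rate per head; the mask ends up shaped (heads, seq_len, seq_len),
# which matches the "expanded to three dimensions" observation above.
decay = torch.rand(decoder_retention_heads)                 # illustrative rates in [0, 1)
idx = torch.arange(seq_len)
rel = (idx[:, None] - idx[None, :]).clamp(min=0).float()    # distance n - m for m <= n
mask = decay[:, None, None] ** rel                          # (heads, seq_len, seq_len)
mask = mask * torch.tril(torch.ones(seq_len, seq_len))      # zero out future positions
print(head_dim, mask.shape)                                 # 256 torch.Size([3, 16, 16])
```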

Thank you in advance for your response.

@jpokemon232

When I was adjusting the configuration of RetNet, I also ran into this issue. Could you add an assert that decoder_embed_dim and decoder_value_embed_dim must be multiples of decoder_retention_heads?
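A minimal sketch of such a check, assuming the config object exposes decoder_embed_dim, decoder_value_embed_dim, and decoder_retention_heads as plain attributes (the helper name is hypothetical, not part of torchscale):

```python
def validate_retention_config(cfg):
    """Hypothetical helper: fail fast when the head count does not divide the embedding dims."""
    heads = cfg.decoder_retention_heads
    assert cfg.decoder_embed_dim % heads == 0, (
        f"decoder_embed_dim ({cfg.decoder_embed_dim}) must be a multiple of "
        f"decoder_retention_heads ({heads})"
    )
    assert cfg.decoder_value_embed_dim % heads == 0, (
        f"decoder_value_embed_dim ({cfg.decoder_value_embed_dim}) must be a multiple of "
        f"decoder_retention_heads ({heads})"
    )
```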

@sunyt32
Contributor

sunyt32 commented Dec 28, 2023

@Kratos-Wen decoder_retention_heads affects key_dim, which is recommended to be set to 256.
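For concreteness, a small sketch of the relationship being described, assuming the per-head key dimension is decoder_embed_dim / decoder_retention_heads; the embedding size below is an example value chosen so the arithmetic lands on the recommended 256, not a library default:

```python
decoder_embed_dim = 768       # example value, chosen so the division works out
decoder_retention_heads = 3   # the default discussed in this issue

key_dim_per_head = decoder_embed_dim // decoder_retention_heads
print(key_dim_per_head)       # 256 -- the size recommended above
```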
