Is there any reason why you didn't use the EVA02 architecture for EVA-CLIP models larger than 4B? #144

Closed
stevenliu000 opened this issue Mar 13, 2024 · 1 comment

Comments

@stevenliu000

The EVA02 paper introduces several architectural improvements such as SwiGLU and RoPE. These modifications appear promising in the paper and are also prevalent in modern transformers, such as LLaMA. Despite EVA02_CLIP_E being referred to as EVA02, it lacks these components, as do EVA-CLIP-8B and EVA-CLIP-18B. Is there any specific reason why you chose not to use them?
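For readers unfamiliar with the components named above, here is a minimal sketch of a SwiGLU feed-forward layer as it is commonly defined in LLaMA-style transformers: `down(SiLU(gate(x)) * up(x))`. This is an illustration only, not code from the EVA repository; the names `swiglu_ffn`, `w_gate`, `w_up`, and `w_down` are hypothetical, and biases and normalization layers are omitted.

```python
import math


def silu(z):
    # SiLU (a.k.a. Swish): z * sigmoid(z)
    return z / (1.0 + math.exp(-z))


def matvec(weight, x):
    # weight is a list of rows; returns weight @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def swiglu_ffn(x, w_gate, w_up, w_down):
    # Gated branch: SiLU applied to one linear projection of x
    gate = [silu(g) for g in matvec(w_gate, x)]
    # Linear branch: a second projection of x with no activation
    up = matvec(w_up, x)
    # Elementwise product of the two branches, then project back down
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy usage with identity weights (hypothetical values)
identity = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu_ffn([0.0, 1.0], identity, identity, identity)
```

Compared with a plain two-layer MLP, this variant uses three weight matrices instead of two, so implementations usually shrink the hidden width to keep the parameter count comparable.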

@Quan-Sun
Contributor

@stevenliu000 When scaling a smaller model up to a larger one, as with EVA-02-E and EVA-8B/18B, we have continued to follow the model architecture and approach used in EVA-01.
