The EVA02 paper introduces several architectural improvements such as SwiGLU and RoPE. These modifications appear promising in the paper and are also prevalent in modern transformers, such as LLaMA. Despite EVA02_CLIP_E being referred to as EVA02, it lacks these components, as do EVA-CLIP-8B and EVA-CLIP-18B. Is there any specific reason why you chose not to use them?
@stevenliu000 When scaling a smaller model up to a larger one, as with EVA-02-E and EVA-8B/18B, we have continued to follow the model architecture and approach used in EVA-01.