A family of LLMs called OpenELM has recently been released. They range in size from 270M to 3B parameters.

These models appear to outperform models of similar scale on various benchmarks.
They could be useful in settings where compute is limited or efficiency is a priority. The architecture uses standard transformer components for the most part, but it does include layer-wise scaling. From the paper:
Layer-wise scaling. A standard transformer layer is composed of multi-head attention (MHA) and feed-forward network (FFN). For non-uniform allocation of parameters in the transformer layer, we adjust the number of attention heads and the FFN multiplier in each transformer layer.
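For reference, here is a minimal sketch of how layer-wise scaling could assign per-layer head counts and FFN widths, assuming linear interpolation between minimum and maximum scaling factors. The names (`alpha_min`, `beta_max`, `make_divisible`, etc.) are illustrative assumptions, not OpenELM's actual configuration keys:

```python
# Illustrative sketch of layer-wise scaling: instead of a uniform configuration,
# the number of attention heads and the FFN width are interpolated between a
# minimum and a maximum factor across layers. All names here are assumptions
# made for illustration, not OpenELM's actual configuration keys.

def make_divisible(value: float, divisor: int) -> int:
    """Round `value` to the nearest positive multiple of `divisor`."""
    return max(divisor, round(value / divisor) * divisor)

def layer_wise_scaling(num_layers, model_dim, head_dim,
                       alpha_min, alpha_max, beta_min, beta_max):
    """Return a (num_heads, ffn_dim) pair for each transformer layer."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)                    # 0.0 at first layer, 1.0 at last
        alpha = alpha_min + t * (alpha_max - alpha_min)   # attention scaling factor
        beta = beta_min + t * (beta_max - beta_min)       # FFN multiplier
        num_heads = max(1, round(alpha * model_dim / head_dim))
        ffn_dim = make_divisible(beta * model_dim, 16)    # keep widths hardware friendly
        configs.append((num_heads, ffn_dim))
    return configs

# Example: early layers get fewer heads and narrower FFNs than later layers.
for layer, (heads, ffn) in enumerate(
        layer_wise_scaling(num_layers=4, model_dim=1024, head_dim=64,
                           alpha_min=0.5, alpha_max=1.0,
                           beta_min=2.0, beta_max=4.0)):
    print(f"layer {layer}: {heads} heads, ffn_dim {ffn}")
```

The practical consequence is that attention and FFN shapes differ from layer to layer, so a converter and runtime would need to handle per-layer dimensions rather than assume a single uniform configuration.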
It would be helpful to add support for this architecture in CTranslate2.