OpenELM Support #1684

Open
jncraton opened this issue Apr 26, 2024 · 0 comments

A family of LLMs called OpenELM has recently been released. The models range in size from 270M to 3B parameters:

| Model Size | ARC-c | ARC-e | BoolQ | HellaSwag | PIQA | SciQ | WinoGrande | Average |
|---|---|---|---|---|---|---|---|---|
| OpenELM-270M | 26.45 | 45.08 | 53.98 | 46.71 | 69.75 | 84.70 | 53.91 | 54.37 |
| OpenELM-270M-Instruct | 30.55 | 46.68 | 48.56 | 52.07 | 70.78 | 84.40 | 52.72 | 55.11 |
| OpenELM-450M | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
| OpenELM-450M-Instruct | 30.38 | 50.00 | 60.37 | 59.34 | 72.63 | 88.00 | 58.96 | 59.95 |
| OpenELM-1_1B | 32.34 | 55.43 | 63.58 | 64.81 | 75.57 | 90.60 | 61.72 | 63.44 |
| OpenELM-1_1B-Instruct | 37.97 | 52.23 | 70.00 | 71.20 | 75.03 | 89.30 | 62.75 | 65.50 |
| OpenELM-3B | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | 92.70 | 65.51 | 67.39 |
| OpenELM-3B-Instruct | 39.42 | 61.74 | 68.17 | 76.36 | 79.00 | 92.50 | 66.85 | 69.15 |

These models appear to outperform models of similar scale on various benchmarks:

[Figure: benchmark comparison against models of similar scale]

These models could be useful where compute is limited or efficiency is a priority. The architecture mostly uses standard transformer components, but it does include layer-wise scaling. From the paper:

Layer-wise scaling. A standard transformer layer is composed of multi-head attention (MHA) and feed-forward network (FFN). For non-uniform allocation of parameters in the transformer layer, we adjust the number of attention heads and the FFN multiplier in each transformer layer.
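
For context, here is a minimal sketch of how such a layer-wise schedule could be computed. The linear interpolation bounds (`alpha_min`/`alpha_max` for attention width, `beta_min`/`beta_max` for the FFN multiplier), the head dimension, and the rounding are illustrative assumptions rather than the exact OpenELM configuration values:

```python
def layer_wise_scaling(num_layers, d_model, head_dim,
                       alpha_min=0.5, alpha_max=1.0,
                       beta_min=0.5, beta_max=4.0):
    """Return per-layer (num_heads, ffn_dim) pairs.

    Rather than giving every layer the same number of attention heads and
    the same FFN multiplier, both are interpolated linearly from the first
    layer to the last, so early layers are narrower and later layers wider.
    """
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * t  # attention-width factor
        beta = beta_min + (beta_max - beta_min) * t      # FFN multiplier
        num_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = int(beta * d_model)
        configs.append((num_heads, ffn_dim))
    return configs

# Hypothetical sizes chosen only to show the shape of the schedule.
for layer, (heads, ffn) in enumerate(layer_wise_scaling(16, 1280, 64)):
    print(f"layer {layer:2d}: {heads:2d} heads, ffn_dim={ffn}")
```

This per-layer variation is the main thing the converter and runtime would need to handle, since most existing decoder architectures in CTranslate2 assume uniform layer shapes.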

It would be helpful to add support for this architecture in CTranslate2.
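
If support were added, usage would presumably follow the same pattern as other decoder-only Hugging Face models already handled by CTranslate2. The snippet below is purely hypothetical: the Transformers converter does not currently recognize the OpenELM architecture, the model ID and output directory are placeholders, and the tokenizer choice (OpenELM reuses an external Llama tokenizer upstream) is an assumption.

```python
# Hypothetical usage once OpenELM support lands in CTranslate2; this will not
# run today because the Transformers converter does not yet know the architecture.
import ctranslate2
import transformers

model_id = "apple/OpenELM-270M-Instruct"   # placeholder checkpoint
tokenizer_id = "meta-llama/Llama-2-7b-hf"  # assumption: OpenELM reuses a Llama tokenizer
output_dir = "openelm-270m-instruct-ct2"

# Convert the Hugging Face checkpoint to the CTranslate2 format.
ctranslate2.converters.TransformersConverter(model_id).convert(
    output_dir, quantization="int8"
)

# Generate a short continuation with the converted model.
tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_id)
generator = ctranslate2.Generator(output_dir)

prompt = tokenizer.convert_ids_to_tokens(tokenizer.encode("Once upon a time"))
result = generator.generate_batch([prompt], max_length=64)[0]
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(result.sequences[0])))
```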
