Can you give an example of a Hugging Face model such as Phi-2, Yi, etc.? #21
Comments
Thank you for the nice suggestion. We are going to evaluate the speed and throughput of Inferflow when serving models of different sizes on different devices, and compare it with some other inference engines. Regarding examples of serving models like Phi-2 and Yi with Inferflow: we have predefined model specification files for Phi-2, Yi-6B-200K, Yi-34B-Chat, and a list of other models. Below are the steps for serving Phi-2 with Inferflow:
Steps 3 and 4 are for testing and validation purposes only; you can skip them in real serving. The steps for serving other models (including the Yi models you mentioned) are similar. We are going to add more documentation to README.md to help users.
Earlier versions of the phi-2 model may not be applicable, because the parameter
@MonadKai Thank you for pointing this out. We will support reading the value of n_embd, since it seems to be a reasonable abbreviation.
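A loader that accepts both key names can be sketched as follows. This is an illustration, not Inferflow's actual implementation: the assumption is that older phi-2 checkpoints expose the embedding width as `n_embd` in config.json, while newer ones use the standard Hugging Face key `hidden_size`, so the reader checks both aliases.

```python
import json


def read_hidden_size(config: dict) -> int:
    """Return the embedding width from a Hugging Face-style config dict.

    Checks "hidden_size" first (the current convention), then falls
    back to "n_embd" (used by older phi-2 checkpoints), so the same
    loader handles both config layouts.
    """
    for key in ("hidden_size", "n_embd"):
        if key in config:
            return int(config[key])
    raise KeyError("config defines neither 'hidden_size' nor 'n_embd'")


# Example: minimal configs resembling an older and a newer checkpoint.
old_style = json.loads('{"n_embd": 2560, "n_layer": 32}')
new_style = {"hidden_size": 2560, "num_hidden_layers": 32}

print(read_hidden_size(old_style))  # resolved via the n_embd fallback
print(read_hidden_size(new_style))  # resolved via hidden_size
```

Keeping the alias list in one place makes it easy to extend if further key variants turn up in other checkpoints.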
Good job!
Hope to see comparisons with other frameworks on some models, covering throughput, first-token latency, etc.