
Can you give an example with a Hugging Face model such as Phi-2 or Yi? #21

Open
Arcmoon-Hu opened this issue Jan 18, 2024 · 4 comments

Comments

@Arcmoon-Hu

Good job!

Hope to see comparisons with other frameworks on some models, covering throughput, first-token latency, etc.

@litetoooooom

mark

@shumingshi
Collaborator

Thank you for the nice suggestion. We are going to evaluate the speed and throughput of Inferflow when serving models of different sizes on different devices, and compare the results with other inference engines.

Regarding examples of serving models like Phi-2 and Yi with Inferflow: we have predefined model specification files for Phi-2, Yi-6B-200K, Yi-34B-Chat, and a number of other models.

Below are the steps for serving Phi-2 with Inferflow (a condensed shell sketch follows these notes):

  • Step-1. Download the model files from Hugging Face and place them in data/models/phi_2/
    (model download information can be found in data/models/phi_2/download.sh)
  • Step-2. Open bin/inferflow_service.ini and make sure that phi-2 is selected as the current model; in other words, the following line should be uncommented:
            models = phi_2
  • Step-3. Set a prompt (or a query) by editing bin/llm_inference.ini
  • Step-4. Run the llm_inference tool to get inference results: cd bin/release; ./llm_inference
  • Step-5. Serve the model (over HTTP): cd bin/release; ./inferflow_service

Steps 3 and 4 are for testing or validation purposes only; you can skip them in real serving.
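For convenience, here is the same flow condensed into a shell session. This is a sketch: the actual download commands live in data/models/phi_2/download.sh, and it assumes the binaries have already been built under bin/release.

    # Step-1: fetch the model files into data/models/phi_2/
    #         (see data/models/phi_2/download.sh for the actual sources)
    # Step-2: in bin/inferflow_service.ini, make sure this line is uncommented:
    #             models = phi_2
    # Step-3 (optional): set a test prompt in bin/llm_inference.ini
    cd bin/release
    ./llm_inference       # Step-4 (optional): run a one-off inference test
    ./inferflow_service   # Step-5: serve the model over HTTP

Once the service is running, it can be queried from another terminal. The port and request schema below are hypothetical (they are not stated in this thread); the real values are configured in bin/inferflow_service.ini:

    # Hypothetical request; adjust the port and JSON fields to match your service config
    curl -X POST http://localhost:8080/ \
         -H 'Content-Type: application/json' \
         -d '{"text": "Write one sentence about inference engines."}'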

The steps for serving other models (including the Yi models you mentioned) are similar.
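For instance, switching to one of the Yi models should in principle just mean pointing the models key at its predefined specification. The identifier below is hypothetical; check the predefined model specification files for the exact name:

    models = yi_6b_200k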

We are going to add more documentation to README.md to make this easier for users.

@MonadKai

Earlier versions of the phi-2 model may not work, because the parameter n_embed in their config.json is misspelled as n_embd.
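A quick way to check which spelling a given download uses (path from Step-1 above):

    grep -n 'n_emb' data/models/phi_2/config.json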

@shumingshi
Collaborator

@MonadKai Thank you for pointing this out. We will support reading the value of n_embd, since it seems to be a reasonable abbreviation.
