Examples

xFasterTransformer provides C++ and Python (PyTorch) examples to help users learn the API. Gradio-based web demos are provided for some models. All examples and web demos support multi-rank execution.

The C++ example automatically identifies the model and its tokenizer. Tokenizers are implemented with SentencePiece, except for the OPT model, whose tokenizer is hard-coded.

The Python (PyTorch) example performs end-to-end model inference with streaming output, using the tokenizer from the `transformers` library.
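The idea of streaming output can be sketched without the library itself. The snippet below is only an illustration: `TOY_VOCAB`, `EOS_ID`, `fake_generate`, `stream_text`, and `run` are hypothetical stand-ins invented here, not part of the xFasterTransformer or `transformers` APIs. It shows how token ids produced one step at a time can be decoded and printed as they arrive, rather than after generation finishes.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical toy vocabulary standing in for a real tokenizer (assumption).
TOY_VOCAB: Dict[int, str] = {0: "Hello", 1: ",", 2: " world", 3: "!"}
EOS_ID = -1  # hypothetical end-of-sequence id (assumption)


def fake_generate(prompt_ids: List[int]) -> Iterator[int]:
    """Stand-in for a model: yields one token id per decoding step."""
    for token_id in (1, 2, 3, EOS_ID):
        yield token_id


def stream_text(token_ids: Iterable[int]) -> Iterator[str]:
    """Decode token ids one at a time and yield each text piece as it arrives."""
    for token_id in token_ids:
        if token_id == EOS_ID:
            break  # stop at end-of-sequence
        yield TOY_VOCAB[token_id]


def run(prompt_ids: List[int]) -> str:
    """Print the prompt, then print each generated piece immediately."""
    pieces = [TOY_VOCAB[i] for i in prompt_ids]
    print("".join(pieces), end="", flush=True)
    for piece in stream_text(fake_generate(prompt_ids)):
        print(piece, end="", flush=True)  # flush so the user sees text at once
        pieces.append(piece)
    print()
    return "".join(pieces)


if __name__ == "__main__":
    run([0])
```

In a real script, the toy vocabulary would be replaced by the model's tokenizer and `fake_generate` by the model's own generation call. The decode-and-flush loop is the part that makes the output appear incrementally.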

The Gradio web demo in the repository supports the following models:

  • ChatGLM
  • ChatGLM2
  • ChatGLM3
  • Llama2-chat
  • Baichuan2
  • Qwen