
[Doc] Add projects section in README which is developed based on FasterTransformer #731

Open · wants to merge 1 commit into base: main
Conversation

@lvhan028 lvhan028 commented Jul 25, 2023

Several issues (#506, #729, #727) request FasterTransformer support for Llama and Llama-2. Our project LMDeploy, developed based on FasterTransformer, already supports them and their derived models, such as Vicuna, Alpaca, Baichuan, and so on.

Meanwhile, LMDeploy has developed a continuous-batching-like feature named persistent batch, which also addresses #696. It models the inference of a conversational LLM as a persistently running batch whose lifetime spans the entire serving process. To put it simply (a sketch follows the list below):

  • The persistent batch has N pre-configured batch slots.
  • Requests join the batch when there are free slots available. A batch slot is released and can be reused once the generation of the requested tokens is finished.
  • On a cache hit, history tokens don't need to be decoded in every round of a conversation; generation of response tokens starts instantly.
  • The batch grows or shrinks automatically to minimize unnecessary computations.
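
To make the idea concrete, here is a minimal, hypothetical sketch of the slot-based scheduling described above. The class and method names (`PersistentBatch`, `try_join`, `release`) are illustrative assumptions, not LMDeploy's actual implementation:

```python
# Illustrative sketch only -- not LMDeploy's actual code.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Slot:
    request_id: Optional[int] = None              # None means the slot is free
    cache: list = field(default_factory=list)     # already-decoded history tokens
    pending: list = field(default_factory=list)   # tokens still to be decoded

class PersistentBatch:
    """A batch whose lifetime spans the entire serving process."""

    def __init__(self, num_slots: int):
        # N pre-configured batch slots
        self.slots = [Slot() for _ in range(num_slots)]

    def try_join(self, request_id: int, history: list) -> bool:
        """A request joins the batch when a free slot is available."""
        for slot in self.slots:
            if slot.request_id is None:
                slot.request_id = request_id
                if slot.cache == history[:len(slot.cache)]:
                    # cache hit: cached history is already decoded, so
                    # generation of response tokens can start instantly
                    slot.pending = history[len(slot.cache):]
                else:
                    # cache miss: the full history must be decoded first
                    slot.cache, slot.pending = [], list(history)
                return True
        return False                               # no free slot; request waits

    def release(self, request_id: int) -> None:
        """Release the slot once generation of the requested tokens finishes."""
        for slot in self.slots:
            if slot.request_id == request_id:
                slot.request_id = None             # slot can now be reused

    def active_slots(self) -> list:
        # only occupied slots take part in each decoding step, so the
        # effective batch grows and shrinks automatically
        return [s for s in self.slots if s.request_id is not None]
```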

We really appreciate the FasterTransformer team for developing such an efficient, high-throughput LLM inference engine.

@lvhan028 lvhan028 changed the title [Doc] add projects section in README which is developed based on FasterTransformer [Doc] Add projects section in README which is developed based on FasterTransformer Jul 25, 2023
@AnyangAngus

@lvhan028
Cool!
I see TurboMind supports llama-2-70b with GQA now.
May I ask whether there are plans for LMDeploy to support Llama-2-7b and Llama-2-13b with GQA?
Thank you!

@lvhan028
Author

@AnyangAngus
GQA in LMDeploy/TurboMind doesn't distinguish between 7B, 13B, or 70B models.

But as far as I know, llama-2-7b/13b don't have a GQA block (see the sketch below).
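
For context, here is a minimal PyTorch sketch (my own assumption, not code from either project) of what a GQA block does: groups of query heads share a smaller set of KV heads. In llama-2-70b, 64 query heads share 8 KV heads; in llama-2-7b/13b, the number of KV heads equals the number of query heads, which is plain multi-head attention, so there is no GQA block to enable:

```python
# Illustrative GQA sketch (PyTorch assumed); not LMDeploy/TurboMind code.
import torch

def gqa_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q: [batch, num_heads, seq, head_dim]
    # k, v: [batch, num_kv_heads, seq, head_dim], with num_kv_heads <= num_heads
    group = q.shape[1] // k.shape[1]
    # each group of `group` query heads attends to one shared KV head
    k = k.repeat_interleave(group, dim=1)  # -> [batch, num_heads, seq, head_dim]
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

# llama-2-70b: num_heads=64, num_kv_heads=8  -> GQA (groups of 8).
# llama-2-7b:  num_heads=32, num_kv_heads=32 -> ordinary MHA (groups of 1).
b, s, d = 1, 16, 128
out = gqa_attention(torch.randn(b, 64, s, d),
                    torch.randn(b, 8, s, d),
                    torch.randn(b, 8, s, d))
print(out.shape)  # torch.Size([1, 64, 16, 128])
```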
