Deploying and Serving LLMs with DeepSpeed

This guide provides steps for deploying and serving LLMs with DeepSpeed, to leverage features such as automatic tensor parallelism (AutoTP).

Setup

Please follow setup.md to set up the environment first. Additionally, you will need to install the DeepSpeed dependencies as shown below.

pip install .[deepspeed]
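
After installation, it can be useful to confirm that the package is visible to the Python interpreter you plan to serve from. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of this project.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level `package` can be located on the current Python path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Reports whether the deepspeed package is importable in this environment.
    print("deepspeed installed:", is_installed("deepspeed"))
```

If this reports `False`, the install step most likely ran in a different environment from the one you are checking.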

Configure Serving Parameters

Please follow the serving document to configure the parameters. In the configuration file, set deepspeed to true to enable the DeepSpeed AutoTP feature.

deepspeed: true
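
As a quick sanity check before deploying, you could verify that the flag is actually set in your configuration text. The sketch below is an illustration only: it assumes the flag appears as a flat, top-level `key: value` line, it is not a full YAML parser, and it is not how this project reads its configuration. The helper names and the `port` key in the sample are hypothetical.

```python
def read_flat_flags(text: str) -> dict:
    """Parse flat `key: value` lines into a dict of strings.

    Deliberately simplistic: ignores nesting, lists, and quoting.
    """
    flags = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        flags[key.strip()] = value.strip()
    return flags


def deepspeed_enabled(text: str) -> bool:
    """Return True only if the text contains `deepspeed: true` (case-insensitive value)."""
    return read_flat_flags(text).get("deepspeed", "false").lower() == "true"


if __name__ == "__main__":
    sample = "port: 8000\ndeepspeed: true\n"  # hypothetical config contents
    print("DeepSpeed enabled:", deepspeed_enabled(sample))
```

For real configuration files with nested sections, a proper YAML parser is the better choice.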

Deploy and Test

Please follow the serving document for deploying and testing.