This project aims to make it effortless to deploy and serve large language models (LLMs) in the cloud as an API endpoint, paired with a simple chat interface for inference. This repository provides streamlined server-side code to deploy your LLMs in a cloud environment, enabling seamless inference through a chat interface on your local system.
If you want to deploy and host your model in the cloud, follow the steps below to get started.
First, clone this repository and change into its directory:

```bash
git clone https://github.com/TheFaheem/TGS.git && cd TGS
```
Make sure you install all the required libraries by running:

```bash
pip install -r requirements.txt
```
Now you are good to go.
After setting up this repository on your cloud machine, you can deploy your model as an API endpoint by running the following command with the appropriate arguments:
```bash
python serve.py --model_type ${MODEL_TYPE} --repo_id ${REPO_ID} --revision ${REVISION} --model_basename ${MODEL_BASENAME} --trust_remote_code ${TRUST_REMOTE_CODE} --safetensors ${SAFETENSOR}
```
- MODEL_TYPE - Type of the model, e.g., llama, mpt, falcon, rwkv
- REPO_ID - Hugging Face repo id of the model
- REVISION - Specific branch of the model repo to download from
- MODEL_BASENAME - Name of the safetensors file, without the `.safetensors` extension
- TRUST_REMOTE_CODE - Whether or not to trust remote code
- SAFETENSOR - Whether or not to use safetensors
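For example, a typical invocation might look like the following. The repo id, basename, and flag values here are illustrative placeholders, not defaults shipped with this repo; substitute your own model:

```bash
# Illustrative example: deploy a hypothetical GPTQ-quantized Llama checkpoint.
# The repo id and basename below are placeholders; use your own model's values.
python serve.py \
  --model_type llama \
  --repo_id TheBloke/Llama-2-7B-GPTQ \
  --revision main \
  --model_basename model \
  --trust_remote_code False \
  --safetensors True
```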
You can start the chat interface, backed by your model, from your local system by running the following command in your terminal. The inference.py file will take care of all the API calls behind the scenes; an example invocation and a rough sketch of the underlying call follow the argument list below.
```bash
python inference.py --endpoint ${ENDPOINT} --streaming ${STREAMING} --max_tokens ${MAX_TOKENS} --ht_ws ${HT_WS} --temperature ${TEMPERATURE} --top_p ${TOP_P} --top_k ${TOP_K}
```
- ENDPOINT - URL provided by the cloud shortly after you start deploying your model
- STREAMING - Whether or not to stream the results
- MAX_TOKENS - Maximum number of tokens to generate
- HT_WS - Whether to use HTTP ("http") or websockets ("ws")
- TEMPERATURE - Temperature for sampling. A temperature of 0.0 produces concise, deterministic responses, whereas a temperature close to 1.0 increases randomness in the output
- TOP_P - Cumulative probability threshold for nucleus sampling of the logits
- TOP_K - Number of top logits to sample from
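For example, once your endpoint is live, an invocation might look like this. The URL and all parameter values are illustrative placeholders:

```bash
# Illustrative example: the endpoint URL is a placeholder; use the one
# printed by your cloud provider when the deployment comes up.
python inference.py \
  --endpoint https://your-deployment.example.com \
  --streaming True \
  --max_tokens 512 \
  --ht_ws ws \
  --temperature 0.7 \
  --top_p 0.95 \
  --top_k 40
```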
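For the curious, here is a rough sketch of the kind of HTTP request inference.py makes under the hood when `--ht_ws http` is used. The `/generate` route and the JSON field names below are hypothetical illustrations, not a documented contract of this repo; the actual request shape is defined by serve.py:

```bash
# Hypothetical illustration only: the /generate route and JSON fields
# are assumptions for explanatory purposes, not this repo's actual API.
curl -X POST "${ENDPOINT}/generate" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello!", "max_tokens": 256, "temperature": 0.7, "top_p": 0.95, "top_k": 40}'
```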
If you have any ideas, found a bug, or want to improve this project further, I encourage you to contribute by creating a fork of this repo. Once you are done with your work, just create a pull request; I'll review it and merge it in as soon as I can.
This project is licensed under the terms of the MIT License.