Large Language Model Accelerator

LLMA is an end-to-end optimization framework for large language models.

LLMA aims to accelerate large language model inference in both cloud and embedded environments.

With the LLMA framework, different large language models can be deployed to different platforms flexibly and easily, with high performance.

Supported features

1.1 Support large language model inference

LLMA can deploy a large language model such as LLaMA-7B on different hardware, such as NVIDIA GPUs and the Cloudblazer Yunsui t20.

LLMA serves inference in response to client requests: the client sends an inference request, and LLMA returns the inference result to the client.
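The request/response flow described above could look roughly like the following from the client side. This is only a sketch: the endpoint URL, the JSON field names (`prompt`, `max_new_tokens`, `text`), and the helper names are all hypothetical and are not taken from LLMA, whose actual interface may differ.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the address of the deployed service.
DEFAULT_URL = "http://localhost:8000/generate"


def build_payload(prompt: str, max_new_tokens: int = 64) -> bytes:
    """Encode an inference request as a JSON body (field names are assumed)."""
    payload = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    return json.dumps(payload).encode("utf-8")


def parse_result(body: bytes) -> str:
    """Extract the generated text from a JSON response body (field name is assumed)."""
    return json.loads(body.decode("utf-8"))["text"]


def infer(prompt: str, url: str = DEFAULT_URL, timeout: float = 30.0) -> str:
    """Send one inference request and return the text from the response."""
    request = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_result(response.read())
```

With a service listening at the assumed address, `infer("Hello")` would send one request and return the generated text.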

1.2 Support large language model optimization

LLMA supports several optimization techniques, such as model fine-tuning and model quantization.
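To illustrate what model quantization does in general, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic textbook scheme, not LLMA's implementation, which this README does not describe; the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 for an all-zero (or empty) tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)
```

Each int8 value takes one byte instead of the four bytes of a float32, at the cost of a rounding error of at most about half the scale per weight.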

Getting Started

This example demonstrates how to use LLMA to deploy LLaMA-7B on the Cloudblazer Yunsui t20.

Tutorial

License

Apache License 2.0

Adlik/llma
