💯AI00 RWKV Server

AI00 RWKV Server is an inference API server for the RWKV language model based upon the web-rwkv inference engine.

It supports parallel and concurrent batched inference on Vulkan and runs on any GPU that supports Vulkan. No need for Nvidia cards! AMD cards and even integrated graphics can be accelerated!

No need for bulky PyTorch, CUDA, or other runtime environments. It's compact and ready to use out of the box!

Compatible with OpenAI's ChatGPT API interface.

100% open source and commercially usable, under the MIT license.

If you are looking for a fast, efficient, and easy-to-use LLM API server, then AI00 RWKV Server is your best choice. It can be used for various tasks, including chatbots, text generation, translation, and Q&A.

Join the AI00 RWKV Server community now and experience the charm of AI!

QQ Group for communication: 30920262

💥Features

Based on the RWKV model, with high performance and accuracy
Supports Vulkan inference acceleration, so you get GPU acceleration without CUDA. Works with AMD cards, integrated graphics, and any GPU that supports Vulkan
No need for bulky PyTorch, CUDA, or other runtime environments. It's compact and ready to use out of the box!
Compatible with OpenAI's ChatGPT API interface

⭕Usages

Chatbots
Text generation
Translation
Q&A
Any other task an LLM can do

👻Other

Based on the web-rwkv project
Model download: V5 or V6

Installation, Compilation, and Usage

📦Download Pre-built Executables

Directly download the latest version from Release
After downloading the model, place the model in the assets/models/ path, for example, assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st
Optionally modify assets/Config.toml for model configurations like model path, quantization layers, etc.
Run in the command line
```
$ ./ai00_rwkv_server
```
Open the browser and visit the WebUI at https://localhost:65530
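When the server is launched from a script, it can help to wait until the port accepts connections before opening the WebUI or sending requests. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_port` and its timeout values are illustrative and not part of AI00; 65530 is the port mentioned above.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 65530,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening on the
    port, not that the model has finished loading.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` right after starting `./ai00_rwkv_server` blocks for up to 30 seconds and reports whether the port became reachable.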

📜(Optional) Build from Source

Install Rust

Clone this repository

```
$ git clone https://github.com/cgisky1980/ai00_rwkv_server.git
$ cd ai00_rwkv_server
```

After downloading the model, place the model in the assets/models/ path, for example, assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st
Compile
```
$ cargo build --release
```
After compilation, run
```
$ cargo run --release
```
Open the browser and visit the WebUI at https://localhost:65530

📒Convert the Model

Currently, only Safetensors models with the .st extension are supported. Models saved with torch using the .pth extension need to be converted before use.

Download the .pth model
The Release also includes an executable called converter. Run:

```
$ ./converter --input /path/to/model.pth
```

If you are building from source, run:

```
$ cargo run --release --bin converter -- --input /path/to/model.pth
```

As in the steps above, place the converted .st model in the assets/models/ path and update the model path in assets/Config.toml
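If several .pth files need converting, the converter call above can be scripted. The sketch below is one possible approach, not an official tool: the function names are hypothetical, it uses only the `--input` flag shown above, and it assumes the `converter` executable sits in the current directory. Where the converter writes its output is not covered here.

```python
import subprocess
from pathlib import Path


def converter_command(pth_file: Path, converter: str = "./converter") -> list[str]:
    """Build the converter command line shown above for one .pth file."""
    return [converter, "--input", str(pth_file)]


def convert_all(folder: Path, converter: str = "./converter") -> list[Path]:
    """Run the converter on every .pth file in `folder`; return the files processed."""
    done = []
    for pth_file in sorted(folder.glob("*.pth")):
        # check=True raises CalledProcessError if the converter exits with an error.
        subprocess.run(converter_command(pth_file, converter), check=True)
        done.append(pth_file)
    return done
```

For example, `convert_all(Path("downloads"))` would invoke the converter once per .pth file found in a `downloads` folder (a placeholder name).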

📝Supported Arguments

--config: Configuration file path (default: assets/Config.toml)
--ip: The IP address the server binds to
--port: The port the server listens on
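To start the server from Python with these arguments, a small wrapper can assemble the command line. This is a sketch under assumptions: the function names are made up for illustration, the binary path `./ai00_rwkv_server` is taken from the run step earlier, and only the three flags listed above are used.

```python
import subprocess
from typing import Optional


def server_command(binary: str = "./ai00_rwkv_server",
                   config: Optional[str] = None,
                   ip: Optional[str] = None,
                   port: Optional[int] = None) -> list[str]:
    """Build a launch command; flags left as None are omitted so server defaults apply."""
    cmd = [binary]
    if config is not None:
        cmd += ["--config", config]
    if ip is not None:
        cmd += ["--ip", ip]
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


def start_server(**kwargs) -> subprocess.Popen:
    """Start the server as a child process (assumes the binary exists at the given path)."""
    return subprocess.Popen(server_command(**kwargs))
```

For instance, `server_command(config="assets/Config.toml", port=65530)` yields the argument list for an explicit config file and port without starting anything.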

📙Currently Available APIs

The API service listens on port 65530 by default, and the request and response formats follow the OpenAI API specification.

/api/oai/v1/models
/api/oai/models
/api/oai/v1/chat/completions
/api/oai/chat/completions
/api/oai/v1/completions
/api/oai/completions
/api/oai/v1/embeddings
/api/oai/embeddings
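The endpoints above can be reached with any HTTP client. The following sketch uses only the Python standard library to build endpoint URLs and query the models endpoint. The names `API_PATHS`, `endpoint_url`, and `list_models` are illustrative, and the `http` scheme is an assumption; pass `scheme="https"` if your server is served over TLS.

```python
import json
import urllib.request

# The /v1 variants of the endpoints listed above.
API_PATHS = {
    "models": "/api/oai/v1/models",
    "chat": "/api/oai/v1/chat/completions",
    "completions": "/api/oai/v1/completions",
    "embeddings": "/api/oai/v1/embeddings",
}


def endpoint_url(name: str, host: str = "127.0.0.1", port: int = 65530,
                 scheme: str = "http") -> str:
    """Return the full URL for one of the endpoints listed above."""
    return f"{scheme}://{host}:{port}{API_PATHS[name]}"


def list_models(**kwargs) -> dict:
    """GET the models endpoint and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(endpoint_url("models", **kwargs), timeout=10) as resp:
        return json.load(resp)
```

With a server running locally, `list_models()` returns the decoded JSON response from `/api/oai/v1/models`.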

The following Python example calls ai00 through the openai package and wraps it in a ready-to-use helper class:

```python
# Requires the legacy openai package (openai<1.0): the module-level
# openai.Completion / openai.ChatCompletion interface was removed in 1.0.
import openai


class Ai00:
    def __init__(self, model="model", port=65530, api_key="JUSTSECRET_KEY"):
        # Point the OpenAI client at the local AI00 server.
        openai.api_base = f"http://127.0.0.1:{port}/api/oai"
        openai.api_key = api_key
        self.ctx = []  # chat history as a list of {"role", "content"} dicts
        self.params = {
            "system_name": "System",
            "user_name": "User",
            "assistant_name": "Assistant",
            "model": model,
            "max_tokens": 4096,
            "top_p": 0.6,
            "temperature": 1,
            "presence_penalty": 0.3,
            "frequency_penalty": 0.3,
            "half_life": 400,
            "stop": ['\x00', '\n\n']
        }

    def set_params(self, **kwargs):
        # Override any of the default sampling parameters.
        self.params.update(kwargs)

    def clear_ctx(self):
        self.ctx = []

    def get_ctx(self):
        return self.ctx

    def continuation(self, message):
        # Plain text completion: continue `message` without chat history.
        response = openai.Completion.create(
            model=self.params['model'],
            prompt=message,
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        return response.choices[0].text

    def append_ctx(self, role, content):
        self.ctx.append({
            "role": role,
            "content": content
        })

    def send_message(self, message, role="user"):
        # Chat completion: send the whole history plus the new message.
        self.append_ctx(role, message)
        response = openai.ChatCompletion.create(
            model=self.params['model'],
            messages=self.ctx,
            names={
                "system": self.params['system_name'],
                "user": self.params['user_name'],
                "assistant": self.params['assistant_name']
            },
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        result = response.choices[0].message['content']
        # Store the reply so the next call sees the full conversation.
        self.append_ctx("assistant", result)
        return result


ai00 = Ai00()
ai00.set_params(
    max_tokens=4096,
    top_p=0.55,
    temperature=2,
    presence_penalty=0.3,
    frequency_penalty=0.8,
    half_life=400,
    stop=['\x00', '\n\n']
)
print(ai00.send_message("how are you?"))
print(ai00.send_message("me too!"))
print(ai00.get_ctx())
ai00.clear_ctx()
print(ai00.continuation("i like"))
```
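The class above relies on the module-level interface of openai releases before 1.0. With openai 1.0 or later, a roughly equivalent chat call might look like the sketch below. The helper names `build_chat_kwargs` and `chat` are hypothetical, and it is an assumption that non-standard fields such as `half_life` and `names` should be sent through the client's `extra_body` argument, since the newer client rejects unknown keyword arguments.

```python
# Keyword arguments the openai>=1.0 chat client accepts directly.
STANDARD_KEYS = {"model", "max_tokens", "top_p", "temperature",
                 "presence_penalty", "frequency_penalty", "stop"}


def build_chat_kwargs(params: dict, messages: list) -> dict:
    """Split parameters into standard OpenAI arguments and server-specific extras."""
    kwargs = {k: v for k, v in params.items() if k in STANDARD_KEYS}
    extra = {k: v for k, v in params.items() if k not in STANDARD_KEYS}
    kwargs["messages"] = messages
    if extra:
        # extra_body merges these fields into the JSON request body.
        kwargs["extra_body"] = extra
    return kwargs


def chat(params: dict, messages: list, port: int = 65530,
         api_key: str = "JUSTSECRET_KEY") -> str:
    """Send one chat request with the openai>=1.0 client (needs a running server)."""
    from openai import OpenAI  # imported lazily so build_chat_kwargs works without it

    client = OpenAI(base_url=f"http://127.0.0.1:{port}/api/oai", api_key=api_key)
    response = client.chat.completions.create(**build_chat_kwargs(params, messages))
    return response.choices[0].message.content
```

A call such as `chat({"model": "model", "half_life": 400}, [{"role": "user", "content": "how are you?"}])` would send `half_life` in the request body alongside the standard fields.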

📙WebUI Screenshots

Chat Feature

Continuation Feature

Paper Writing Feature

📝TODO List

Support for text_completions and chat_completions
Support for sse push
Add embeddings
Integrate basic front-end
Parallel inference via batch serve
Support for int8 quantization
Support for NF4 quantization
Support for LoRA model
Hot loading and switching of LoRA model

👥Join Us

We are always looking for people interested in helping us improve the project. If you are interested in any of the following, please join us!

💀Writing code
💬Providing feedback
🔆Proposing ideas or needs
🔍Testing new features
✏Translating documentation
📣Promoting the project
🏅Anything else that would be helpful to us

No matter your skill level, we welcome you to join us. You can join us in the following ways:

Join our Discord channel
Join our QQ group
Submit issues or pull requests on GitHub
Leave feedback on our website

We can't wait to work with you to make this project better! We hope the project is helpful to you!

Thanks to these insightful and outstanding individuals for their support and selfless dedication to the project:

顾真牛
📖 💻 🖋 🎨 🧑‍🏫

研究社交
💻 💡 🤔 🚧 👀 📦

josc146
🐛 💻 🤔 🔧

l15y
🔧 🔌 💻

Cahya Wirawan
🐛

yuunnn_w
📖 ⚠️

longzou
💻 🛡️
