NeuralChat

A customizable framework to create your own LLM-driven AI apps within minutes

🌟RESTful API   |   🔥Features   |   💻Examples   |   📖Notebooks

Introduction

NeuralChat is a powerful and flexible open framework that empowers you to effortlessly create LLM-centric AI applications, including chatbots and copilots.


NeuralChat is under active development. APIs are subject to change.

System Requirements

Please make sure the basic system libraries below are installed. If you want to try more features, please refer to the system requirements.

apt-get update
apt-get install -y python3-pip
apt-get install -y libgl1-mesa-glx

Note: If your system only has python3, or you encounter the error python: command not found, please run ln -sf $(which python3) /usr/bin/python.

Installation

NeuralChat is part of Intel Extension for Transformers, so install Intel Extension for Transformers first by following the installation guide. After that, install the additional dependencies for NeuralChat per your device:

pip install intel-extension-for-transformers
pip install fastapi==0.103.2

# For CPU device
pip install -r requirements_cpu.txt

# For HPU device
pip install -r requirements_hpu.txt

# For XPU device
pip install -r requirements_xpu.txt

# For Windows
pip install -r requirements_win.txt

# For CUDA device
pip install -r requirements.txt

Note: fastapi==0.103.2 is the suggested version.

Getting Started

OpenAI-Compatible RESTful APIs

NeuralChat provides OpenAI-compatible RESTful APIs for LLM inference, so you can use NeuralChat as a drop-in replacement for the OpenAI APIs. The NeuralChat service is accessible through the OpenAI client library, curl commands, and the Python requests library. See neuralchat_api.md.

Launch OpenAI-compatible Service

NeuralChat launches a chatbot service using Intel/neural-chat-7b-v3-1 by default. You can customize the chatbot service by configuring the YAML file.
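As a point of reference, a minimal configuration might look like the sketch below. Treat the exact key set as an assumption: model_name_or_path and tasks_list are parameters documented later in this README, while host, port, and device are illustrative; consult the shipped neuralchat.yaml for the authoritative schema.

# ./server/config/neuralchat.yaml (illustrative sketch, not the authoritative schema)
host: 0.0.0.0
port: 8000
model_name_or_path: "Intel/neural-chat-7b-v3-1"
device: "cpu"  # or "hpu", "xpu", "cuda", matching the requirements file you installed
tasks_list: ['textchat']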

You can start the NeuralChat server either using the shell command or Python code.

Using Shell Command:

neuralchat_server start --config_file ./server/config/neuralchat.yaml

Using Python Code:

from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor
server_executor = NeuralChatServerExecutor()
server_executor(config_file="./server/config/neuralchat.yaml", log_file="./neuralchat.log")

Access the Service

Once the service is running, it exposes an OpenAI-compatible endpoint at /v1/chat/completions. You can use any of the ways below to access the endpoint.

Using OpenAI Client Library

First, install openai-python:

pip install --upgrade openai

Then, interact with the model:

import openai
openai.api_key = "EMPTY"
openai.base_url = 'http://127.0.0.1:8000/v1/'
response = openai.chat.completions.create(
      model="Intel/neural-chat-7b-v3-1",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
      ]
)
print(response.choices[0].message.content)
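The OpenAI client also supports streaming. Whether tokens actually stream depends on the NeuralChat version serving the endpoint, so treat the following as an optional sketch rather than a guaranteed feature:

import openai
openai.api_key = "EMPTY"
openai.base_url = 'http://127.0.0.1:8000/v1/'

# stream=True asks the server for incremental chunks; the client yields
# ChatCompletionChunk objects whose delta carries the newly generated text.
stream = openai.chat.completions.create(
    model="Intel/neural-chat-7b-v3-1",
    messages=[{"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)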

Note: When intel-extension-for-transformers <= 1.3.1, please use the legacy curl command shown below instead.

Using Curl

curl http://127.0.0.1:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Intel/neural-chat-7b-v3-1",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}
    ]
    }'

Note: When intel-extension-for-transformers <= 1.3.1, please use the legacy command instead:

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "Tell me about Intel Xeon Scalable Processors."}' http://127.0.0.1:8000/v1/chat/completions

Using Python Requests Library

import requests

url = 'http://127.0.0.1:8000/v1/chat/completions'
data = {
    "model": "Intel/neural-chat-7b-v3-1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
}
# json= serializes the payload and sets the Content-Type header automatically.
response = requests.post(url, json=data)
print(response.json())

Note: When intel-extension-for-transformers <= 1.3.1, please use the legacy curl command shown above instead.

Deploy NeuralChat Service

Please refer to the deployment examples for deploying the NeuralChat service.

Application | LLM | HW | Description | Examples
TextGen | NeuralChat-7B | Xeon | Text Generation Application | Text Generation Example
ChatQnA | NeuralChat-7B | Xeon, Gaudi | RAG Application | RAG Example
CodeGen | Phind/Phind-CodeLlama-34B-v2 | AIPC, Xeon, Gaudi | Code Generation Application | CodeGen Example
TalkingBot | NeuralChat-7B, speecht5_tts, whisper | AIPC, Xeon | Audio & LLM Generation Application | TalkingBot Example

Langchain Extension APIs

Intel Extension for Transformers provides a comprehensive suite of Langchain-based extension APIs, including advanced retrievers, embedding models, and vector stores. These enhancements are carefully crafted to expand the capabilities of the original langchain API, ultimately boosting overall performance. This extension is specifically tailored to enhance the functionality and performance of RAG.

Vector Stores

We introduce enhanced vector store operations, enabling users to adjust and fine-tune their settings even after the chatbot has been initialized, offering a more adaptable and user-friendly experience. For langchain users, integrating and utilizing optimized Vector Stores is straightforward by replacing the original Chroma API in langchain.

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import VectorStoreRetriever
# Drop-in replacement for langchain's original Chroma vector store
from intel_extension_for_transformers.langchain_community.vectorstores import Chroma

retriever = VectorStoreRetriever(vectorstore=Chroma(...))
retrievalQA = RetrievalQA.from_llm(llm=HuggingFacePipeline(...), retriever=retriever)
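Filled out end to end, the flow might look like the sketch below. It assumes the optimized Chroma mirrors langchain's from_documents signature; the embedding model, documents, and LLM pipeline are illustrative placeholders, not prescribed choices:

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStoreRetriever
from intel_extension_for_transformers.langchain_community.vectorstores import Chroma

# Illustrative corpus; in practice this comes from your document loader/splitter.
docs = [Document(page_content="Intel Xeon Scalable Processors are server CPUs.")]

# Assumes the optimized Chroma keeps langchain's from_documents() signature.
vectorstore = Chroma.from_documents(documents=docs, embedding=HuggingFaceEmbeddings())
retriever = VectorStoreRetriever(vectorstore=vectorstore)

llm = HuggingFacePipeline.from_model_id(model_id="Intel/neural-chat-7b-v3-1",
                                        task="text-generation")
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)
print(retrievalQA.run("Tell me about Intel Xeon Scalable Processors."))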

Retrievers

We provide optimized retrievers, such as VectorStoreRetriever and ChildParentRetriever, to handle vectorstore operations efficiently and deliver strong retrieval performance.

from intel_extension_for_transformers.langchain_community.retrievers import ChildParentRetriever
from langchain.vectorstores import Chroma

# "mmr" and k=2 are illustrative; pick the search strategy and kwargs for your use case.
retriever = ChildParentRetriever(vectorstore=Chroma(documents=child_documents),
                                 parentstore=Chroma(documents=parent_documents),
                                 search_type="mmr",
                                 search_kwargs={"k": 2})
docs = retriever.get_relevant_documents("Intel")

Please refer to this documentation for more details.

Customizing the NeuralChat Service

You can customize the NeuralChat service by modifying the YAML configuration file. Detailed instructions can be found in the documentation.

Supported Models

NeuralChat supports a wide range of generative Transformer models available in HuggingFace Transformers. The following is a curated list of models validated for both inference and fine-tuning within NeuralChat:

Validated tasks cover text generation (completions and chat completions), summarization, and code or SQL generation; per-model task coverage varies.

Intel/neural-chat-7b-v1-1
Intel/neural-chat-7b-v3-1
meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-2-70b-chat-hf
EleutherAI/gpt-j-6b
mosaicml/mpt-7b-chat
mistralai/Mistral-7B-v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1
upstage/SOLAR-10.7B-Instruct-v1.0
THUDM/chatglm2-6b
THUDM/chatglm3-6b
Qwen/Qwen-7B
microsoft/phi-2
Deci/DeciLM-7B
Deci/DeciLM-7B-instruct
bigcode/starcoder
codellama/CodeLlama-7b-hf
codellama/CodeLlama-34b-hf
Phind/Phind-CodeLlama-34B-v2
Salesforce/codegen2-7B
ise-uiuc/Magicoder-S-CL-7B
defog/sqlcoder2
defog/sqlcoder-34b-alpha

Modify the model_name_or_path parameter in the YAML configuration file to load different models.
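For example, switching to Llama 2 might look like the excerpt below (illustrative, following the configuration sketch earlier in this README):

# neuralchat.yaml (excerpt)
model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"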

Rich Plugins

NeuralChat includes support for various plugins to enhance its capabilities.

Additional libraries are required for each plugin: a requirements.txt file lives in every plugin directory, so navigate to the plugin directory and run pip install -r requirements.txt. For instance, to enable the RAG plugin, run the following commands:

cd ./pipeline/plugins/retrieval/
pip install -r requirements.txt

Multimodal APIs

In addition to the text-based chat RESTful API, NeuralChat offers several helpful plugins in its RESTful API lineup to aid users in building multimodal applications. NeuralChat supports the following RESTful APIs:

Tasks | RESTful APIs
textchat | /v1/chat/completions, /v1/completions
voicechat | /v1/audio/speech, /v1/audio/transcriptions, /v1/audio/translations
retrieval | /v1/rag/create, /v1/rag/append, /v1/rag/upload_link, /v1/rag/chat
codegen | /v1/code_generation, /v1/code_chat
text2image | /v1/text2image
image2image | /v1/image2image
faceanimation | /v1/face_animation
finetune | /v1/finetune

Modify the tasks_list parameter in the YAML configuration file to enable the RESTful APIs your project needs.
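For example, a service exposing only the chat and RAG endpoints might be configured as follows (illustrative excerpt; task names are taken from the table above):

# neuralchat.yaml (excerpt)
tasks_list: ['textchat', 'retrieval']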