GitHub - hysunflower/Serving: A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)

Motivation

We consider deploying deep learning inference services online to be a user-facing application in the future. The goal of this project: once you have trained a deep neural network with Paddle, you can also deploy the model online easily. A demo of Paddle Serving is as follows:

Installation

We highly recommend running Paddle Serving in Docker; please see Run in Docker. See the document for more Docker images.

# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it test bash

# Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker exec -it test bash

pip install paddle-serving-client==0.4.0 
pip install paddle-serving-server==0.4.0 # CPU
pip install paddle-serving-app==0.2.0
pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.4.0.100 # GPU with CUDA10.1+TensorRT

You may need to use a domestic mirror source to speed up the download (in China, you can use the Tsinghua mirror by adding -i https://pypi.tuna.tsinghua.edu.cn/simple to the pip command).

If you need to install modules compiled from the develop branch, please download the packages from the latest packages list and install them with pip install.

The paddle-serving-server and paddle-serving-server-gpu packages support CentOS 6/7, Ubuntu 16/18, and Windows 10.

The paddle-serving-client and paddle-serving-app packages support Linux and Windows, but paddle-serving-client only supports Python 2.7/3.5/3.6/3.7.

We recommend installing paddle >= 1.8.4.

For Windows users, please read the document Paddle Serving for Windows Users.

Pre-built services with Paddle Serving

Chinese Word Segmentation

> python -m paddle_serving_app.package --get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py lac_model/ lac_workdir 9393 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
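The sample response joins the segmented words with `|`, so a client can recover the token list with a plain split. The following is a minimal Python sketch, assuming the response body always has the shape shown above; the helper name `parse_word_seg` is illustrative and not part of Paddle Serving.

```python
import json

def parse_word_seg(response_body):
    """Return one token list per input sentence from a /lac/prediction response."""
    payload = json.loads(response_body)
    # Assumption: "result" is a list with one {"word_seg": "..."} dict per
    # feed item, as in the sample response above.
    return [item["word_seg"].split("|") for item in payload["result"]]

sample = '{"result":[{"word_seg":"我|爱|北京|天安门"}]}'
tokens = parse_word_seg(sample)
print(tokens)
```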

Image Classification

> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}

Quick Start Example

This quick start example is only for users who already have a model to deploy; we provide a ready-to-deploy model here. If you want to know how to use Paddle Serving from offline training to online serving, please refer to Train_To_Service.

Boston House Price Prediction model

wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz

Paddle Serving provides HTTP-based and RPC-based services for users to access.

RPC service

A user can also start an RPC service with paddle_serving_server.serve. An RPC service is usually faster than an HTTP service, although the user needs to do some coding against Paddle Serving's Python client API. Note that we do not specify --name here.

python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292

Argument	Type	Default	Description
`thread`	int	`4`	Concurrency of current service
`port`	int	`9292`	Exposed port of current service to users
`model`	str	`""`	Path of paddle model directory to be served
`mem_optim_off`	-	-	Disable memory / GPU memory optimization
`ir_optim`	-	-	Enable analysis and optimization of calculation graph
`use_mkl` (Only for cpu version)	-	-	Run inference with MKL
`use_trt` (Only for trt version)	-	-	Run inference with TensorRT
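When the server is launched from a script, the arguments in the table can be assembled programmatically. The sketch below only builds the argument list and does not start anything. It assumes that the rows without a type or default are boolean switches passed as `--name`; the helper `build_serve_command` is hypothetical and not part of Paddle Serving.

```python
import sys

def build_serve_command(model, port=9292, thread=4, switches=()):
    """Assemble the argument list for `python -m paddle_serving_server.serve`."""
    known_switches = {"mem_optim_off", "ir_optim", "use_mkl", "use_trt"}
    unknown = set(switches) - known_switches
    if unknown:
        raise ValueError("unknown switches: %s" % sorted(unknown))
    cmd = [sys.executable, "-m", "paddle_serving_server.serve",
           "--model", model, "--thread", str(thread), "--port", str(port)]
    # Assumption: rows with no type/default in the table are boolean
    # switches that are passed as --name with no value.
    cmd += ["--" + name for name in switches]
    return cmd

cmd = build_serve_command("uci_housing_model", port=9292, thread=10,
                          switches=["ir_optim"])
print(" ".join(cmd[1:]))
```

The resulting list can be handed to `subprocess.Popen(cmd)` once the model directory exists.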

# Access the RPC service through the paddle_serving_client API
from paddle_serving_client import Client
import numpy as np
client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)

Here, the client.predict function takes two arguments. feed is a Python dict that maps each model input variable's alias name to its value. fetch lists the prediction variables to be returned from the server. In this example, the names "x" and "price" were assigned when the servable model was saved during training.

WEB service

Users can also put the data-format processing logic on the server side, so that they can access the service directly with curl. Refer to the following case, whose path is python/examples/fit_a_line:

from paddle_serving_server.web_service import WebService
import numpy as np

class UciService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # Each request carries a list of instances; stack them into one
        # float32 batch of shape (batch_size, 1, 13).
        is_batch = True
        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
        for i, ins in enumerate(feed):
            nums = np.array(ins["x"]).reshape(1, 1, 13)
            new_data[i] = nums
        feed = {"x": new_data}
        return feed, fetch, is_batch

uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9292)
uci_service.run_rpc_service()
uci_service.run_web_service()
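The batching step inside preprocess can be illustrated without NumPy or a running server. The sketch below is a dependency-free approximation that stacks the per-instance feature lists into a nested list of shape (n, 1, 13); the function name `batch_instances` and its parameters are illustrative only.

```python
def batch_instances(feed, feature_name="x", width=13):
    """Stack per-instance feature lists into a nested list of shape (n, 1, width)."""
    batch = []
    for ins in feed:
        values = [float(v) for v in ins[feature_name]]
        if len(values) != width:
            raise ValueError("expected %d values, got %d" % (width, len(values)))
        batch.append([values])  # the extra list level gives the (1, width) slice
    return {feature_name: batch}

feed = [{"x": [0.0] * 13}, {"x": [1.0] * 13}]
batched = batch_instances(feed)
shape = (len(batched["x"]), len(batched["x"][0]), len(batched["x"][0][0]))
print(shape)
```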

On the client side:

curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction

The response is:

{"result":{"price":[[18.901151657104492]]}}
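The same request can be prepared from Python with the standard library. The sketch below builds the POST request without sending it and extracts the price from a response shaped like the sample above. The helper names are hypothetical, and the response layout is assumed to match the sample exactly.

```python
import json
import urllib.request

def build_prediction_request(url, feed, fetch):
    """Create (but do not send) a JSON POST request for a prediction endpoint."""
    body = json.dumps({"feed": feed, "fetch": fetch}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"},
        method="POST")

def extract_price(response_body):
    """Pull the scalar price out of a response shaped like the sample above."""
    return json.loads(response_body)["result"]["price"][0][0]

x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
     -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
req = build_prediction_request(
    "http://127.0.0.1:9292/uci/prediction", [{"x": x}], ["price"])
price = extract_price('{"result":{"price":[[18.901151657104492]]}}')
print(req.get_method(), req.full_url, price)
```

With the web service running, `urllib.request.urlopen(req).read()` would return the response body to pass to `extract_price`.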

Some Key Features of Paddle Serving

Integrates seamlessly with the Paddle training pipeline; most Paddle models can be deployed with a one-line command.
Supports industrial serving features, such as model management, online loading, and online A/B testing.
Supports distributed key-value indexing, which is especially useful for large-scale sparse features as model inputs.
Supports highly concurrent and efficient communication between clients and servers.
Supports multiple programming languages on the client side, such as Golang, C++, and Python.

Document

New to Paddle Serving

Tutorial at AIStudio

Developers

About Efficiency

FAQ

FAQ (Chinese)

Design

Design Doc

Community

Slack

To connect with other users and contributors, you are welcome to join our Slack channel.

Contribution

If you want to contribute code to Paddle Serving, please refer to the Contribution Guidelines.

Special thanks to @BeyondYourself for complementing the gRPC tutorial, updating the FAQ doc, and modifying the mkdir command
Special thanks to @mcl-stone for updating the faster_rcnn benchmark
Special thanks to @cg82616424 for updating the unet benchmark and fixing the resize comment error

Feedback

For any feedback or to report a bug, please open a GitHub Issue.

License

Apache 2.0 License
