
tensorrt-triton-magface

Speed test result

| Loss-Backbone   | PyTorch (ms) | TensorRT FP16 (ms) |
|-----------------|--------------|--------------------|
| magface-r18     | 2.89         | 0.58               |
| magface-r50     | 3.25         | 1.36               |
| magface-r100    | 3.34         | 2.37               |
| arcface-r18     | 2.90         | 0.64               |
| mag-cosface-r50 | 6.56         | 1.34               |

Environment: PyTorch 1.10.2 (CUDA 11.5), TensorRT 8.2.1, hardware: RTX 2080 Ti

Logs

TODO list:

  • export ONNX and TensorRT engine
  • add Triton client
  • add ONNX OpenCV inference
  • test INT8
  • dynamic shape support
  • test dynamic shape
  • evaluate TensorRT vs. ONNX

Convert ONNX -> TensorRT engine

Build the Docker image

cd tensorrt-triton-magface 
docker build -t huytn/tensorrt-20.12-py3:v1 .

Run the Docker container

Example weight path: ./tensorrt-triton-magface/weights/magface_iresnet100_MS1MV2_dp.pth

docker run -it --gpus all --name tensorrt_8_magface_convert -v $(pwd):/convert/ -w /convert/ huytn/tensorrt-20.12-py3:v1 bash
chmod +x ./convert.sh
./convert.sh 0 ./weights iresnet100 magface_iresnet100_MS1MV2_dp
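For reference, the conversion boils down to exporting the PyTorch checkpoint to ONNX and then building a TensorRT engine. Below is a minimal sketch, not the repository's convert.sh: the models.iresnet import path, the tensor names, and the output file names are assumptions.

```python
# Hypothetical sketch: export a MagFace iresnet100 checkpoint to ONNX,
# then build an FP16 TensorRT engine with trtexec.
import torch
from models.iresnet import iresnet100  # assumed module layout

model = iresnet100()
state = torch.load("weights/magface_iresnet100_MS1MV2_dp.pth", map_location="cpu")
model.load_state_dict(state, strict=False)  # checkpoint key layout may differ
model.eval()

dummy = torch.randn(1, 3, 112, 112)          # 112x112 aligned face crop
torch.onnx.export(
    model, dummy, "weights/magface_iresnet100_MS1MV2_dp.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)

# Then, inside the TensorRT container, build the engine:
#   trtexec --onnx=weights/magface_iresnet100_MS1MV2_dp.onnx \
#           --saveEngine=weights/magface_iresnet100_MS1MV2_dp.trt --fp16
```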

Speed test

python3 speed_test.py --torch_path ./weights/magface_iresnet100_MS1MV2_dp.pth --trt_path ./weights/magface_iresnet100_MS1MV2_dp.pth
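Note that --trt_path is presumably meant to point at the TensorRT engine produced by convert.sh rather than the .pth checkpoint. The numbers in the table at the top come from this kind of measurement; as a rough illustration (not the actual speed_test.py, and the iresnet100 import is an assumption), GPU latency for the PyTorch model can be measured as below, with the TensorRT engine timed the same way through its runtime.

```python
# Hypothetical sketch of a latency measurement (the real speed_test.py may differ):
# average over many GPU inferences with explicit CUDA synchronization.
import time
import torch
from models.iresnet import iresnet100  # assumed module layout

model = iresnet100().cuda().eval()
x = torch.randn(1, 3, 112, 112, device="cuda")

with torch.no_grad():
    for _ in range(20):                  # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()

print(f"PyTorch mean latency: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms")
```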

Triton server

Run the Triton server Docker image (the model repository is mounted from ./model_repository):

docker run --gpus "device=2" --rm -p8330:8000 -p8331:8001 -p8332:8002 -v $(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:21.12-py3 tritonserver --model-repository=/models --strict-model-config false --log-verbose 1

...
+------------+---------+--------+
| Model      | Version | Status |
+------------+---------+--------+
|magface_onnx| 1       | READY  |
|magface_trt | 1       | READY  |
+------------+---------+--------+
I0714 00:37:55.265177 1 grpc_server.cc:4062] Started GRPCInferenceService at 0.0.0.0:8001
I0714 00:37:55.269588 1 http_server.cc:2887] Started HTTPService at 0.0.0.0:8000
I0714 00:37:55.312507 1 http_server.cc:2906] Started Metrics Service at 0.0.0.0:8002

Check that the server is running correctly:

$ curl -v localhost:8330/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
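Because the server is started with --strict-model-config false, Triton derives the model configuration automatically; the exact input and output tensor names can be checked from Python. A small sketch using the tritonclient package (the model name matches the repository listing above, everything else is generic):

```python
# Query Triton for the auto-generated model metadata (input/output names, shapes, dtypes).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8330")
print(client.is_server_ready())                   # True once the server is up
print(client.get_model_metadata("magface_trt"))   # lists inputs and outputs
print(client.get_model_config("magface_trt"))     # full generated config
```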

Python client

python3 client.py dummy --model magface_trt --width 112 --height 112
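client.py ships with the repository; for reference, a bare-bones HTTP client doing the same dummy inference could look like the sketch below. The tensor names "input" and "output" and the FP32 dtype are assumptions; use whatever the model metadata query above reports.

```python
# Hypothetical minimal Triton HTTP client for magface_trt (tensor names are assumed).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8330")

# Dummy 112x112 RGB input, NCHW layout, FP32.
data = np.random.rand(1, 3, 112, 112).astype(np.float32)

inp = httpclient.InferInput("input", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("output")

result = client.infer(model_name="magface_trt", inputs=[inp], outputs=[out])
embedding = result.as_numpy("output")
print(embedding.shape)   # e.g. (1, 512) face embedding
```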

Benchmark

Benchmark with the Triton Client SDK Docker container

docker run -it --ipc=host --net=host nvcr.io/nvidia/tritonserver:21.12-py3-sdk /bin/bash
cd install/bin
perf_analyzer -m magface_trt --percentile=95 --concurrency-range 1:4 -u localhost:8330 --shape input:1,3,112,112 --measurement-interval 10000
...
Inferences/Second vs. Client p95 Batch Latency
Concurrency: 1, throughput: 560.4 infer/sec, latency 2095 usec
Concurrency: 2, throughput: 1242.8 infer/sec, latency 2007 usec
Concurrency: 3, throughput: 1093.2 infer/sec, latency 2619 usec
Concurrency: 4, throughput: 913.8 infer/sec, latency 3766 usec