PERSEUS: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models

About

This repo accompanies the short paper PERSEUS: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models, published in the proceedings of IC2E 2020.

In this paper, we looked at the problem of efficient and cost-saving deep learning inference in the cloud. More concretely, we tackled the problem with multi-tenant model serving: instead of dedicating each GPU server to a single model, we serve multiple models on each GPU server, subject to its GPU memory capacity. Doing so improves the utilization of hardware resources, especially GPUs. To this end, we built a measurement framework, PERSEUS, to characterize and measure the performance and cost trade-offs of multi-tenant model serving.
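To make the memory constraint concrete, here is a minimal sketch of how models could be grouped onto GPU servers without exceeding GPU memory. It is not taken from the PERSEUS code: the function name `pack_models`, the first-fit-decreasing heuristic, and the model names and footprints in the usage note are all assumptions chosen for illustration.

```python
def pack_models(model_mem_gb, gpu_capacity_gb):
    """Group models onto GPU servers with a first-fit-decreasing heuristic.

    model_mem_gb: dict mapping a (hypothetical) model name to its GPU memory
                  footprint in GB.
    gpu_capacity_gb: usable GPU memory per server in GB.
    Returns a list of servers, each a list of the model names it hosts.
    """
    servers = []  # each entry tracks remaining memory and hosted models

    # Place the largest models first so they are not left without room.
    ordered = sorted(model_mem_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, mem in ordered:
        if mem > gpu_capacity_gb:
            raise ValueError(f"{name} does not fit on a single GPU")
        for server in servers:
            if server["free"] >= mem:
                # Co-locate with the first server that still has room.
                server["models"].append(name)
                server["free"] -= mem
                break
        else:
            # No existing server can host this model: start a new one.
            servers.append({"free": gpu_capacity_gb - mem, "models": [name]})

    return [server["models"] for server in servers]
```

For example, with hypothetical footprints `{"a": 6, "b": 5, "c": 4, "d": 3}` and 10 GB of GPU memory per server, dedicated serving would need four servers, while this grouping needs two.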

Highlight

We evaluated multi-tenant model serving with PERSEUS on several metrics, including inference throughput, monetary cost, and GPU utilization. We showed that multi-tenant serving can reduce cost by up to 12% while still meeting the SLA requirements of model serving.
We identified several potential improvements from the deep learning framework's perspective, to provide better support for serving models, especially on CPUs.
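As a rough illustration of how throughput and monetary cost relate, the sketch below converts an hourly server price and a sustained throughput into a cost per million inferences, then compares two serving setups. This is an assumed, simplified cost model rather than the one used in the paper; the function names and the numbers in the usage note are hypothetical.

```python
def cost_per_million(hourly_price_usd, throughput_per_sec):
    """Cost in USD of serving one million inferences on one server.

    Assumes the server is billed per hour and sustains the given throughput.
    """
    if throughput_per_sec <= 0:
        raise ValueError("throughput must be positive")
    inferences_per_hour = throughput_per_sec * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000


def relative_saving(dedicated_cost, multi_tenant_cost):
    """Fraction of cost saved by multi-tenant serving versus dedicated serving."""
    return 1 - multi_tenant_cost / dedicated_cost
```

For instance, a hypothetical server billed at $3.60 per hour that sustains 1,000 inferences per second costs about $1.00 per million inferences, and `relative_saving(10.0, 9.0)` reports a saving of about 10%.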

Fig 1. Throughput comparison of dedicated serving vs. multi-tenant serving.

Fig 2. Monetary saving with multi-tenant serving.

How to use the code

Please see the instructions for the individual modules in the code folder.

Citation

If you would like to cite the paper, please cite it as:

@article{lemay2019perseus,
    title={Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models},
    author={Matthew LeMay and Shijian Li and Tian Guo},
    year={2019},
    eprint={1912.02322},
    archivePrefix={arXiv},
    primaryClass={cs.DC}
}

Acknowledgement

We would like to thank the National Science Foundation (grants #1755659 and #1815619) and Google Cloud Platform Research Credits for their support.

Contact

More project information can be found in our lab's project site.

Matthew LeMay mlemay@wpi.edu
Shijian Li sli8@wpi.edu
Tian Guo tian@wpi.edu
