
Gunicorn vs. Circus for managing process and socket #2404

Answered by parano
withsmilo asked this question in Q&A

Exactly as @bojiang and @timliubentoml answered. Besides wanting to provide proper async support, the main reason is that Gunicorn, like most tools in the Python web development stack, is designed to run multiple homogeneous processes: every process runs identical web-serving code, and Gunicorn simply forks the same process into multiple workers to scale vertically.
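To make the "fork the same process into multiple workers" model concrete, here is a minimal sketch of the pre-fork pattern using only the Python standard library. It is not Gunicorn's actual implementation; the function names `start_prefork_workers` and `ask` are hypothetical, and it assumes a Unix-like system where `os.fork` is available. The parent opens one listening socket, and every forked worker runs the same accept loop on it.

```python
import os
import socket


def start_prefork_workers(num_workers=3, requests_per_worker=1):
    """Fork identical workers that all accept() on one shared listening socket.

    Hypothetical helper for illustration only (Unix-only, relies on os.fork).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child process: every worker runs exactly the same serving code.
            try:
                for _ in range(requests_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(f"handled by worker {os.getpid()}\n".encode())
            finally:
                os._exit(0)  # leave the child without running parent cleanup
        pids.append(pid)
    return listener, port, pids


def ask(port, timeout=5.0):
    """Send one connection to the shared socket and return the worker's reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as conn:
        return conn.makefile().readline().strip()


if __name__ == "__main__":
    listener, port, pids = start_prefork_workers()
    for _ in pids:
        print(ask(port))
    for pid in pids:
        os.waitpid(pid, 0)  # reap the exited workers
    listener.close()
```

Because each worker is a full copy of the same process, anything heavy that the serving code loads (such as a model) is loaded once per worker.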

However, this is not a good fit for ML model serving workloads. A resource-intensive model may limit how many copies fit on one machine, and models sit idle while pre-processing and post-processing code is running, which leads to low resource utilization. In order to address this problem in BentoML 1.0, …

Replies: 4 comments

Answer selected by withsmilo
withsmilo
Apr 9, 2022
Maintainer Author

4 participants