Gunicorn preload flag not working with PyTorch library #49555

Open
hsilveiro opened this issue Dec 17, 2020 · 6 comments
Labels
oncall: transformer/mha (Issues related to Transformers and MultiheadAttention), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


hsilveiro commented Dec 17, 2020

Hello,
We have been developing a FastAPI application that uses some external libraries to perform NLP tasks, such as tokenization. On top of this, we launch the service with Gunicorn so that we can handle requests in parallel.
However, we are having difficulties using Stanza with Gunicorn's --preload flag active.
Using this flag is a requirement for us: since Stanza models can be large, we want them to be loaded only once, in the Gunicorn master process. This way, the Gunicorn workers can access the models that were previously loaded by the master process.

The difficulty we are facing comes down to the fact that Gunicorn workers hang when trying to run inference with a model that was initially loaded by the master process.

We've done some research and debugging but weren't able to find a solution. We did notice, however, that the worker hangs when the code reaches the prediction step in PyTorch.
Although we are talking about Stanza here, the same problem also occurred with the Sentence Transformers library, and both of them use PyTorch.

I'll present more details below:

Environment:

FastAPI version: 0.54.2
Gunicorn version: 20.0.4
Uvicorn version: 0.12.3
Python version: 3.7
Stanza version: 1.1.1
OS: macOS Catalina 10.15.6

Steps executed:

  • Gunicorn command:
gunicorn --workers 1 --worker-class uvicorn.workers.UvicornWorker --max-requests=0 --max-requests-jitter=0 --timeout=120 --keep-alive=2 \
     --log-level=info --access-logfile - --preload -b 0.0.0.0:8010 my_app:app
  • The code that runs in the master process, before the workers are launched:
import stanza

def initialize_application() -> None:
    ...
    # cls and language come from the enclosing class context; the lookup tables
    # _TOKENIZER_MODEL_LANGUAGES, _MODEL_TYPE and _TOKENIZER_MODEL are omitted here.
    model = stanza.Pipeline(
        lang=cls._TOKENIZER_MODEL_LANGUAGES[language],
        package=cls._MODEL_TYPE[language],
        processors=cls._TOKENIZER_MODEL,
        tokenize_no_ssplit=True,
    )

This way, the model is loaded only once, in the master process.
Once the required workers are launched, they should have access to the previously loaded model without having to load it themselves (saving computational resources).

The problem happens when we receive a request that makes use of the model that was loaded initially. The worker responsible for handling the request is not able to use the model for inference; instead, it hangs until the timeout occurs.

After analyzing and debugging the code, we traced the following steps up to the point where it stops responding:

  1. Our code calls the process() method of the Pipeline class, in Stanza's core.py file.
  2. That call goes to the processor-specific process() method, in this case in tokenize_process.py, class TokenizeProcessor.
  3. Which in turn calls the output_predictions() method in utils.py, entering the PyTorch code.
  4. After a few more steps, it reaches the model.py file, class Tokenizer(nn.Module), forward(self, x, feats) method, at the following line: nontok = F.logsigmoid(-tok0). This line seems to call into C++ code, which we didn't investigate further (see the sketch after this list).
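
For reference, here is a minimal sketch of what we believe --preload reduces to in our case: a PyTorch op runs once in the parent (initializing its thread pools), the process forks, and the forked child runs the same op. This is our own reduction for illustration, not code taken from our service:

import os

import torch
import torch.nn.functional as F

x = torch.randn(4, 4)
F.logsigmoid(x)  # run one op in the parent so PyTorch's thread pools get initialized

pid = os.fork()  # with --preload, Gunicorn forks its workers at a point like this
if pid == 0:
    # child process, standing in for a worker: on our setup this call hangs
    print(F.logsigmoid(x))
    os._exit(0)
else:
    os.waitpid(pid, 0)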

Of course, if we remove the --preload flag, everything runs smoothly. Removing it is something we want to avoid, though, because of the extra resources that would be needed (the models would be duplicated in every worker).

We looked through several other issues that could be related to this one, such as:
benoitc/gunicorn#2157
tiangolo/fastapi#2425
tiangolo/fastapi#596
benoitc/gunicorn#2124
and others...

After trying multiple approaches, we weren't able to solve the issue. Do you have any suggestions on how to handle this, or other tests that I can run to give you more information?

Thanks in advance.

cc @zhangguanheng66

mrshenli added the oncall: transformer/mha and triaged labels Dec 18, 2020
mrshenli (Contributor) commented:

Hey @hsilveiro, have you asked this question in stanza's repo? Do you know if they use any distributed training features from PyTorch (e.g., DistributedDataParallel or all_reduce), or does each worker only do local training?

cc @cpuhrsch @zhangguanheng66 are you familiar with stanza?

hsilveiro (Author) commented:


Yes, I already asked this question in stanza's repo (and also on gunicorn).

hsilveiro (Author) commented:

Hey again,
After several experiments, we tried modifying the number of inter-op and intra-op threads.
First of all, we used get_num_interop_threads() and get_num_threads() to get an idea of how many threads were being used. It seems that the default is four for each (and this default value appears to match the number of cores of the CPU we used).
Knowing this, we started setting different thread counts and found that whenever we have more than one intra-op thread, the process hangs. So, for our worker not to hang, we need to set the number of intra-op threads to one, as sketched below. Were you aware of this behavior?
If so, will we have any sort of performance problem by using only one intra-op thread?
Moreover, if there is a performance decrease, is there any other way to have more intra-op threads?
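
For reference, this is roughly what we are calling (only a sketch of the thread-related torch calls as we use them; the rest of the worker code is omitted):

import torch

# Defaults on our machine: both report 4, matching the CPU core count
print(torch.get_num_threads())          # intra-op thread pool size
print(torch.get_num_interop_threads())  # inter-op thread pool size

# The only setting that keeps the forked worker from hanging for us:
torch.set_num_threads(1)  # limit intra-op parallelism to a single thread
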
Thanks


aruroyc commented Apr 13, 2022

@hsilveiro did you finally manage to solve this? I have a similar situation with Haystack and Gunicorn. Wondering if there is a way to prevent the model from being loaded by each worker (4 of them!).

ozancaglayan (Contributor) commented:

This could be related to this issue:

python/cpython#84559

PrathamSoni commented:

Anyone have a resolution for this?
