DPR request times out when using gunicorn preload option #2409
Hey @aruroyc, I'm not too familiar with the
I have the impression that:
Is any of this true? If you modified the REST API, could you share your changes? A possible solution would be to upgrade Haystack and the REST API to 1.3.0. There are plenty of improvements there, including a refactoring of the REST API so that it no longer loads the pipeline at import time. However, coming from 1.0.0 you might have to migrate a few things: proceed with care and skim through the changelog before jumping to the next version. As an alternative, I believe you have to find a way to defer the creation of embeddings to some point after import time: this should allow the workers to boot and fork without hitting any timeout. I hope some of this helps! In case this didn't sort things out for you, please be a bit more specific about your setup and I'll see if I can help you further.
@ZanSara @brandenchan Thanks for looking into this quickly! @ZanSara I created a FastAPI REST endpoint that calculates the query embeddings at runtime, not at import time. It then uses an ElasticsearchDocumentStore to query and retrieve the top_n similar passages. For this I am using:
While looking up possible solutions on the internet, I saw that this problem could in fact be related to how PyTorch does intra-op parallelism. Following a couple of discussion threads I finally landed here: Following those solutions, the approach below of setting the number of threads for intra-op parallelism to 1 using torch.set_num_threads(1) works, but I am unsure whether this could result in performance issues when the REST API is called in parallel at high concurrency. Might the issue have something to do with how the DPR model is run in PyTorch and whether it uses multiprocessing? I can draw up prototype code on GitHub to reproduce this, in case this does not help.
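One way to apply that `torch.set_num_threads(1)` workaround without touching application code is a gunicorn server hook. A minimal sketch of a `gunicorn.conf.py` fragment follows; `post_fork` is a standard gunicorn hook, while importing torch inside it is an assumption about the app environment.

```python
# gunicorn.conf.py (sketch)

def post_fork(server, worker):
    """Called by gunicorn in each worker process right after the fork.

    Limiting intra-op parallelism here, inside the worker, means no
    thread-pool state is inherited from the preloaded master process.
    """
    import torch  # imported lazily so the config file stays importable without torch
    torch.set_num_threads(1)
```

Setting the thread count per worker trades single-request latency for fork safety; with many gunicorn workers the overall throughput may still be acceptable, but that is worth benchmarking under the expected concurrency.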
Thanks for the detailed explanation! I see you have a very different setup than I imagined. I don't know how much I can help you here: it seems like this is an issue with PyTorch. What I can tell you is that Haystack's DPRetriever implementation is not complex and does not use multiprocessing: we generally defer all the optimization and parallelization work to PyTorch. However, if you eventually figure out that this is a bug caused by DPR, by all means let us know so that we can fix it. In that case a small reproducible example would be a fantastic help. Good luck with your bug hunt!
Hey @aruroyc, do you still need help with this issue? Otherwise we should close it 🙂
Closing this as the issue appears to be within the PyTorch module and how it handles intra-op parallelism.
Question
When using gunicorn to start the app server with the --preload flag, the request for passage retrieval for a query times out after reaching the embed_queries() function (line 214 in haystack/nodes/retriever/dense.py).
The same code works perfectly when not using the preload flag.
Additional context
Startup command: gunicorn --preload -c gunicorn.conf.py main:app -k uvicorn.workers.UvicornWorker
Retriever used: DensePassageRetriever
haystack version: 1.0.0