
DPR request times out when using gunicorn preload option #2409

Closed
aruroyc opened this issue Apr 12, 2022 · 6 comments

@aruroyc

aruroyc commented Apr 12, 2022

Question
When using gunicorn to start the app server with the --preload flag, the passage retrieval request for a query times out after reaching the embed_queries() function (line 214 in haystack/nodes/retriever/dense.py).
The same code works perfectly without the preload flag.

Additional context
Startup command: gunicorn --preload -c gunicorn.conf.py main:app -k uvicorn.workers.UvicornWorker
Retriever used: DensePassageRetriever
haystack version: 1.0.0

@aruroyc aruroyc changed the title DPR request times out when using gunicorn. DPR request times out when using gunicorn preload option Apr 13, 2022
@brandenchan

Hi @aruroyc, I had a quick look into this but nothing jumped out at me yet. I'm tagging @ZanSara here who I think will be able to help you out!

@ZanSara

ZanSara commented Apr 14, 2022

Hey @aruroyc, I'm not too familiar with the --preload flag either, but reading the documentation I think I have a clue of what's going on.

--preload "load application code before the worker processes are forked."(https://docs.gunicorn.org/en/stable/settings.html#preload-app). The fact is that our REST API for Haystack 1.0.0 used to do a lot of operations at import time: namely, they were loading all the nodes and setting up the pipeline.

I have the impression that:

  • You're using a dense retriever and, as such, have modified the REST API to compute the embeddings.
  • The code calculating the embeddings is executed at import time.

Is any of this true? If you modified the REST API, could you share your changes?

A possible solution would be to upgrade Haystack and the REST API to 1.3.0. There are plenty of improvements there, including a refactoring of the REST API so that it no longer loads the pipeline at import time. However, coming from 1.0.0 you might have to migrate a few things: proceed with care and skim through the changelog before jumping to the next version. As an alternative, I believe you have to find a way to defer the creation of embeddings to some point after import time: this should let the workers boot and fork without hitting any timeout (see the sketch below).
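
For illustration, deferring the heavy setup could look roughly like this (a minimal sketch, not our actual REST API code; the host, index, endpoint, and parameter names are just placeholders):

```python
# Rough sketch: build the retriever and pipeline inside a FastAPI startup hook
# instead of at module import time, so the heavy model loading happens in each
# worker after gunicorn has forked it, rather than in the preloaded master.
from fastapi import FastAPI
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import DensePassageRetriever
from haystack.pipelines import DocumentSearchPipeline

app = FastAPI()
search_pipeline = None  # populated per worker, after the fork


@app.on_event("startup")
def load_pipeline():
    global search_pipeline
    document_store = ElasticsearchDocumentStore(host="localhost", index="documents")
    retriever = DensePassageRetriever(document_store=document_store)
    search_pipeline = DocumentSearchPipeline(retriever=retriever)


@app.get("/query")
def query(q: str):
    result = search_pipeline.run(query=q, params={"Retriever": {"top_k": 5}})
    return {"documents": [doc.to_dict() for doc in result["documents"]]}
```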

I hope some of this helps! If it doesn't sort things out for you, please be a bit more specific about your setup and I'll see if I can help you further.

@brandenchan brandenchan self-assigned this Apr 14, 2022
@aruroyc

aruroyc commented Apr 14, 2022

@ZanSara @brandenchan Thanks for looking into this quickly!

@ZanSara I created a FastAPI REST endpoint that calculates the query embeddings at runtime, not import time. It then uses an ElasticsearchDocumentStore to query and retrieve the top_n similar passages. For this I am using:

search_pipeline.run(query['question'], params=params)
Where search_pipeline is a DocumentSearchPipeline global object (with a reference to the DPR retriever) created at import time.
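
Roughly, the setup looks like this (simplified sketch; the actual host, index, and endpoint names differ):

```python
# Simplified reconstruction of my setup: the document store, retriever and
# pipeline are module-level globals, so under `gunicorn --preload` they are
# instantiated once in the master process and inherited by the forked workers.
from fastapi import FastAPI
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import DensePassageRetriever
from haystack.pipelines import DocumentSearchPipeline

document_store = ElasticsearchDocumentStore(host="localhost", index="documents")
retriever = DensePassageRetriever(document_store=document_store)
search_pipeline = DocumentSearchPipeline(retriever)  # created at import time

app = FastAPI()


@app.post("/search")
def search(query: dict):
    params = {"Retriever": {"top_k": 10}}
    # The query embedding is computed here at request time; with --preload this
    # is where the request hangs inside embed_queries().
    result = search_pipeline.run(query["question"], params=params)
    return {"documents": [doc.to_dict() for doc in result["documents"]]}
```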

While looking up possible solutions on the internet, I saw that this problem could in fact be related to how PyTorch handles intra-op parallelism. Following a couple of discussion threads, I finally landed here:
pytorch/pytorch#49555

Following the suggested solutions, the approach below of setting the number of threads for intra-op parallelism to 1 via torch.set_num_threads(1) works, but I am unsure whether this could result in performance issues when the REST API is called in parallel at high concurrency.
https://stackoverflow.com/questions/59144482/running-pytorch-multiprocessing-in-a-docker-container-with-gunicorn-worker-manag
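
For reference, what I tried looks roughly like this (one possible placement; it could also go at the top of the application module):

```python
# gunicorn.conf.py (sketch): cap PyTorch's intra-op thread pool to one thread
# in each worker before any inference runs, to avoid the hang after forking.
import torch


def post_fork(server, worker):
    torch.set_num_threads(1)
```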

Might the issue have something to do with how the DPR model is run in PyTorch and whether it uses multiprocessing?
Please note that I have not supplied num_workers > 1 in any function call or config.

I can draw up prototype code on GitHub to reproduce this, in case this does not help.

@ZanSara

ZanSara commented Apr 19, 2022

Thanks for the detailed explanation! I see you have a very different setup than I imagined.

I don't know how much I can help you here: it seems like this is an issue with PyTorch. What I can tell you is that Haystack's DensePassageRetriever implementation is not complex and does not use multiprocessing: we generally defer all the optimization and parallelization work to PyTorch. However, if you eventually figure out that this is a bug caused by DPR, by all means let us know so that we can fix it. In that case a small reproducible example would be a fantastic help.

Good luck with your bug hunt!

@ZanSara

ZanSara commented May 10, 2022

Hey @aruroyc, do you still need help with this issue? Otherwise we should close it 🙂

@aruroyc

aruroyc commented May 10, 2022

Closing this, as the issue appears to be within the PyTorch module and how it handles intra-op parallelism.

@aruroyc aruroyc closed this as completed May 10, 2022