Deadlock possible when calling bigquery_storage.BigQueryReadClient() on multiple threads #696

Open
d1manson opened this issue Oct 28, 2023 · 0 comments
Labels
api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API.

d1manson commented Oct 28, 2023

I'm using bigquery_storage.BigQueryReadClient() inside a Prefect task, and when I run five or so instances of that task in parallel, it appears to deadlock roughly half the time.

Wrapping the call in a lock seems to prevent the deadlock:

from threading import Lock

from google.cloud import bigquery_storage  # google-cloud-bigquery-storage~=2.16
from prefect import task

# Module-level lock shared by all task threads; it serializes client construction only.
bqrc_lock = Lock()

@task
async def some_task():
    # ... do something unrelated ...
    with bqrc_lock:
        client = bigquery_storage.BigQueryReadClient()
    # ... do something with the client (no further locking required) ... and whatever else ...

As far as I can tell, Prefect runs each task on a different thread. I suspect the issue is somewhere in the auth logic: I'm running this in GKE with a service account assigned to the node (rather than any explicit credentials).
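One way to check where the threads are actually stuck is Python's standard-library faulthandler module, which can dump every thread's stack if a call has not returned in time. A minimal sketch, assuming a hypothetical helper name (call_with_stack_dump) and an arbitrary 30-second timeout; neither comes from Prefect or the BigQuery Storage library:

```python
import faulthandler
import sys


def call_with_stack_dump(fn, timeout_s=30.0, file=None):
    """Call fn(); if it has not returned after timeout_s seconds,
    write the stack of every thread to `file` (stderr by default).

    Hypothetical helper for diagnosis only; it does not fix a deadlock.
    """
    target = sys.stderr if file is None else file
    # Arms a process-wide watchdog timer; exit=False keeps the process alive.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


# Usage (run_all_tasks is a placeholder for whatever starts the parallel tasks):
# call_with_stack_dump(run_all_tasks, timeout_s=30.0)
```

The watchdog timer is process-wide, so it is best armed once around the whole parallel run rather than inside each task; a later call replaces the earlier timer. If the hang reproduces, the dump should show which call each thread is blocked in, which would confirm or rule out the auth logic.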

I appreciate that I could perhaps create a singleton to avoid instantiating multiple clients in parallel (and thus avoid the issue entirely), but I'm not sure whether a singleton could actually be used safely across threads. More subjectively, it's nice to keep the client creation directly at the point in the code that uses it (especially if there's only one such line and it only gets executed a handful of times during the life of the program).
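For reference, the singleton alternative could look roughly like the sketch below. It is a generic lazy-initialization pattern, not anything provided by the library: get_shared_client and the factory argument are hypothetical names, and a stand-in factory is used so the example runs without Google credentials. Whether a single BigQueryReadClient is safe to share across threads is exactly the open question, so this only shows how construction would be limited to one call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_client = None
_client_lock = threading.Lock()


def get_shared_client(factory):
    """Return one process-wide client, building it with factory() on first use.

    Hypothetical helper: in real code, factory would be
    bigquery_storage.BigQueryReadClient.
    """
    global _client
    if _client is None:  # fast path once the client exists
        with _client_lock:
            if _client is None:  # re-check while holding the lock
                _client = factory()
    return _client


if __name__ == "__main__":
    # Stand-in factory so the sketch runs without any cloud dependencies.
    def fake_factory():
        return object()

    with ThreadPoolExecutor(max_workers=5) as pool:
        clients = list(pool.map(lambda _: get_shared_client(fake_factory), range(5)))
    # Every thread received the same instance.
    print(all(c is clients[0] for c in clients))
```

Compared with the lock-around-the-constructor workaround, this moves client creation away from the point of use, which is the trade-off described above.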

Thanks

@parthea parthea transferred this issue from googleapis/google-cloud-python Oct 29, 2023
@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. label Oct 29, 2023