Environment details
OS type and version: 4.9.125-linuxkit GNU/Linux
Python version and virtual environment information: Python 2.7.16
google-cloud-pubsub package version: 0.41.0
Steps to reproduce
- Publish a (first) message to PubSub that fails
- Timeout on the future object returned by the call to publish(), before the grpc publish call in the batch thread returns
- Publish a (second) message to the same topic. This call hangs until the previous grpc publish call in the batch thread returns. As the default timeout is currently 10 minutes (!), this can take 10 minutes to return.
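Step 2 relies on the fact that timing out while waiting on a future only stops the caller from waiting; it does not cancel the work still running on the other thread. The sketch below shows this with the standard library's concurrent.futures as an analogy only (the Pub/Sub client has its own future class). slow_publish is a hypothetical stand-in for the grpc publish call, and the 10-minute timeout is scaled down to 0.5 s. It is written for Python 3; on Python 2.7 concurrent.futures needs the `futures` backport.

```python
import concurrent.futures
import time


def slow_publish():
    # Hypothetical stand-in for the grpc publish call made by the batch thread.
    time.sleep(0.5)
    return "message-id"


executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_publish)

# The caller gives up waiting after 0.1 s ...
timed_out = False
try:
    future.result(timeout=0.1)
except concurrent.futures.TimeoutError:
    timed_out = True

# ... but the underlying call keeps running in the worker thread.
still_running = not future.done()
print(timed_out, still_running)

executor.shutdown(wait=True)  # waits for slow_publish to finish
print(future.result())
```

So after step 2 the application has moved on, while the batch thread is still inside the slow call.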
Hanging for 10 minutes is surprising behaviour for an asynchronous API.
Code example
See this gist for code and instructions on how to reproduce this issue.
Wot I think is going on here
This is wild conjecture that I have no supporting evidence for.
That being said, I think the issue starts when the batch thread gets stuck in this call to grpc publish. At this point it is holding onto the lock _state_lock and will continue to hold on to it for 10 minutes, until the call to grpc publish times out.
When the client application calls publish() in the main thread for the second time, it will try to acquire the same lock _state_lock. As this lock is already being held by the batch thread, the main thread hangs and doesn't return from the call to publish().
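The hypothesis above can be modelled in a few lines. This is a simplified sketch, not the library's code: FakeBatch, commit() and publish() are hypothetical names, the only thing borrowed from the issue is a lock called _state_lock that the batch thread holds for the whole duration of the RPC, and time.sleep(0.5) stands in for the 10-minute grpc publish call. If the conjecture is right, the main thread's publish() should block for roughly as long as the batch thread's call remains outstanding.

```python
import threading
import time


class FakeBatch(object):
    """Hypothetical model of a batch that guards its state with one lock."""

    def __init__(self, rpc_seconds):
        self._state_lock = threading.Lock()
        self._rpc_seconds = rpc_seconds
        self.messages = []

    def commit(self):
        # Batch thread: holds _state_lock for the entire (slow) RPC.
        with self._state_lock:
            time.sleep(self._rpc_seconds)  # stands in for grpc publish

    def publish(self, message):
        # Main thread: needs the same lock before it can add a message.
        with self._state_lock:
            self.messages.append(message)


batch = FakeBatch(rpc_seconds=0.5)
committer = threading.Thread(target=batch.commit)
committer.start()
time.sleep(0.05)  # give the batch thread time to acquire the lock first

start = time.time()
batch.publish("second message")  # blocks until commit() releases the lock
blocked_for = time.time() - start

committer.join()
print("publish() blocked for %.2f s" % blocked_for)
```

With the real default timeout in place of 0.5 s, the same pattern would explain a second publish() hanging for up to 10 minutes.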