
Pub/Sub: publish message hangs waiting for previous publish to time out #8036

@asnr

Description


Environment details

OS type and version: 4.9.125-linuxkit GNU/Linux
Python version and virtual environment information: Python 2.7.16
google-cloud-pubsub package version: 0.41.0

Steps to reproduce

  1. Publish a first message to Pub/Sub whose publish fails.
  2. Time out while waiting on the future returned by publish(), before the gRPC publish call in the batch thread returns.
  3. Publish a second message to the same topic. This call hangs until the earlier gRPC publish call in the batch thread returns. Because the default timeout is currently 10 minutes (!), this call can take 10 minutes to return.
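To make the timeline in these steps concrete, here is a minimal, self-contained simulation. It does not use google-cloud-pubsub: `BlockingBatch`, `_fake_grpc_publish` and `run_demo` are hypothetical names, and a 1-second sleep stands in for the 10-minute gRPC timeout. It assumes, as conjectured further down, that the batch thread holds `_state_lock` for the whole RPC.

```python
import threading
import time
from concurrent.futures import Future, TimeoutError as FutureTimeout


class BlockingBatch(object):
    """Hypothetical stand-in for the publisher's batch (NOT the library class).

    Models the conjecture: the batch thread keeps _state_lock for the whole RPC.
    """

    def __init__(self, rpc_seconds):
        self._state_lock = threading.Lock()
        self._rpc_seconds = rpc_seconds  # stands in for the 10-minute gRPC timeout
        self._messages = []
        self._futures = []

    def publish(self, data):
        # Application thread: needs the same lock the batch thread holds.
        with self._state_lock:
            future = Future()
            self._messages.append(data)
            self._futures.append(future)
            return future

    def _fake_grpc_publish(self, messages):
        # A "gRPC publish" that hangs until its deadline and then fails.
        time.sleep(self._rpc_seconds)
        raise RuntimeError("deadline exceeded")

    def commit(self):
        # Batch thread: the lock is held across the slow fake RPC.
        with self._state_lock:
            messages, futures = self._messages, self._futures
            self._messages, self._futures = [], []
            try:
                self._fake_grpc_publish(messages)
            except RuntimeError as exc:
                for future in futures:
                    future.set_exception(exc)


def run_demo(rpc_seconds=1.0, result_timeout=0.1):
    batch = BlockingBatch(rpc_seconds)
    first = batch.publish(b"first")                # step 1
    worker = threading.Thread(target=batch.commit)
    worker.start()
    time.sleep(0.05)  # give the batch thread time to take the lock
    try:
        first.result(timeout=result_timeout)       # step 2: give up waiting
        first_timed_out = False
    except FutureTimeout:
        first_timed_out = True
    start = time.monotonic()
    batch.publish(b"second")                       # step 3: blocks on the lock
    blocked_for = time.monotonic() - start
    worker.join()
    return first_timed_out, blocked_for
```

With these timings the first `result()` call gives up after about 0.1 s, while the second `publish()` stays blocked for most of the remaining second, until the fake RPC fails and the batch thread releases the lock.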

Hanging for 10 minutes is surprising behaviour for an asynchronous API.

Code example

See this gist for code and instructions on how to reproduce this issue.

Wot I think is going on here

This is wild conjecture that I have no supporting evidence for.

That being said, I think the issue starts when the batch thread gets stuck in this call to gRPC publish. At this point it is holding the lock _state_lock, and it will keep holding it for 10 minutes, until the gRPC publish call times out.

When the client application calls publish() in the main thread for the second time, it tries to acquire the same _state_lock. Because the batch thread already holds that lock, the main thread blocks and doesn't return from the call to publish().
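If this conjecture is right, the hang comes from holding `_state_lock` across the network call. A minimal sketch of the usual alternative keeps only the swap of shared state inside the critical section and makes the slow call after releasing the lock. This is a hypothetical model, not the library's code or an official fix: `NarrowLockBatch` and `time_second_publish` are made-up names, and `time.sleep` stands in for the gRPC call.

```python
import threading
import time
from concurrent.futures import Future


class NarrowLockBatch(object):
    """Hypothetical batch (not the google-cloud-pubsub class) whose batch
    thread holds _state_lock only while swapping out pending messages."""

    def __init__(self, rpc_seconds):
        self._state_lock = threading.Lock()
        self._rpc_seconds = rpc_seconds  # stands in for a long gRPC timeout
        self._messages = []
        self._futures = []

    def publish(self, data):
        with self._state_lock:
            future = Future()
            self._messages.append(data)
            self._futures.append(future)
            return future

    def _fake_grpc_publish(self, messages):
        time.sleep(self._rpc_seconds)
        raise RuntimeError("deadline exceeded")

    def commit(self):
        # Take a snapshot of pending work under the lock ...
        with self._state_lock:
            messages, futures = self._messages, self._futures
            self._messages, self._futures = [], []
        # ... then make the slow call with the lock released.
        try:
            self._fake_grpc_publish(messages)
        except RuntimeError as exc:
            for future in futures:
                future.set_exception(exc)


def time_second_publish(rpc_seconds=1.0):
    batch = NarrowLockBatch(rpc_seconds)
    first = batch.publish(b"first")
    worker = threading.Thread(target=batch.commit)
    worker.start()
    time.sleep(0.05)  # let the batch thread start its fake RPC
    start = time.monotonic()
    second = batch.publish(b"second")  # does not wait for the fake RPC
    elapsed = time.monotonic() - start
    worker.join()
    return first, second, elapsed
```

Here the second `publish()` returns almost immediately even though the fake RPC is still sleeping. The second message just waits in the fresh pending list for a later commit, which this sketch does not model.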

Metadata


Labels

api: pubsub (Issues related to the Pub/Sub API.), triaged for GA, type: question (Request for information or clarification. Not an issue.)
