-
I assume the reason it surged is that these connections get released back to the pool while still inside a read-only (RO) transaction after some particular error case, but I haven’t been able to correlate this with anything, because the morgue consistently shows a variety of different connection and timeout job failures. I’d love to hear if there’s a good way to hook into the connection pooler and log some status info about a connection as it’s released.
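One possible starting point, as an untested sketch rather than a known fix: ActiveRecord adapters define a `:checkin` callback and respond to `transaction_open?`, so a small helper could log whenever a connection goes back to the pool with a transaction still open. The helper name `log_release_state` and the `FakeConnection` stand-in are hypothetical, and the Rails wiring in the comment is an assumption that would need checking against the Rails version in use. It also only catches transactions ActiveRecord itself knows about, so one opened by raw SQL would likely be missed.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: call it just before a connection is released.
# `conn` is anything that responds to #transaction_open? (ActiveRecord
# adapters do). Returns the warning text, or nil if nothing was open.
def log_release_state(conn, logger)
  return nil unless conn.transaction_open?

  message = "connection #{conn.object_id} released with an open transaction " \
            "(thread=#{Thread.current.object_id})"
  logger.warn(message)
  message
end

# Assumed wiring in a Rails initializer (not executed here; it relies on
# the :checkin callback that ActiveRecord adapters define):
#
#   ActiveRecord::ConnectionAdapters::AbstractAdapter.set_callback(:checkin, :before) do
#     log_release_state(self, Rails.logger)
#   end

# Stand-in connection so the sketch runs without a database.
FakeConnection = Struct.new(:open) do
  def transaction_open?
    open
  end
end

io = StringIO.new
logger = Logger.new(io)
log_release_state(FakeConnection.new(true), logger)   # logs a warning
log_release_state(FakeConnection.new(false), logger)  # logs nothing
puts io.string
```

Adding the current Sidekiq job class to the message (if it is available in the thread at that point) might make it easier to match a leaked transaction to a specific failure in the morgue.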
-
I’ve been seeing this issue for a few months now, but it really blew up today. I assume that’s related in some way to the outages affecting lots of sites.
When I review the Sidekiq morgue I’ll see a clump of jobs that failed with the following:
This starts out sprinkled among other failures, but if left alone for several days, every pooled connection accrues an open RO transaction and the site is brought to a standstill.
If I go into PgHero and kill all connections, the problem usually resolves, but today it immediately locks up again with a list full of these errors in the morgue.