Large number of incoming connections blocks pgBouncer completely #1054
Comments
You are not the first to complain about this. This message can help users detect that your authentication service cannot keep up with a storm of authentication requests. On the other hand, it will be printed on every new authentication request until the queue is shorter than PAM_REQUEST_QUEUE_SIZE (20). I'm probably worrying too much, because 100 ms pauses (PAM_QUEUE_WAIT_SLEEP_MCS) seem sufficient to relieve the pressure on the authentication service. What is your authentication service? Are you observing long pauses because of the current behavior?
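One way to keep a visible message without printing it on every request is to rate-limit it. The following is only a minimal sketch of that idea: `struct log_limiter` and `log_limiter_allow` are hypothetical names, not part of PgBouncer, and the caller is assumed to pass in the current time and do the actual logging itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper; none of these names come from PgBouncer. */
struct log_limiter {
    bool has_emitted;    /* false until the first message is let through */
    time_t last_emit;    /* time of the last message that was let through */
    time_t min_interval; /* minimum number of seconds between messages */
};

/*
 * Returns true if the caller should log now, false if the message
 * should be suppressed because one was emitted too recently.
 */
static bool log_limiter_allow(struct log_limiter *l, time_t now)
{
    assert(l != NULL);
    if (l->has_emitted && now - l->last_emit < l->min_interval)
        return false;
    l->has_emitted = true;
    l->last_emit = now;
    return true;
}
```

With `min_interval` set to 5, a burst of full-queue events within the same few seconds would produce a single message, and the next one would appear only once 5 seconds have passed.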
Thanks, we’re using LDAP against an AD server. I’m not really sure how to benchmark that part (the PAM requests), but you’re probably right that they should be fast. However, I’m finding that I can overload it with 40+ new connections, so I have increased that queue size locally.
If I send 100 connections directly to Postgres, they queue and are accepted in turn, and it takes about 9 seconds for the last one to succeed (so perhaps 90 ms per connection, including the auth round trip). If I send 100 connections to pgBouncer, nothing at all happens for about 12 seconds (and new connections are also blocked for ~12 seconds), and then they all succeed over the next ~3 seconds.
Sleeping on PgBouncer's main thread should really be a no-go. As far as I can tell, a flood of auth requests can this way easily prevent queries on already established connections from going through. The locking logic in this piece of code doesn't make much sense to me either.
When the PAM authentication queue is full, pgBouncer sleeps and stops servicing the event loop:
pgbouncer/src/pam.c
Line 162 in 322acfb
If we can't run the event loop instead, then I think the slog_debug above should be a slog_warning to let the admin know that something is up.
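To illustrate what not sleeping could look like, here is a minimal sketch of a bounded queue whose enqueue operation fails immediately when the queue is full, so the caller can return to the event loop and retry later. Only PAM_REQUEST_QUEUE_SIZE (20) is taken from this thread; `struct auth_queue`, `auth_queue_try_push` and `auth_queue_try_pop` are hypothetical names, the integer client id stands in for a real request record, and the sketch is single-threaded, so it leaves out the locking that a queue shared with a worker thread would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value quoted in this thread; everything else here is hypothetical. */
#define PAM_REQUEST_QUEUE_SIZE 20

struct auth_queue {
    int items[PAM_REQUEST_QUEUE_SIZE]; /* stand-in for real request records */
    size_t head;                       /* index of the oldest pending request */
    size_t count;                      /* number of pending requests */
};

/*
 * Non-blocking enqueue: returns false when the queue is full instead of
 * sleeping, so the caller can keep servicing other events and retry later.
 */
static bool auth_queue_try_push(struct auth_queue *q, int client_id)
{
    assert(q != NULL);
    if (q->count == PAM_REQUEST_QUEUE_SIZE)
        return false;
    size_t tail = (q->head + q->count) % PAM_REQUEST_QUEUE_SIZE;
    q->items[tail] = client_id;
    q->count++;
    return true;
}

/* Removes the oldest request; returns false when the queue is empty. */
static bool auth_queue_try_pop(struct auth_queue *q, int *client_id)
{
    assert(q != NULL && client_id != NULL);
    if (q->count == 0)
        return false;
    *client_id = q->items[q->head];
    q->head = (q->head + 1) % PAM_REQUEST_QUEUE_SIZE;
    q->count--;
    return true;
}
```

In this shape, a false return from `auth_queue_try_push` is the point where a warning could be logged and the client left waiting, while already established connections continue to be served.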