timeout defaults #957

Open
petere opened this issue Oct 4, 2023 · 2 comments

@petere
Member

petere commented Oct 4, 2023

Some of us recently discussed whether some of the default timeout settings should be changed (lowered?). I want to start a discussion on this. Here are the current timeout defaults:

server_lifetime = 3600
server_idle_timeout = 600
server_connect_timeout = 15
server_login_retry = 15
query_timeout = 0
query_wait_timeout = 120
cancel_wait_timeout = 10
client_idle_timeout = 0
client_login_timeout = 60
autodb_idle_timeout = 3600
idle_transaction_timeout = 0
suspend_timeout = 10

(all in seconds)
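For readers who find raw second counts hard to compare, here is a minimal Python sketch that prints the defaults listed above as durations. The `DEFAULTS` dict and the `describe` helper are hypothetical names used only for illustration, and the sketch assumes that a value of 0 means the timeout is disabled.

```python
# Current timeout defaults as listed in this issue, in seconds.
DEFAULTS = {
    "server_lifetime": 3600,
    "server_idle_timeout": 600,
    "server_connect_timeout": 15,
    "server_login_retry": 15,
    "query_timeout": 0,
    "query_wait_timeout": 120,
    "cancel_wait_timeout": 10,
    "client_idle_timeout": 0,
    "client_login_timeout": 60,
    "autodb_idle_timeout": 3600,
    "idle_transaction_timeout": 0,
    "suspend_timeout": 10,
}


def describe(seconds):
    """Render a timeout given in seconds as a short readable string."""
    if seconds == 0:
        return "disabled"  # assumption: 0 turns the timeout off
    minutes, secs = divmod(seconds, 60)
    if minutes and secs:
        return f"{minutes} min {secs} s"
    if minutes:
        return f"{minutes} min"
    return f"{secs} s"


for name, value in DEFAULTS.items():
    print(f"{name:26} {value:>5}  ({describe(value)})")
```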

Thoughts?

@JelteF
Member

JelteF commented Oct 5, 2023

To me the main one that is really bad is query_wait_timeout. I don't think anyone ever wants pgbouncer to wait 2 minutes before it even starts running the query. A default somewhere between 5 and 30 seconds seems much more reasonable. 5 might be a bit too low and 30 is still fairly high, so my vote would be for something like 7-10 seconds. But honestly, anything is better than the current 2 minutes. If we choose something lower than 10 seconds, I think cancel_wait_timeout should be set to the same value.

server_login_retry is another one that seems very much on the high side. I think retrying every 1 or 2 seconds would be much nicer, to react quickly to e.g. a crashed and restarted postgres.

For similar reasons server_connect_timeout also seems quite high. The worst-case number of round trips needed for a connection seems to me to be 4: 3 for TLS, and then 1 for e.g. password auth. Looking at https://wondernetwork.com/pings, the maximum ping in the world between two cities seems to be 400ms. So 4 * 0.4s = 1.6s. Then there might be some random bad luck: a packet drop (causing a retransmit), or postgres being overloaded and thus responding slowly. If we add 3 seconds for that we end up at 4.6s. So if we round that up to 5s I think we have a good default.
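The estimate above can be reproduced with a short Python sketch. The inputs (4 round trips, 400 ms ping, 3 s of slack) are taken from the comment; the function and constant names are hypothetical, and this is only a back-of-the-envelope calculation, not how pgbouncer derives any value.

```python
import math

ROUND_TRIPS = 4    # 3 for TLS plus 1 for e.g. password auth
MAX_PING_S = 0.4   # roughly the worst ping between two cities (400 ms)
SLACK_S = 3.0      # allowance for a retransmit or an overloaded server


def estimate_connect_timeout(round_trips, ping_s, slack_s):
    """Return (raw estimate, estimate rounded up to whole seconds)."""
    raw = round_trips * ping_s + slack_s
    return raw, math.ceil(raw)


raw, rounded = estimate_connect_timeout(ROUND_TRIPS, MAX_PING_S, SLACK_S)
print(f"raw estimate: {raw:.1f}s, rounded up: {rounded}s")
# prints "raw estimate: 4.6s, rounded up: 5s"
```

Changing `SLACK_S` shows how sensitive the result is to the allowance for bad luck, which is the least well-founded of the three inputs.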

server_lifetime and server_idle_timeout also seem a bit on the high side. I wouldn't mind dividing both by two or three, i.e. 30 or 20 minutes and 5 or 3 minutes respectively. But I don't feel strongly about this; I don't think the current values are particularly bad in most cases. That said, a server_lifetime of an hour can be problematic when partitioning is used, because then the per-connection relation cache can grow quite large.
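As a quick check of the "divide by two or three" suggestion, this Python sketch computes the resulting values in seconds and minutes. The `CURRENT` dict and the `scaled` helper are hypothetical names for illustration only.

```python
# Current defaults for the two settings discussed, in seconds.
CURRENT = {"server_lifetime": 3600, "server_idle_timeout": 600}


def scaled(seconds, divisor):
    """Return the timeout divided by divisor, in whole seconds."""
    return seconds // divisor


for name, value in CURRENT.items():
    for divisor in (2, 3):
        new = scaled(value, divisor)
        print(f"{name} / {divisor} = {new} s ({new / 60:.1f} min)")
```

Note that 600 / 3 is 200 seconds, or about 3.3 minutes, which the comment rounds to 3 minutes.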

The rest of the timeouts seem pretty okay to me.

@petere
Member Author

petere commented Oct 18, 2023

> To me the main one that is really bad is query_wait_timeout. I don't think anyone ever wants pgbouncer to wait 2 minutes before it even starts running the query. A default somewhere between 5 and 30 seconds seems much more reasonable. 5 might be a bit too low and 30 is still fairly high, so my vote would be for something like 7-10 seconds. But honestly, anything is better than the current 2 minutes. If we choose something lower than 10 seconds, I think cancel_wait_timeout should be set to the same value.

Intuitively, changing this to 10 seems fine to me. But currently, it's listed in etc/pgbouncer.ini as "dangerous", apparently implying that setting it too low is not recommended. What do we think about that?

> server_login_retry is another one that seems very much on the high side. I think retrying every 1 or 2 seconds would be much nicer, to react quickly to e.g. a crashed and restarted postgres.

This setting is supposed to prevent hammering a misconfigured server. If we set it very low, I think that would essentially disable that mechanism. Then you might as well turn it off.
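To put rough numbers on the "hammering" concern, here is a small Python sketch. It assumes, as a simplification, that at most one login attempt is made per server_login_retry interval against a failing server; the `max_attempts` helper is a hypothetical name.

```python
def max_attempts(retry_interval_s, window_s=60):
    """Upper bound on login attempts within window_s seconds,
    assuming at most one attempt per retry interval."""
    return window_s // retry_interval_s


# Compare the current default (15 s) with the proposed 2 s and 1 s.
for interval in (15, 2, 1):
    print(f"server_login_retry = {interval:>2} s -> "
          f"up to {max_attempts(interval)} attempts per minute")
```

Under that assumption, going from 15 seconds to 1 second raises the ceiling from 4 to 60 attempts per minute against a server that keeps rejecting logins.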

> For similar reasons server_connect_timeout also seems quite high. The worst-case number of round trips needed for a connection seems to me to be 4: 3 for TLS, and then 1 for e.g. password auth. Looking at https://wondernetwork.com/pings, the maximum ping in the world between two cities seems to be 400ms. So 4 * 0.4s = 1.6s. Then there might be some random bad luck: a packet drop (causing a retransmit), or postgres being overloaded and thus responding slowly. If we add 3 seconds for that we end up at 4.6s. So if we round that up to 5s I think we have a good default.

> server_lifetime and server_idle_timeout also seem a bit on the high side. I wouldn't mind dividing both by two or three, i.e. 30 or 20 minutes and 5 or 3 minutes respectively. But I don't feel strongly about this; I don't think the current values are particularly bad in most cases. That said, a server_lifetime of an hour can be problematic when partitioning is used, because then the per-connection relation cache can grow quite large.

I'm not aware of any practical problems with these settings, so I would tend to leave them unchanged.
