New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for waiting clients count #913
base: master
Are you sure you want to change the base?
Add support for waiting clients count #913
Conversation
bf14f6a
to
8b20b7f
Compare
6c640f7
to
75f6e77
Compare
IMO this counter doesn't help. If I obtain 1,000 wait_count that waits for 80ms and 50,000 wait_count that waits for 1ms, it seems the later indicates a bad scenario but indeed the former is worse. This counter without the time information is useless. That's why there is only wait_time available. Let's not add another counter just because other metrics already provide a counter. |
Well, feel free to close then but I don't really understand the argument to be honest.
But we do have the time, so what's the problem?
What's the problem with giving people information about what's going on in their system? Just document what it does and let people decide what (if anything) they want to do with said information. Not giving people the information because one's afraid that they might misinterpret it seems like a strange concept to me. Regardless, this is just my 2cents, if you feel like this adds no value to the project then feel free to close. |
if (wait_count > 0) | ||
avg->wait_time = (cur->wait_time - old->wait_time) / wait_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this counter doesn't help. If I obtain 1,000 wait_count that waits for 80ms and 50,000 wait_count that waits for 1ms, it seems the later indicates a bad scenario but indeed the former is worse. This counter without the time information is useless. That's why there is only wait_time available. Let's not add another counter just because other metrics already provide a counter.
But we do have the time, so what's the problem?
I think adding the wait_count
isn't a huge problem, and to me the metric seems useful. But what I think could be considered questionable is that this changed line changes the meaning of the avg_wait_time
column significantly. The docs state that the following is its meaning:
The docs currently state that avg_wait_time
means this:
avg_wait_time
Average time spent by clients waiting for a server that were assigned a backend connection within the current stats_period, in microseconds (averaged per second within that period).
Honestly I think, while that seems like quite a useful metric, but I think its name is really confusing (and I'm not alone: #316). I think a better name would be per_second_wait_time. And I also think the new calculation for avg_wait_time makes much more sense given it's in line with averages calculations for the others. However, breaking backwards compatibility by changing its current meaning is not an easy thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I spoke to @jcoleman at PGConf NYC and he was also annoyed by the problem that avg_wait_time
is a really strange metric, and not very useful when debugging performance issues (because you only know how much clients are waiting in total but not on average). So he had made a similar patch a while back: #727.
I prefer #727 over this PR, because it averages the wait time across all backend assignments, not just the assignments where a wait took place. Including the 0 second wait times in the average makes sure we don't over inflate the avg_wait_time when many clients are not waiting at all.
Hello 馃憢馃徑
In this PR, I'm trying to add support to a new logging variable that would track how many clients have had to wait in order to connect to the server.
I feel like this makes sense when taking into account that there already exist
{query,xact}_{count,time}
/wait_time
variables.I tried looking into how these were calculated and follow the same approach for this one, but I'm of course open to feedback.