Add support for waiting clients count #913

franciscoreinolds · 2023-08-05T17:12:00Z

Hello 👋🏽

In this PR, I'm trying to add support to a new logging variable that would track how many clients have had to wait in order to connect to the server.

I feel like this makes sense when taking into account that there already exist {query,xact}_{count,time} / wait_time variables.

I tried looking into how these were calculated and follow the same approach for this one, but I'm of course open to feedback.

franciscoreinolds · 2023-08-06T07:42:38Z

Hiya @JelteF

I managed to fix most checks but there are 2 windows ones failing due to connection_count assertions failing. Since I didn't touch any code regarding that I find it weird, have you seen such behaviour before?

Edit: Apparently #923 fixed this 🎉

eulerto · 2023-08-22T00:47:29Z

IMO this counter doesn't help. If I obtain 1,000 wait_count that waits for 80ms and 50,000 wait_count that waits for 1ms, it seems the later indicates a bad scenario but indeed the former is worse. This counter without the time information is useless. That's why there is only wait_time available. Let's not add another counter just because other metrics already provide a counter.

franciscoreinolds · 2023-08-23T19:24:39Z

Well, feel free to close then but I don't really understand the argument to be honest.

If I obtain 1,000 wait_count that waits for 80ms and 50,000 wait_count that waits for 1ms, it seems the later indicates a bad scenario but indeed the former is worse. This counter without the time information is useless.

But we do have the time, so what's the problem?

That's why there is only wait_time available.

What's the problem with giving people information about what's going on in their system? Just document what it does and let people decide what (if anything) they want to do with said information. Not giving people the information because one's afraid that they might misinterpret it seems like a strange concept to me.

Regardless, this is just my 2cents, if you feel like this adds no value to the project then feel free to close.

JelteF · 2023-09-04T08:40:07Z

src/stats.c

+	if (wait_count > 0)
+		avg->wait_time = (cur->wait_time - old->wait_time) / wait_count;


IMO this counter doesn't help. If I obtain 1,000 wait_count that waits for 80ms and 50,000 wait_count that waits for 1ms, it seems the later indicates a bad scenario but indeed the former is worse. This counter without the time information is useless. That's why there is only wait_time available. Let's not add another counter just because other metrics already provide a counter.

But we do have the time, so what's the problem?

I think adding the wait_count isn't a huge problem, and to me the metric seems useful. But what I think could be considered questionable is that this changed line changes the meaning of the avg_wait_time column significantly. The docs state that the following is its meaning:

The docs currently state that avg_wait_time means this:

avg_wait_time
Average time spent by clients waiting for a server that were assigned a backend connection within the current stats_period, in microseconds (averaged per second within that period).

Honestly I think, while that seems like quite a useful metric, but I think its name is really confusing (and I'm not alone: #316). I think a better name would be per_second_wait_time. And I also think the new calculation for avg_wait_time makes much more sense given it's in line with averages calculations for the others. However, breaking backwards compatibility by changing its current meaning is not an easy thing.

Okay, I spoke to @jcoleman at PGConf NYC and he was also annoyed by the problem that avg_wait_time is a really strange metric, and not very useful when debugging performance issues (because you only know how much clients are waiting in total but not on average). So he had made a similar patch a while back: #727.

I prefer #727 over this PR, because it averages the wait time across all backend assignments, not just the assignments where a wait took place. Including the 0 second wait times in the average makes sure we don't over inflate the avg_wait_time when many clients are not waiting at all.

franciscoreinolds force-pushed the fr/add-support-for-waiting-clients-count branch from bf14f6a to 8b20b7f Compare August 5, 2023 18:03

Add support for waiting clients count

75f6e77

franciscoreinolds force-pushed the fr/add-support-for-waiting-clients-count branch from 6c640f7 to 75f6e77 Compare August 6, 2023 07:16

franciscoreinolds marked this pull request as ready for review August 16, 2023 17:11

JelteF reviewed Sep 4, 2023

View reviewed changes

JelteF mentioned this pull request Oct 13, 2023

Average Wait time / Units #316

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for waiting clients count #913

Add support for waiting clients count #913

franciscoreinolds commented Aug 5, 2023 •

edited

franciscoreinolds commented Aug 6, 2023 •

edited

eulerto commented Aug 22, 2023

franciscoreinolds commented Aug 23, 2023

JelteF Sep 4, 2023 •

edited

JelteF Oct 13, 2023

		if (wait_count > 0)
		avg->wait_time = (cur->wait_time - old->wait_time) / wait_count;

Add support for waiting clients count #913

Are you sure you want to change the base?

Add support for waiting clients count #913

Conversation

franciscoreinolds commented Aug 5, 2023 • edited

franciscoreinolds commented Aug 6, 2023 • edited

eulerto commented Aug 22, 2023

franciscoreinolds commented Aug 23, 2023

JelteF Sep 4, 2023 • edited

Choose a reason for hiding this comment

JelteF Oct 13, 2023

Choose a reason for hiding this comment

franciscoreinolds commented Aug 5, 2023 •

edited

franciscoreinolds commented Aug 6, 2023 •

edited

JelteF Sep 4, 2023 •

edited