
Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout #164

Open
caleb15 opened this issue Mar 20, 2021 · 11 comments

caleb15 (Contributor) commented Mar 20, 2021

In our Docker container I am seeing frequent errors like the ones below:

2021-03-19T23:22:10.974893936Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-19T23:32:07.913737284Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-19T23:42:10.221978219Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-19T23:52:07.792295692Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T00:02:11.995701411Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T00:12:07.517917903Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T00:22:14.380817556Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T00:32:09.701303512Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T00:42:07.446886906Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T00:52:08.508554841Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T01:02:08.764939569Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout
2021-03-20T01:12:12.094003633Z E [default] Could not process server: Error collecting pg_stat_statements: pq: canceling statement due to statement timeout

The database it is connecting to does not have a statement timeout:

caleb@influxdb0.CLOUD100:~$ sudo docker exec pganalyze-mirror-heavy env
PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=secret
DB_URL=postgres://pganalyze:secret@db.mirror0-heavy.cloud100/secret
PGA_API_KEY=secret
AWS_INSTANCE_ID=cloud100-mirror0-heavy
GOLANG_VERSION=1.14.4
GOPATH=/go
HOME_DIR=/home/pganalyze
CODE_DIR=/go/src/github.com/pganalyze/collector
HOME=/root

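For reference, the database-side setting can be confirmed directly in Postgres (a generic check run against the monitored database; a value of 0 means the timeout is disabled):

SHOW statement_timeout;
-- or, to see where the effective value comes from:
SELECT name, setting, source FROM pg_settings WHERE name = 'statement_timeout';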

caleb15 (Contributor, Author) commented Mar 20, 2021

The pganalyze container pointed at the heavy mirror is using a bunch of memory compared to the other instances for some reason:

CONTAINER ID        NAME                     CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
490ccc38d41d        pganalyze-mirror-heavy   0.00%               2.991GiB / 7.698GiB   38.85%              10.2GB / 6.79GB     26.1MB / 1.09GB     10
36389b04b5f7        pganalyze-mirror1        0.00%               1.342GiB / 7.698GiB   17.44%              13.7GB / 3.46GB     167MB / 9.24GB      10
8f4c79d8ea41        pganalyze-mirror         0.00%               1.472GiB / 7.698GiB   19.12%              13.8GB / 3.48GB     228MB / 9.1GB       10
246474c64fb0        pganalyze                0.00%               97.4MiB / 7.698GiB    1.24%               10.2GB / 2.33GB     157MB / 338MB       10

May be unrelated.

lfittl (Member) commented Mar 20, 2021

@caleb15 Could you try doing a "SELECT pg_stat_statements_reset()" to see if that addresses the timeouts?

(when the Postgres query text file grows very large, retrieving the text sometimes takes too long)
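A rough way to gauge whether pg_stat_statements has accumulated a lot of query text is to look at the number of tracked statements and the total size of their stored texts (a sketch, assuming the monitoring user can read the pg_stat_statements view):

-- Number of distinct statements currently tracked:
SELECT count(*) FROM pg_stat_statements;

-- Approximate total size of the stored query texts, which roughly tracks
-- the size of the external query text file Postgres has to read back:
SELECT pg_size_pretty(sum(length(query))::bigint) FROM pg_stat_statements;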

caleb15 (Contributor, Author) commented Mar 20, 2021

Sure, I'll try that out, although I still don't see how it can be timing out with statement timeouts disabled.

Unrelated question: when you recommended that I set Collector: Query Statistics Reset Frequency, what value did you recommend again? 24h?

lfittl (Member) commented Mar 20, 2021

@caleb15 Yeah, 24h is a good place to start - in my experience that resolves the issues 90% of the time.

The reason the statement_timeout setting doesn't take effect is that the collector sets a session-local statement timeout:

func SetQueryTextStatementTimeout(connection *sql.DB, logger *util.Logger, server *state.Server) {
	queryTextStatementTimeoutMs := server.Grant.Config.Features.StatementTimeoutMsQueryText
	if queryTextStatementTimeoutMs == 0 { // Default value
		queryTextStatementTimeoutMs = 120000
	}
	SetStatementTimeout(connection, queryTextStatementTimeoutMs)
	return
}
(120 seconds)
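In other words, the collector effectively issues a session-level override at the start of its connection, which takes precedence over any server- or database-level default (an illustrative sketch, not the collector's literal SQL):

-- Applies only to this session and overrides the server/database default;
-- value is in milliseconds (120000 ms = the collector's 120 s default):
SET statement_timeout = 120000;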

caleb15 (Contributor, Author) commented Mar 20, 2021

I enabled it; I'll see if that fixes it. For the Collector: Query Statistics Reset Frequency, should I just set that to 24h for the master, or do it for both the master and the mirrors?

lfittl (Member) commented Mar 20, 2021

@caleb15 I would set it on all the servers where you are seeing timeouts.

In addition, make sure you have created the helper function on the primary, so it's available on both the primary and the replicas:

CREATE OR REPLACE FUNCTION pganalyze.reset_stat_statements() RETURNS SETOF void AS
$$
  /* pganalyze-collector */ SELECT * FROM public.pg_stat_statements_reset();
$$ LANGUAGE sql VOLATILE SECURITY DEFINER;

(to be created as a user that has superuser / rds_superuser or similar permissions)
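Depending on how privileges are locked down, the monitoring role may also need an explicit EXECUTE grant on the helper (the role and schema names below assume the standard pganalyze setup):

GRANT EXECUTE ON FUNCTION pganalyze.reset_stat_statements() TO pganalyze;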

caleb15 (Contributor, Author) commented Mar 20, 2021

This appears to have worked, thanks!

caleb15 closed this as completed Mar 20, 2021
caleb15 (Contributor, Author) commented Mar 25, 2021

@lfittl it's happening again with all the mirrors. I already had the reset set to 24h for all mirrors. Should I change the reset to every 12h, or maybe increase the timeout? The statement timeout is at 30s and the query text timeout is at 120s.

lfittl (Member) commented Mar 26, 2021

> @lfittl it's happening again with all the mirrors. I already had the reset set to 24h for all mirrors. Should I change the reset to every 12h, or maybe increase the timeout? The statement timeout is at 30s and the query text timeout is at 120s.

Yeah, reducing the reset time interval sounds like a good next step. Also, it's worth double-checking that you can run SELECT pganalyze.reset_stat_statements() as the pganalyze user (to make sure there are no permission issues).
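For example, a quick way to verify the permission without actually resetting any statistics (assuming the monitoring role is named pganalyze):

SELECT has_function_privilege('pganalyze', 'pganalyze.reset_stat_statements()', 'EXECUTE');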

caleb15 (Contributor, Author) commented Mar 31, 2021

Reducing the reset time interval didn't work 😢 I confirmed that I can run pganalyze.reset_stat_statements(). Maybe I should try increasing Collector: Query Text Timeout to 300s?

caleb15 reopened this Mar 31, 2021
lfittl (Member) commented Mar 31, 2021

@caleb15 Let's get together on a call to debug this further - I'll send you an email to follow up in a moment.
