Recovery failure with barman due to max_connections #2478
After more investigation I found the culprit: there was a max_connections change from 10 to 40 after the base backup was written, so that change is only present in the WAL. When CNPG initializes the config, it uses pg_controldata to figure out the max_connections setting. According to the docs, "pg_controldata prints information initialized during initdb", which is why it returned 10, and CNPG then uses that value to override the setting from the postgresql.parameters stanza. When barman-cloud-restore gets to the WAL entry that increases max_connections, it pauses with the error message shown above.
The fix for this problem is to not overwrite the max_connections setting with the value from pg_controldata. If we used the user-specified max_connections setting, things would work fine, as the user can pick a high enough max_connections.
My workaround for restoring the data was to manually start a PostgreSQL server on a clean machine, run barman-cloud-restore (I had to override restore_command and archive_command in postgresql.auto.conf), and make a pg_dump of the database. Then in CNPG I created a clean database using initdb and pg_restored the data into it using kubectl exec.
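CNPG derives these values by reading the textual output of pg_controldata. A minimal sketch of that kind of parsing (illustrative only; parseControldataSettings is a made-up name, not CNPG's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// parseControldataSettings extracts the "... setting:" lines that pg_controldata
// prints, e.g. "max_connections setting:              10". These reflect the
// control file shipped in the base backup, not the live postgresql.conf.
// (Hypothetical helper for illustration.)
func parseControldataSettings(output string) map[string]string {
	settings := map[string]string{}
	for _, line := range strings.Split(output, "\n") {
		key, value, found := strings.Cut(line, " setting:")
		if !found {
			continue
		}
		settings[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return settings
}

func main() {
	sample := "max_connections setting:              10\n" +
		"max_worker_processes setting:         32\n"
	fmt.Println(parseControldataSettings(sample)["max_connections"]) // prints 10
}
```

Because this reads the control file of the backup, any later ALTER SYSTEM or config change that only lives in the WAL is invisible to it.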
@gsimko can you be more explicit with the steps that brought you to this situation?
AFAICT the following steps led to this situation:
The reason for the failure is that the new cluster is initialized with max_connections=10, but when the restoration process gets to the WAL record with max_connections=40 it stops, since such an increase is not supported during recovery. Hope that helps!
But the error message you shared seems to say otherwise: was the second cluster created with max_connections set to 10 or 40?
On the second cluster I set it to 40. The reason why it uses 10 - and that's the actual bug - is that CNPG internally overrides the user setting by reading the max_connections setting from pg_controldata, which returns the value from the time the cluster was initialized.
Maybe I got it, it’s set to 10 from the backup, then at some point replaying WALs, it’s set to 40, but we still try to force it back to 10 because pg_controldata says so. I’ll try to reproduce it, thanks! (Which is exactly what you said above, but now I got it too 😂)
I can reproduce it. The problem is that during the recovery we enforce the max_connections value from the backup.
Then when we do the full recovery, we use max_connections=100 from backup A, start the server in standby mode, and replay WALs; when it reaches the WAL record that increases max_connections, the PostgreSQL instance in the recovery job pauses and the recovery job hangs.
Ensure the PostgreSQL replication parameters are set to the higher value between the ones specified in the cluster specification and the ones stored in the backup. This will ensure that the backup will be restored correctly while allowing the users to raise their value to accommodate changes in the configuration that have happened after the backup was taken.
Partially closes #2478 #2337
Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
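The fix described above amounts to a per-parameter maximum. A hedged sketch of the idea (enforcedValue is an illustrative name, not the function in the actual patch):

```go
package main

import "fmt"

// enforcedValue returns the higher of the value requested in the cluster
// spec and the value recorded in the backup's control data, so that WAL
// records raising the setting can be replayed without pausing recovery.
// (Illustrative sketch of the approach, not the actual CNPG patch.)
func enforcedValue(specValue, backupValue int) int {
	if specValue > backupValue {
		return specValue
	}
	return backupValue
}

func main() {
	fmt.Println(enforcedValue(40, 10))  // user asked for 40, backup had 10 -> 40
	fmt.Println(enforcedValue(10, 100)) // backup had 100, spec says 10 -> 100
}
```

Taking the maximum covers both directions: the backup's value is never lowered (which would pause WAL replay), and the user can still raise the setting above what the backup recorded.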
CNPG version: 1.20.1
I'm trying to recover from a backup that was produced with barmanObjectStore but recovering the primary instance fails.
In postgresql.parameters.max_connections I use a value of 40, and because of that the following error is produced:
"message":"hot standby is not possible because of insufficient parameter settings","detail":"max_connections = 10 is a lower setting than on the primary server, where its value was 40."
Meaning that recovery from a backup must use a max_connections value at least as large as the one used by the backup.
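For reference, the setup above corresponds to a recovery Cluster whose spec requests the higher value, roughly like this (an illustrative fragment; the resource names are placeholders, not taken from the report):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restored-cluster        # placeholder name
spec:
  instances: 1
  postgresql:
    parameters:
      max_connections: "40"     # user-requested value, overridden to 10 from pg_controldata
  bootstrap:
    recovery:
      source: origin-cluster    # placeholder external cluster name
```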
Tracking down the code, I found that the max_connections=10 setting comes from pg_controldata, which strictly overwrites the user setting.
I guess the user should be able to override max_connections when running a recovery to match what was backed up?
I've checked and /var/lib/postgresql/data/pgdata/custom.conf indeed shows max_connections=10.
What's confusing though is that running pg_controldata displays "max_connections setting: 40" so I'm confused where that 10 comes from.
The logs show this: {..., "msg":"enforcing parameters found in pg_controldata","parameters":{"max_connections":"10","max_locks_per_transaction":"64","max_prepared_transactions":"0","max_wal_senders":"10","max_worker_processes":"32"}}