fix: timeout when restarting PostgreSQL and while lifting fencing #4504

leonardoce · 2024-05-09T14:22:25Z

The instance manager starts PostgreSQL:

1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third cases, the operation is requested by the embedded cluster reconciler loop and is performed without any timeout.

If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop gets stuck waiting for a dead postmaster to come up.

This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.

Fixes: #4501

github-actions · 2024-05-09T14:22:39Z

❗ By default, the pull request is configured to backport to all release branches.

To stop backporting this PR, remove the label: backport-requested ◀️ or add the label 'do not backport'
To stop backporting this PR to a certain release branch, remove the specific branch label: release-x.y

leonardoce · 2024-05-09T14:27:20Z

Hi! This one seems harmless and needed, but I encourage you to take a deep look.

After much thinking, I still haven't made up my mind about whether it is better to fix it, or to leave it as it is and spend the energy on refactoring to make the code flow clearer.

leonardoce · 2024-05-15T06:27:40Z

/test limit=local

github-actions · 2024-05-15T06:27:52Z

@leonardoce, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9091027252

mnencia · 2024-05-17T17:16:39Z

/test tl=4 l=local

github-actions · 2024-05-17T17:17:21Z

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131874590

The instance manager starts PostgreSQL:

1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third example, the operator is requested by the embedded cluster reconciler loop, and performed without any timeout.

If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop will be stuck waiting for a dead postmaster to be up.

This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.

Fixes: cloudnative-pg#4501

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>

mnencia · 2024-05-20T10:24:19Z

/ok-to-merge E2E green. Only expected unrelated failures.

The instance manager starts PostgreSQL:

1. when it starts up
2. when configuration changes are being applied (after stopping it)
3. when fencing is lifted.

In the second and third examples, the operator is requested by the embedded cluster reconciler loop, and performed without any timeout.

If PostgreSQL won't start up again because of a wrong configuration or missing disk space, the reconciler loop will be stuck waiting for a dead postmaster to be up.

This patch handles this condition by using a combination of the timeout parameters that are already set in the cluster.

Fixes: #4501

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enteprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
(cherry picked from commit 09a4d80)

leonardoce requested a review from a team as a code owner May 9, 2024 14:22

github-actions bot added the backport-requested ◀️ (This pull request should be backported to all supported releases), release-1.21, release-1.22, and release-1.23 labels May 9, 2024

leonardoce force-pushed the dev/contexts-contexts branch from 658e0ad to 5ac35a9 May 9, 2024 14:23

armru force-pushed the dev/contexts-contexts branch from 5ac35a9 to e7819c4 May 13, 2024 13:38

armru approved these changes May 13, 2024


leonardoce force-pushed the dev/contexts-contexts branch from e7819c4 to 9ee4fa5 May 15, 2024 06:27

mnencia force-pushed the dev/contexts-contexts branch from 9ee4fa5 to c89fbe7 May 17, 2024 17:13

Leonardo Cecchi and others added 2 commits May 20, 2024 12:19

chore: review

0ad2f7f

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>

mnencia force-pushed the dev/contexts-contexts branch from c89fbe7 to 0ad2f7f May 20, 2024 10:19

cnpg-bot added the ok to merge 👌 (This PR can be merged) label May 20, 2024

mnencia approved these changes May 20, 2024


mnencia merged commit 09a4d80 into cloudnative-pg:main May 20, 2024
27 of 28 checks passed

github-actions bot mentioned this pull request May 20, 2024

Backport failure for pull request 4504 #4608

Closed
