Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade flow for images RHEL/CentOS 8+ #562

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

eifrach
Copy link

@eifrach eifrach commented Apr 11, 2024

Creating flow for upgrading postgresSQL

  1. add install_weak_deps=false to installation which reduce the image size
  2. install postgresql-upgrade package for the upgrade

@eifrach
Copy link
Author

eifrach commented Apr 11, 2024

[test]

@@ -297,7 +297,7 @@ function wait_for_postgresql_master() {
run_pgupgrade ()
(
# Remove .pid file if the file persists after ugly shut down
if [ -f "$PGDATA/postmaster.pid" ] && ! pgrep -f "postgres" > /dev/null; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing the condition? We do not want to remove .pid file of the running process

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I removed it because I had some issues when I was testing the upgrade. I will look into it deeper.
we should check if the PID is up first

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is from a running container I added a stop before the line
seem that the service is not running, also there isn't any open socket

I'm assuming the it is partially match run-postgresql

bash-4.4$ pgrep -f "postgres" 
1
17

 ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
postgres       1       0  0 12:18 pts/0    00:00:00 /bin/bash /usr/bin/run-postgresql
postgres      17       1  0 12:18 pts/0    00:00:00 /bin/bash /usr/bin/run-postgresql
postgres      18      17  0 12:18 pts/0    00:00:00 /usr/bin/coreutils --coreutils-prog-shebang
postgres      19       0  0 12:19 pts/1    00:00:00 bash
postgres      27      19  0 12:22 pts/1    00:00:00 ps -ef

how about

Suggested change
if [ -f "$PGDATA/postmaster.pid" ] && ! pgrep -f "postgres" > /dev/null; then
if [ -f "$PGDATA/postmaster.pid" ] && ! pg_isready > /dev/null; then

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense +1

@@ -352,7 +352,6 @@ run_pgupgrade ()
info_msg "Starting old postgresql once again for a clean shutdown..."
"${old_pgengine}/pg_ctl" start -w --timeout 86400 -o "-h ''"
info_msg "Waiting for postgresql to be ready for shutdown again..."
"${old_pgengine}/pg_isready"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this safety procedure?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my testing it returned exit code 2 - I did manual checks the socket was up and running but the pg_isready wasn't catching it.
On the above line we use -w which is waiting for the DB to be ready. I thought maybe we don't need to double check it unless there is another reason for it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after some searching, It seems we start the server as local only -- which mean it doesn't create a TCP port instead it creates a Linux socket.

the 'pg_isready' tries the port and fails

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines 308 to 323
if test "$old_raw_version" = 92; then
old_collection=postgresql92
else
old_collection=rh-postgresql$old_raw_version
old_collection=postgresql-$old_raw_version
fi
Copy link
Member

@fila43 fila43 Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not correspond with the current state of PostgreSQL rpm names.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@fila43
Copy link
Member

fila43 commented Apr 11, 2024

Overall, it looks good. Thank you for your contribution. We will need to generalize it and implement it in the master branch.
https://github.com/sclorg/postgresql-container/blob/master/src/root/usr/share/container-scripts/postgresql/common.sh

@@ -44,7 +44,8 @@ COPY root/usr/libexec/fix-permissions /usr/libexec/fix-permissions
RUN yum -y module enable postgresql:13 && \
INSTALL_PKGS="rsync tar gettext bind-utils nss_wrapper postgresql-server postgresql-contrib" && \
INSTALL_PKGS="$INSTALL_PKGS pgaudit" && \
yum -y --setopt=tsflags=nodocs install $INSTALL_PKGS && \
INSTALL_PKGS="$INSTALL_PKGS procps-ng util-linux postgresql-upgrade" && \
yum -y --setopt=tsflags=nodocs --setopt=install_weak_deps=false install $INSTALL_PKGS && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What difference in packages is there when using this option? I would rather not introduce it for already built image versions if the change is too disruptive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it not a must, it removes about 100MB of size - those are packages that are extra doesn't change / affect the core application

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the existing images, I would choose the more defensive approach. I would just add the necessary packages.

@eifrach
Copy link
Author

eifrach commented Apr 14, 2024

Overall, it looks good. Thank you for your contribution. We will need to generalize it and implement it in the master branch. https://github.com/sclorg/postgresql-container/blob/master/src/root/usr/share/container-scripts/postgresql/common.sh

I've issue fixes for all the remarks
once they approved I'll create a PR with changes to all common.sh files

@eifrach eifrach requested review from fila43 and pkubatrh April 15, 2024 12:00
@@ -311,8 +311,8 @@ run_pgupgrade ()
old_collection=rh-postgresql$old_raw_version
fi

old_pgengine=/opt/rh/$old_collection/root/usr/bin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be changed only for RHEL8+ images. The general change would cause problems in rhel7/centos7 containers. Condition needed in distgen source files.

@@ -312,7 +312,7 @@ run_pgupgrade ()
old_collection=rh-postgresql$old_raw_version
fi

old_pgengine=/opt/rh/$old_collection/root/usr/bin
old_pgengine=/usr/lib64/pgsql/postgresql-$old_raw_version/bin
new_pgengine=/opt/rh/rh-postgresql${new_raw_version}/root/usr/bin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/usr/bin ?

@fila43
Copy link
Member

fila43 commented Apr 16, 2024

It makes a good impression on me. Please implement the changes into /src directory. Once it is there, we will check and adjust the compatibility for rhel7/centos7 vs rhel8+ versions, or we can wait for the EOL of centos7/rhel7 images. Also, there should be the prepared upgrade test that needs to be enabled for rhel8+ containers.

@eifrach eifrach changed the title [WIP] adding upgrade flow for postgress 13 Upgrade flow for images RHEL/CentOS 8+ Apr 18, 2024
@eifrach eifrach marked this pull request as ready for review April 18, 2024 12:44
@eifrach
Copy link
Author

eifrach commented Apr 18, 2024

[test]

@eifrach
Copy link
Author

eifrach commented Apr 18, 2024

cc @danmanor

@fila43
Copy link
Member

fila43 commented Apr 24, 2024

LGTM, but there is dist-gen issue

@fila43
Copy link
Member

fila43 commented Apr 24, 2024

@phracek would it be possible to enable this test also for RHEL8+ ?

run_upgrade_test ()

@fila43
Copy link
Member

fila43 commented Apr 24, 2024

[test]

@erthalion
Copy link

Hi,

Thanks for the PR! I've got a couple of general commentaries about PostgreSQL
major version upgrade, and although they may not necessarily be related to the
PR, I reckon it's a good idea to collect them here. As an outside person I
could be missing some details of this project, please let me know if that's the
case.

  • POSTGRESQL_UPGRADE seems to allow only --link and --copy, but since PG
    11 there is also --clone option, which somewhat combines best of both
    worlds, but requires newer Linux kernel versions.

  • The docs mention --link option as an unsafe, but the old cluster should not
    be used only of the new cluster was started. If it wasn't the case, the old
    cluster was unmodified and should be safe to start.

  • It seems that it's possible to provide custom initdb options via
    POSTGRESQL_UPGRADE_INITDB_OPTIONS, but some of them must be the same as the
    old cluster (collate, encoding, data checksums, wal segment size). It sounds
    a bit unreliable to leave it to the container user, could those options be
    enforced?

  • It's worth running pg_upgrade --check to verify compatibility.

  • Maybe a good idea to add -j to upgrade_cmd for additional parallelism.

  • Usually triggering a vacuum after the upgrade to rebuild statistics is a
    good idea as well.

  • More open-ended question, what do you think about those situations where
    some coordination is needed? E.g. when upgrading a replica, most likely rsync
    has to be used in coordination with the primary upgrade. Or upgrading
    extensions, which is not handled via pg_upgrade either.

    It's probably beyond what you want to achieve within a container, but it's
    at least worth mentioning in the documentation, right?

https://www.postgresql.org/docs/current/pgupgrade.html

@fila43
Copy link
Member

fila43 commented Apr 30, 2024

[test]

@fila43
Copy link
Member

fila43 commented Apr 30, 2024

[test]

@fila43
Copy link
Member

fila43 commented May 6, 2024

[test]

1 similar comment
@fila43
Copy link
Member

fila43 commented May 6, 2024

[test]

@fila43
Copy link
Member

fila43 commented May 6, 2024

@phracek @zmiklank any idea about c9s failures? RHEL9 passed but c9s failed

@zmiklank
Copy link
Member

zmiklank commented May 7, 2024

@phracek @zmiklank any idea about c9s failures? RHEL9 passed but c9s failed

This is reproducible also in the master branch - so not seem to be related to this PR. Seems like a segfault (double free) from pg_isready binary.

Aren't there any differences in c9s and rhel9 binaries?

@zmiklank
Copy link
Member

zmiklank commented May 7, 2024

So this may be an postgresql[1] issue which is present only when using newer openssl (3.2.1) that got into c9s, but not into rhel9 yet. postgresql upstream patch seeems to exist already, more info is in the linked mailing thread[1].

[1] https://www.postgresql.org/message-id/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants