CRITICAL (node "foo" (ID: 2) is not attached to expected upstream node "bar" (ID: 1)) repmgr-16 #838

Open
dirtyworks opened this issue Jan 18, 2024 · 0 comments

Re-adding an old node to the cluster fails, even though the join process reports no obvious errors. Both nodes are idle sandbox machines with plenty of CPU and memory.
This is on RHEL 8 with PostgreSQL 16 and repmgr-16, installed with yum from the postgresql.org repos. SELinux is enabled and must stay enabled.
Steps, in the order executed:

  1. rm -rf /var/lib/pgsql/16/data/*
  2. rm -rf /var/lib/pgsql/16/wal/*
  3. /usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf standby clone --upstream-conninfo 'host=bar port=15432 dbname=repmgr user=superduper passfile=/var/lib/pgsql/.pgpass sslmode=prefer sslcert=/etc/ssl/certs/host.crt sslkey=/var/lib/pgsql/key.pem sslrootcert=/etc/ssl/certs/ca-bundle.crt' -d 'bar port=15432 dbname=repmgr user=superduper passfile=/var/lib/pgsql/.pgpass sslmode=prefer sslcert=/etc/ssl/certs/host.crt sslkey=/var/lib/pgsql/key.pem sslrootcert=/etc/ssl/certs/ca-bundle.crt' -v -L DEBUG
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: /usr/pgsql-16/bin/pg_ctl start -D /var/lib/pgsql/16/data
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 2
DEBUG: get_node_record(): no record found for node 2
HINT: after starting the server, you need to register this standby with "repmgr standby register"

I am logging all queries on bar, and no INSERT into repmgr.nodes is issued during cloning.
I assume the "no record found" result from the last get_node_record() is expected, because "standby register" hasn't been run yet.
But maybe some part of the cloning process failed because foo isn't registered in the repmgr.nodes table? If so, nothing in the output indicates that the clone failed.
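
One way to separate a clone problem from a registration problem is to check whether the clone left the data directory configured as a standby. On PostgreSQL 12 and later a standby needs a standby.signal file and a primary_conninfo setting, which repmgr standby clone normally writes to postgresql.auto.conf. The sketch below is only an illustration: check_standby_config is a hypothetical helper, not part of repmgr, and the data directory path is taken from the HINT in the clone output.

```shell
#!/bin/sh
# Hypothetical helper (not part of repmgr): checks that a data directory
# looks like a configured PostgreSQL 12+ standby.
check_standby_config() {
    datadir=$1
    # A standby needs this marker file to start in standby mode.
    if [ ! -f "$datadir/standby.signal" ]; then
        echo "standby.signal missing"
        return 1
    fi
    # Assumption: primary_conninfo was written to postgresql.auto.conf.
    # A missing file is treated the same as a missing setting.
    if ! grep -Eq '^[[:space:]]*primary_conninfo[[:space:]]*=' \
            "$datadir/postgresql.auto.conf" 2>/dev/null; then
        echo "primary_conninfo not set"
        return 1
    fi
    echo "standby config present"
}

# Example, run on foo as the postgres user:
#   check_standby_config /var/lib/pgsql/16/data
```

If both checks pass, the clone itself probably did its part, and the primary_conninfo value is the next thing worth reading, since it is what the WAL receiver uses to connect to bar.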

  4. /usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf node service --action start
  5. /usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf standby register -d 'host=bar port=15432 dbname=repmgr user=superduper passfile=/var/lib/pgsql/.pgpass sslmode=prefer sslcert=/etc/ssl/certs/host.crt sslkey=/var/lib/pgsql/key.pem sslrootcert=/etc/ssl/certs/ca-bundle.crt' -v -L DEBUG --upstream-node-id=1
    ERROR: this node does not appear to be attached to upstream node "bar" (ID: 1)
    I can force the command and the node registers, but it still isn't attached.
    Someone online reported that "standby follow" fixed this for them. It didn't work for me, even though the command exited successfully.
  6. /usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf standby follow -d 'host=bar port=15432 dbname=repmgr user=superduper passfile=/var/lib/pgsql/.pgpass sslmode=prefer sslcert=/etc/ssl/certs/host.crt sslkey=/var/lib/pgsql/key.pem sslrootcert=/etc/ssl/certs/ca-bundle.crt' -v -L DEBUG --upstream-node-id=1
WARNING: node "foo" not found in "pg_stat_replication"
DEBUG: sleeping 30 of max 30 seconds waiting for standby to attach to primary
NOTICE: STANDBY FOLLOW successful
  7. /usr/pgsql-16/bin/repmgr -f /etc/repmgr/16/repmgr.conf node check
    Upstream connection: CRITICAL (node "foo" (ID: 2) is not attached to expected upstream node "bar" (ID: 1))

Restarting repmgr or PostgreSQL on either node does not change the outcome. Deleting the data and re-cloning from the primary doesn't change the result either.
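
For anyone trying to reproduce this: the 'not found in "pg_stat_replication"' WARNING suggests that repmgr decides attachment by looking for the standby's node name in pg_stat_replication on the upstream. The same check can be done by hand. This is a minimal sketch, assuming the node name is what appears as application_name there; is_attached is a hypothetical helper, and the psql connection strings in the comments are abbreviated from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: reads application_name values, one per line, on
# stdin and succeeds only if NODE_NAME is among them (exact whole-line match).
is_attached() {
    grep -Fxq -- "$1"
}

# Example, run against the primary (bar):
#   psql 'host=bar port=15432 dbname=repmgr user=superduper' -At \
#        -c 'SELECT application_name FROM pg_stat_replication' \
#     | is_attached foo && echo "attached" || echo "not attached"
#
# On the standby (foo), the WAL receiver can be inspected with
# (connection options for foo omitted):
#   psql -At -c 'SELECT status, conninfo FROM pg_stat_wal_receiver'
```

An empty pg_stat_wal_receiver result on foo would mean the standby never started streaming at all, which points at primary_conninfo or authentication rather than at repmgr's metadata.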
