3.2.2 throwing unexpected exception #3066

einhverfr · 2024-05-16T07:07:39Z

What happened?

This often happens when Patroni 3.2.2 starts.

establishing a new patroni heartbeat connection to postgres
2024-05-10 05:27:26,921 INFO: PostgreSQL is still starting up, 87 seconds until timeout
2024-05-10 05:27:32,038 INFO: Lock owner: DRS-INDBPG-01; I am DRS-INDBPG-01
2024-05-10 05:27:32,108 ERROR: Unexpected exception
Traceback (most recent call last):
File "C:\PES\patroni\patroni\ha.py", line 1972, in run_cycle
info = self._run_cycle()
^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\ha.py", line 1886, in _run_cycle
msg = self.process_healthy_cluster()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\ha.py", line 1427, in process_healthy_cluster
return self.enforce_follow_remote_member(msg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\ha.py", line 821, in enforce_follow_remote_member
return self.follow(demote_reason, message)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\ha.py", line 671, in follow
change_required, restart_required = self.state_handler.config.check_recovery_conf(node_to_follow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\postgresql\config.py", line 797, in check_recovery_conf
primary_conninfo = params['primary_conninfo']
~~~~~~^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\collections.py", line 150, in getitem
return self._values[key.lower()][1]
~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'primary_conninfo'
2024-05-10 05:27:32,117 INFO: Unexpected exception raised, please report it as a BUG

The same stack trace happens when the connection fails. Environment is on Windows using the Patroni Environment Setup package from Cybertec with their bundled Python libraries

How can we reproduce it (as minimally and precisely as possible)?

My best guess is to install on Windows the PES and Python dependencies as bundled. Then set up Patroni without passwords but require them in the pg_hba.conf. The stack trace will follow the fe_sendauth no password supplied messages.

What did you expect to happen?

I didn't expect this message in the logs.... The message seems harmless. It looks like a problem handling bad connections but the recovery seems fine.

Patroni/PostgreSQL/DCS version

Patroni version: 3.2.2
PostgreSQL version: 15.5
DCS (and its version): etcd (not sure version)

Patroni configuration file

(Ip addresses have been removed and authentication info sanitized)

scope: pgcluster
#namespace: /service/
name: PRD-INDBPG-03
log:
    log_size: 102400
    log_num: 30    

restapi:
  listen: 0.0.0.0:8008
  connect_address: IP28:8008
#  certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#  keyfile: /etc/ssl/private/ssl-cert-snakeoil.key
#  authentication:
#    username: username
#    password: password

etcd3:
  hosts: IP26:2379,IP27:2379,IP28:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    master_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
#        wal_level: hot_standby
#        hot_standby: "on"
#        wal_keep_segments: 8
#        max_wal_senders: 5
#        max_replication_slots: 5
#        wal_log_hints: "on"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
#      recovery_conf:
#        restore_command: cp ../wal_archive/%f %p
# some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums
  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5
#  - hostssl all all 0.0.0.0/0 md5
  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
# post_init: /usr/local/bin/setup_cluster.sh

postgresql:
  listen: 0.0.0.0:5432
  connect_address: IP28:5432
  data_dir: F:/pg_data
  bin_dir: C:/Program Files/PostgreSQL/15/bin
  #pgpass: pgpass0
  authentication:
    replication:
      username: patroni_replicator
      password: somepasswd
    superuser:
      username: patroni
      password: somepasswd


#watchdog:
#  mode: automatic # Allowed values: off, automatic, required
#  device: /dev/watchdog

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

patronictl show-config

C:\PES\patroni>patronictl.py -c patroni.yaml show-config
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    max_connections: 1500
    recovery_target_timeline: latest
    restore_command: ssh barman@IP barman get-wal pg %f > %p
  use_pg_rewind: false
  use_slots: true
retry_timeout: 10
slots:
  barman:
    type: physical
  patroni_dr:
    type: physical
ttl: 30
use_pg_rewind: false

Patroni log files

establishing a new patroni heartbeat connection to postgres
2024-05-10 05:27:26,921 INFO: PostgreSQL is still starting up, 87 seconds until timeout
2024-05-10 05:27:32,038 INFO: Lock owner: DRS-INDBPG-01; I am DRS-INDBPG-01
2024-05-10 05:27:32,108 ERROR: Unexpected exception
Traceback (most recent call last):
  File "C:\PES\patroni\patroni\ha.py", line 1972, in run_cycle
    info = self._run_cycle()
           ^^^^^^^^^^^^^^^^^
  File "C:\PES\patroni\patroni\ha.py", line 1886, in _run_cycle
    msg = self.process_healthy_cluster()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PES\patroni\patroni\ha.py", line 1427, in process_healthy_cluster
    return self.enforce_follow_remote_member(msg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PES\patroni\patroni\ha.py", line 821, in enforce_follow_remote_member
    return self.follow(demote_reason, message)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PES\patroni\patroni\ha.py", line 671, in follow
    change_required, restart_required = self.state_handler.config.check_recovery_conf(node_to_follow)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\PES\patroni\patroni\postgresql\config.py", line 797, in check_recovery_conf
    primary_conninfo = params['primary_conninfo']
                       ~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "C:\PES\patroni\patroni\collections.py", line 150, in __getitem__
    return self._values[key.lower()][1]
           ~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'primary_conninfo'
2024-05-10 05:27:32,117 INFO: Unexpected exception raised, please report it as a BUG

PostgreSQL log files

Nothing relevant in the PostgreSQL log files.  This sometimes appears to happen more consistently when PostgreSQL cannot be connected to.

Have you tried to use GitHub issue search?

Yes

Anything else we need to know?

No response

CyberDem0n · 2024-05-16T07:19:38Z

Something is really strange here, according to the stack trace this node is running in a standby cluster (enforce_follow_remote_member), but there is no standby_cluster section in patronictl show-config.

CyberDem0n · 2024-05-16T08:24:22Z

@einhverfr please provide Patroni logs from the moment it started and also postgres logs from exactly this moment.

einhverfr · 2024-05-17T07:02:17Z

mea culpa!

I sent the show-config from the wrong cluster!

C:\PES\patroni>patronictl.py -c patroni.yaml show-config
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    recovery_target_timeline: latest
    restore_command: ssh barman@10.3.1.41 barman get-wal pg %f > %p
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
slots:
  barman:
    type: physical
standby_cluster:
  create_replica_methods:
  - basebackup
  host: 10.1.1.10
  port: 5432
  primary_slot_name: patroni_dr
ttl: 30

einhverfr · 2024-05-17T07:03:13Z

(working on getting the logs)

einhverfr · 2024-05-17T07:54:20Z

2024-05-16 04:10:10,618 INFO: Selected new etcd server[redacted]
2024-05-16 04:10:10,676 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-05-16 04:10:10,727 WARNING: Postgresql is not running.
2024-05-16 04:10:10,727 INFO: Lock owner: DRS-INDBPG-02; I am DRS-INDBPG-01
2024-05-16 04:10:10,727 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202209061
Database system identifier: 7187710002575783044
Database cluster state: shut down in recovery
pg_control last modified: 5/16/2024 4:09:35 AM
Latest checkpoint location: 411/44A004B0
Latest checkpoint's REDO location: 411/4376D968
Latest checkpoint's REDO WAL file: 000000080000041100000043
Latest checkpoint's TimeLineID: 8
Latest checkpoint's PrevTimeLineID: 8
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:173222935
Latest checkpoint's NextOID: 178998508
Latest checkpoint's NextMultiXactId: 86257787
Latest checkpoint's NextMultiOffset: 616570244
Latest checkpoint's oldestXID: 717
Latest checkpoint's oldestXID's DB: 5
Latest checkpoint's oldestActiveXID: 173222935
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 5
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: 5/16/2024 3:51:14 AM
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 411/45379078
Min recovery ending loc's timeline: 8
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 1500
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: c47523cad2b3c7486c7012c1bebadf80174e3cc83dc85586bd5d2dd2316f0cea
2024-05-16 04:10:10,727 INFO: Lock owner: DRS-INDBPG-02; I am DRS-INDBPG-01
2024-05-16 04:10:10,758 INFO: Local timeline=8 lsn=411/45379078
2024-05-16 04:10:10,965 INFO: primary_timeline=8
2024-05-16 04:10:10,965 INFO: Lock owner: DRS-INDBPG-02; I am DRS-INDBPG-01
2024-05-16 04:10:10,965 INFO: starting as a secondary
2024-05-16 04:10:11,458 INFO: postmaster pid=3012
2024-05-16 15:10:11.758 +04 [3012] LOG: redirecting log output to logging collector process
2024-05-16 15:10:11.758 +04 [3012] HINT: Future log output will appear in directory "log".
2024-05-16 04:10:13,664 INFO: Lock owner: DRS-INDBPG-02; I am DRS-INDBPG-01
2024-05-16 04:10:13,677 INFO: restarting after failure in progress
2024-05-16 04:10:18,696 INFO: Lock owner: DRS-INDBPG-02; I am DRS-INDBPG-01
2024-05-16 04:10:18,696 INFO: establishing a new patroni heartbeat connection to postgres
2024-05-16 04:10:22,917 ERROR: Unexpected exception
Traceback (most recent call last):
File "C:\PES\patroni\patroni\ha.py", line 1972, in run_cycle
info = self._run_cycle()
^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\ha.py", line 1886, in _run_cycle
msg = self.process_healthy_cluster()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\ha.py", line 1447, in process_healthy_cluster
return self.follow('cannot be a real primary in a standby cluster',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\ha.py", line 671, in follow
change_required, restart_required = self.state_handler.config.check_recovery_conf(node_to_follow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\postgresql\config.py", line 797, in check_recovery_conf
primary_conninfo = params['primary_conninfo']
~~~~~~^^^^^^^^^^^^^^^^^^^^
File "C:\PES\patroni\patroni\collections.py", line 150, in getitem
return self._values[key.lower()][1]
~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'primary_conninfo'
2024-05-16 04:10:22,918 INFO: Unexpected exception raised, please report it as a BUG
2024-05-16 04:10:23,663 INFO: Lock owner: DRS-INDBPG-02; I am DRS-INDBPG-01

CyberDem0n · 2024-05-17T12:13:12Z

@einhverfr I would also like to see postgres logs.
It seems that it started accepting connections, but returns some unexpected results on Patroni queries.
Ideally please set log_statement=all for patroni user.

einhverfr · 2024-05-27T19:42:29Z

ok will do

einhverfr added the bug label May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3.2.2 throwing unexpected exception #3066

3.2.2 throwing unexpected exception #3066

einhverfr commented May 16, 2024

CyberDem0n commented May 16, 2024

CyberDem0n commented May 16, 2024

einhverfr commented May 17, 2024

einhverfr commented May 17, 2024

einhverfr commented May 17, 2024

CyberDem0n commented May 17, 2024

einhverfr commented May 27, 2024

3.2.2 throwing unexpected exception #3066

3.2.2 throwing unexpected exception #3066

Comments

einhverfr commented May 16, 2024

What happened?

How can we reproduce it (as minimally and precisely as possible)?

What did you expect to happen?

Patroni/PostgreSQL/DCS version

Patroni configuration file

patronictl show-config

Patroni log files

PostgreSQL log files

Have you tried to use GitHub issue search?

Anything else we need to know?

CyberDem0n commented May 16, 2024

CyberDem0n commented May 16, 2024

einhverfr commented May 17, 2024

einhverfr commented May 17, 2024

einhverfr commented May 17, 2024

CyberDem0n commented May 17, 2024

einhverfr commented May 27, 2024