Add check for writeability on master #665

Open
ainlolcat opened this issue Apr 17, 2018 · 8 comments


@ainlolcat
Contributor

We have custom scripts that check every 30 seconds whether Postgres is available for RW operations and perform a switchover if Postgres cannot serve RW requests and the disk is almost full. Patroni does not check whether the master is available for RW operations and keeps the master the same even though Postgres cannot perform a single insert.

So I am wondering whether Patroni can check if Postgres is available for RW operations and handle some scenarios (like a full disk), or allow users to specify a callback when certain events occur.

In the first option, Patroni could check whether Postgres is available for RW operations, consider other parameters (someone may have put Postgres in recovery mode manually for a backup or something like that), and perform a switchover to a healthy replica if possible.

In the second option, Patroni could check whether Postgres is available for RW operations and call a callback with parameters like the connection to the DB, occurrence count, state transition (RW>RO/RO>RW), and uptime.
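
For illustration, a rough sketch of what I mean by the second option, assuming psycopg2 and a small pre-created probe table; names like rw_probe and on_rw_state_change are hypothetical, not Patroni API:

```python
# Rough sketch only: `rw_probe` and `on_rw_state_change` are hypothetical
# names, and the probe table must be created beforehand, e.g.
#   CREATE TABLE rw_probe (ts timestamptz);
import time
import psycopg2

def rw_probe(conn):
    """Return True if the instance accepts a trivial write."""
    try:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO rw_probe (ts) VALUES (now())")
            cur.execute("DELETE FROM rw_probe")
        conn.commit()
        return True
    except psycopg2.Error:
        try:
            conn.rollback()
        except psycopg2.Error:
            pass
        return False

def on_rw_state_change(conn, transition, occurrences, uptime):
    # User-defined reaction: log, alert, or request a switchover.
    print(f"{transition} after {occurrences} checks, uptime {uptime:.0f}s")

def watch(dsn, interval=30):
    conn = psycopg2.connect(dsn)
    started = time.monotonic()
    writable = True  # assume RW until the first probe says otherwise
    occurrences = 0
    while True:
        now_writable = rw_probe(conn)
        occurrences += 1
        if now_writable != writable:
            transition = "RO>RW" if now_writable else "RW>RO"
            on_rw_state_change(conn, transition, occurrences,
                               time.monotonic() - started)
            writable, occurrences = now_writable, 0
        time.sleep(interval)
```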

@jberkus changed the title from "Postgres state monitoring" to "Add check for writeability on master" on Apr 18, 2018
@jberkus
Contributor

jberkus commented Apr 18, 2018

So, first, do you understand why most users don't actually want this kind of check?

Second, I think we have a hook location for adding your own extra healthchecks. @CyberDem0n ?

@ainlolcat
Contributor Author

Yes. Not everyone likes something messing around with the database via RW requests. We have custom tests for a simple RW check: insert/update/select/delete on a special table, which is pretty small but still exists. So I prefer the second option, where the user can specify an additional healthcheck/reaction if required. It becomes the user's headache to define which cases require failover and which are just maintenance. Maybe some such checks and reactions could be shipped with Patroni but disabled by default.

@ants
Collaborator

ants commented Apr 19, 2018

Can you describe some specific failure scenarios where the master is not responding correctly and failing over would actually help? In the full-disk example, in a typical deployment all nodes have the same amount of disk, so failing over is unlikely to improve availability. More likely you will then have a full disk on all nodes and, if implemented correctly, a cascade of failovers across all nodes; if not, a never-ending cycle of failovers.

Partial hardware failures (I/O errors or slowness, etc.) could, and I think should, be monitored and handled outside of Patroni, probably by removing the whole node from the cluster. To turn a node read-only, an external agent can already set Patroni's nofailover tag and trigger a failover.
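
A sketch of what such an external agent might do; the config path, API port, and the exact /failover payload are assumptions and depend on your setup and Patroni version:

```python
# Sketch of an external agent taking a node out of rotation. The config
# path, API port, and /failover payload are assumptions about your setup.
import requests
import yaml

PATRONI_CONFIG = "/etc/patroni.yml"    # hypothetical path
PATRONI_API = "http://localhost:8008"  # Patroni REST API, default port

def set_nofailover():
    # Set the node-local nofailover tag and ask Patroni to reload config.
    with open(PATRONI_CONFIG) as f:
        config = yaml.safe_load(f)
    config.setdefault("tags", {})["nofailover"] = True
    with open(PATRONI_CONFIG, "w") as f:
        yaml.safe_dump(config, f)
    requests.post(f"{PATRONI_API}/reload").raise_for_status()

def trigger_failover(candidate):
    # Ask Patroni to fail over to a specific healthy member.
    requests.post(f"{PATRONI_API}/failover",
                  json={"candidate": candidate}).raise_for_status()
```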

@ainlolcat
Contributor Author

In some cases (logging configuration errors, core dumps, pieces of some backup, another application's errors, stuck WALs because of an abandoned replication slot, etc.) we can see a full disk only on the master. We do check whether the slave has enough free space, to prevent a failover from a full master to an equally full slave.
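
The free-space precheck itself is trivial; something along these lines, where the PGDATA path and threshold are only examples:

```python
# Example precheck: refuse to promote a replica whose data volume is
# nearly full. The PGDATA path and threshold are examples, not defaults.
import shutil

def has_enough_free_space(pgdata="/var/lib/postgresql/data",
                          min_free_ratio=0.10):
    usage = shutil.disk_usage(pgdata)
    return usage.free / usage.total >= min_free_ratio
```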

@CyberDem0n
Collaborator

In general, I agree with @ants; I don't really like the idea of turning Patroni into a monitoring agent which also performs some reactive feedback actions.
It is basically your responsibility to set up an extensive monitoring system which will do lots of checks: CPU utilization, free space on different partitions, hardware failures, the number of open and active connections, and many, many more.

@jberkus
Contributor

jberkus commented May 1, 2018

@CyberDem0n we don't want this as a core feature. However, is there a hook for adding things to the availability check without forking Patroni? If not, we should add one.

@CyberDem0n
Collaborator

@jberkus
It is not hard to implement such a hook. If we get a non-zero exit code from the script, we will just dynamically set the nofailover tag to True.
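
To be clear about the contract: such a hook script (which, again, does not exist in Patroni today) would only have to exit non-zero when the node is unhealthy, e.g.:

```python
#!/usr/bin/env python3
# Illustration of the proposed hook contract only: Patroni would run this
# script and set nofailover=True on a non-zero exit code. No such hook
# exists in Patroni today.
import sys
import psycopg2

try:
    with psycopg2.connect("dbname=postgres") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_is_in_recovery()")
            (in_recovery,) = cur.fetchone()
    sys.exit(1 if in_recovery else 0)  # non-zero => deemed unhealthy
except psycopg2.Error:
    sys.exit(1)
```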

But I don't really want to have such functionality, because it would become the biggest foot-shotgun ever...

It seems that we are putting together a lot of different things here:

  • writability and how to check it?
    • select pg_is_in_recovery()?
    • CREATE TABLE/INSERT per tablespace?
  • critical conditions
    • disk (or quota) full on one or more filesystems
    • faulty hardware
    • I/O errors in the syslog

Right now, on every iteration of the HA loop, Patroni verifies that pg_is_in_recovery() returns False, which generally means that writes are possible. I absolutely agree that this might not be enough. For example, one of the filesystems can become full, read-only, or inaccessible for various reasons.
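
For the sake of argument, a stronger probe along the lines of the bullets above could look like the sketch below; the probe tables are hypothetical and would have to be created per tablespace beforehand, and this is not what Patroni actually does:

```python
# Hypothetical stronger probe: pg_is_in_recovery() plus a tiny write per
# tablespace. The probe tables must be created beforehand, one per
# tablespace, e.g. CREATE TABLE probe_ts1 (ts timestamptz) TABLESPACE ts1;
import psycopg2

PROBE_TABLES = ["probe_default", "probe_ts1"]  # examples

def probe_writability(dsn):
    try:
        conn = psycopg2.connect(dsn, connect_timeout=3)
    except psycopg2.Error:
        return False
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_is_in_recovery()")
            if cur.fetchone()[0]:
                return False  # replica: writes are expected to fail
            for table in PROBE_TABLES:
                cur.execute(f"INSERT INTO {table} (ts) VALUES (now())")
                cur.execute(f"DELETE FROM {table}")
        conn.commit()  # note: even a successful commit says nothing
        return True    # about fsync durability, as discussed below
    except psycopg2.Error:
        return False
    finally:
        conn.close()
```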

Most of the checks for the above-mentioned conditions must be executed not only on the master but on all nodes in the cluster; otherwise you can really get a cascade of failovers. And how are you going to run DDL/DML on replicas? Even the ability to perform an INSERT doesn't really give us a guarantee that the data will be persisted on disk (problems with fsync).

In the vast majority of cases (except disk space issues), a "failing" node must be removed from the cluster and replaced with a new one. This should be done either by a person or by external tooling, rather than by Patroni.

And regarding disk space issues: there could be dozens of different reasons why disk space is being eaten, and in my opinion none of them deserves an "automated" switchover, because in the end you'll get into the same situation on the replica and have yet another switchover...

@ainlolcat
Contributor Author

Today I had an issue with failover: someone forgot to consume data from a replication slot, and Postgres started acting funny after the disk filled up. It started and tried to enter recovery mode, then failed because it could not create temporary files. Patroni tried to start it for two days, until it had generated enough garbage to prevent Postgres from creating a lock file during startup. Only after that did a failover occur and a new master get elected. This case has two sides:

  • failover did not occur (our watchdog tried to call a manual failover but was rejected without any meaningful info in the logs);
  • WAL files were stored on each node (strangely, except one), so it could have led to a cascade of failovers.

I don't think such an issue can advocate for new failover logic, so just FYI.
