
suse_ha: state gets stuck on quorum warning #19

Open
tacerus opened this issue May 17, 2023 · 3 comments
Labels
bug Something isn't working suse_ha-formula Everything related to the suse_ha formula

Comments

@tacerus
Member

tacerus commented May 17, 2023

If the cluster lacks quorum (i.e. only one node is active), running state.apply suse_ha hangs indefinitely during the ha_add_node_utilization_primitive state. The process list shows a stuck crm process as the culprit; killing it produces the following result:

----------
          ID: ha_add_node_utilization_primitive
    Function: cmd.run
        Name: crm configure primitive p-node-utilization ocf:pacemaker:NodeUtilization op start timeout=90 interval=0 op stop timeout=100 interval=0 op monitor timeout=20s interval=60s meta target-role=Started
      Result: False
     Comment: Command "crm configure primitive p-node-utilization ocf:pacemaker:NodeUtilization op start timeout=90 interval=0 op stop timeout=100 interval=0 op monitor timeout=20s interval=60s meta target-role=Started" run
     Started: 14:57:35.486354
    Duration: 3748496.266 ms
     Changes:
              ----------
              pid:
                  172035
              retcode:
                  -15
              stderr:
                  ERROR: (unpack_resources)    error: Resource start-up disabled since no STONITH resources have been defined
                  ERROR: (unpack_resources)    error: Either configure some or disable STONITH with the stonith-enabled option
                  ERROR: (unpack_resources)    error: NOTE: Clusters with shared data need STONITH to ensure data integrity
                  ERROR: crm_verify: Errors found during check: config not valid
              stdout:
                  WARNING: (cluster_status)    warning: Fencing and resource management disabled due to lack of quorum
                  Do you still want to commit (y/n)? Do you still want to commit (y/n)? Do you still want to commit (y/n)? Do you still want to commit (y/n)?

We need to implement either a way to answer "n" automatically or an alternative, non-interactive configuration call.
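Independently of how the prompt is answered, one way to keep it from blocking the state indefinitely is to bound the call with a timeout, so it fails instead of hanging for over an hour (the run above took 3748496 ms, roughly 62 minutes). The following is a minimal Python sketch, not part of the formula or of Salt; `run_with_timeout` is a hypothetical helper name. It closes stdin so no answer can ever arrive; whether crm then aborts or keeps prompting is not established in this issue, so the timeout is what guarantees termination.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def run_with_timeout(cmd: Sequence[str], timeout: float) -> Tuple[Optional[int], str]:
    """Run cmd with stdin closed and return (returncode, stdout).

    returncode is None when the command did not finish within `timeout`
    seconds; subprocess.run kills the child before raising TimeoutExpired.
    """
    try:
        proc = subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,  # no terminal input can ever arrive
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # Partial output may be bytes even with text=True, so decode defensively.
        partial = exc.stdout or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, partial
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in for the stuck `crm configure ...` call: a child that never finishes in time.
    hang = [sys.executable, "-c", "import time; time.sleep(60)"]
    rc, out = run_with_timeout(hang, timeout=2)
    print("timed out" if rc is None else f"exit {rc}")
```

The demo uses a sleeping Python child as a stand-in for the stuck crm process rather than invoking crm itself.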

@tacerus tacerus added bug Something isn't working suse_ha-formula Everything related to the suse_ha formula labels May 17, 2023
@cboltz
Member

cboltz commented May 20, 2023

echo n | crm configure ... or, if you might need to answer more than once, yes n | crm configure ...
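The same idea as piping yes n into crm configure can be expressed from Python by writing a fixed number of "n" answers to the child's stdin. This is a minimal sketch under stated assumptions: `run_answering_no` is a hypothetical helper, the default of 16 answers is arbitrary, and `PROMPT_SCRIPT` is a made-up stand-in prompt, not crm's actual implementation. The crm arguments stay elided, as in the suggestion above.

```python
import subprocess
import sys
from typing import Sequence


def run_answering_no(
    cmd: Sequence[str], answers: int = 16, timeout: float = 120.0
) -> "subprocess.CompletedProcess[str]":
    """Run cmd with `answers` lines of "n" on stdin.

    Like `yes n | cmd`, but the supply of answers is finite and the call is
    bounded by `timeout` (subprocess.TimeoutExpired is raised if exceeded).
    """
    return subprocess.run(
        list(cmd),
        input="n\n" * answers,  # one "n" per line, as `yes n` would write
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Hypothetical stand-in for an interactive confirmation prompt; it is NOT crm.
PROMPT_SCRIPT = """
import sys
while True:
    sys.stdout.write("Do you still want to commit (y/n)? ")
    sys.stdout.flush()
    answer = sys.stdin.readline().strip()
    if answer == "y":
        print("committed")
        sys.exit(0)
    if answer in ("n", ""):
        print("aborted")
        sys.exit(1)
"""

if __name__ == "__main__":
    result = run_answering_no([sys.executable, "-c", PROMPT_SCRIPT])
    print(result.returncode, result.stdout)
```

Unlike an unbounded yes pipe, the finite answer supply plus the timeout means a command that keeps re-prompting still terminates.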

That said, saner handling in crm itself would be better.

@tacerus
Member Author

tacerus commented May 20, 2023

That's a neat idea, thank you! I'll use that if I don't find a native way.

@tacerus
Member Author

tacerus commented Jun 19, 2023

During my testing for #26, I did not encounter this issue, despite applying the state to a clean cluster. I will have to investigate further whether any conditions in the Salt logic could trigger it.
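While investigating which conditions trigger the prompt, it may help to classify captured crm output automatically. The sketch below is hypothetical (`classify_crm_output` and `MARKERS` are illustrative names, not part of the formula); it strips ANSI color codes, which crm may wrap around its ERROR/WARNING prefixes, and checks for the quorum warning, the missing-STONITH error, and the commit prompt. The marker strings are copied from the log in this issue and may not match other crm or Pacemaker versions.

```python
import re
from typing import Set

# ANSI SGR color sequences such as ESC[31m ... ESC[0m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

# Marker substrings copied from the crm output quoted in this issue;
# other crm/Pacemaker versions may word these messages differently.
MARKERS = {
    "no_quorum": "lack of quorum",
    "no_stonith": "no STONITH resources have been defined",
    "commit_prompt": "Do you still want to commit",
}


def classify_crm_output(text: str) -> Set[str]:
    """Return the names of the known conditions found in crm output."""
    clean = ANSI_SGR.sub("", text)  # strip color codes before matching
    return {name for name, marker in MARKERS.items() if marker in clean}


if __name__ == "__main__":
    sample = (
        "\x1b[33mWARNING\x1b[0m: (cluster_status)    warning: Fencing and resource "
        "management disabled due to lack of quorum\n"
        "Do you still want to commit (y/n)? "
    )
    print(sorted(classify_crm_output(sample)))
```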
