
corosync parser error: could not parse node id in corosync-quorumtool output: could not find Node ID line #188

Open
jamesyu558 opened this issue Apr 8, 2021 · 23 comments

Comments

@jamesyu558

jamesyu558 commented Apr 8, 2021

Hi Support,

The following corosync parser error about the "Node ID" field occurred with v1.2.0, so I upgraded ha_cluster_exporter from v1.2.0 to the latest version, v1.2.1, on my RHEL7 VM. Unfortunately, the error still occurs with v1.2.1.

Here is the error message; note that the field the parser complains about is "Node ID":
msg="'corosync' collector scrape failed: corosync parser error: could not parse node id in corosync-quorumtool output: could not find Node ID line"

See below:

```
# corosync-quorumtool
Quorum information
------------------
Date:             Thu Apr  8 09:55:31 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          1/568
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         1          1 XXXXXXXXXX
         2          1 XXXXXXXXXX (local)
```

Can you please help?
@stefanotorresi
Member

I'm not able to reproduce this issue: your example matches the regex we're using to parse quorumtool output.
What's the output of ha_cluster_exporter --version?
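To make the maintainer's check concrete, here is a minimal Go sketch that runs the `(?m)Node ID:\s+(\w+)` pattern (the one quoted later in this thread) against the first lines of the output posted above. It only exercises the regex, not the exporter's collector, so it assumes the exporter actually receives this text.

```go
package main

import (
	"fmt"
	"regexp"
)

// opOutput is an excerpt of the corosync-quorumtool output from the report.
const opOutput = `Quorum information
------------------
Date:             Thu Apr  8 09:55:31 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          1/568
Quorate:          Yes
`

// nodeRe is the pattern quoted in this thread for parsing the node ID.
var nodeRe = regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)

func main() {
	m := nodeRe.FindStringSubmatch(opOutput)
	if m == nil {
		fmt.Println("no match")
		return
	}
	fmt.Println("node id:", m[1]) // prints "node id: 2"
}
```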

@jamesyu558
Author

jamesyu558 commented Apr 8, 2021

Here it is:

```
cd /var/lib/pacemaker_exporter/
ls -l
total 18436
-rwxr-xr-x. 1 postgres postgres 9437184 Apr 6 08:37 ha_cluster_exporter-amd64
./ha_cluster_exporter-amd64 --version
version 1.2.1+git.1606912430.4fceb77
built with go1.15.5 linux/amd64 2020-12-02T17:30:26+00:00
```

@jamesyu558
Author

If you have a debug module, I could install it and see exactly what causes this parser error. Please let me know if you need more information from me. I really appreciate your help!

@jamesyu558
Author

In my environment, Pacemaker is installed as well, together with this Prometheus exporter, which I use for Grafana.

@stefanotorresi
Member

Nope, we don't have a debug module.
I guess your best shot is to download the sources and run the exporter with a step debugger to inspect what input is actually being fed to the regex here:

```go
import (
	"errors"
	"regexp"
)

func parseNodeId(quorumToolOutput []byte) (string, error) {
	nodeRe := regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)
	matches := nodeRe.FindSubmatch(quorumToolOutput)
	if matches == nil {
		return "", errors.New("could not find Node ID line")
	}
	return string(matches[1]), nil
}
```

By the way, which corosync version are you using?

@jamesyu558
Author

Hold on, let me check.

@jamesyu558
Author

jamesyu558 commented Apr 8, 2021

```
corosync -v
Corosync Cluster Engine, version '2.4.3'
Copyright (c) 2006-2009 Red Hat, Inc.
```

@jamesyu558
Author

How exactly do I debug this on RHEL7? Do you have specific steps to set it up?

@jamesyu558
Author

Or modify the source code to print out the variable "quorumToolOutput" from "parseNodeId" when it gets called?

@stefanotorresi
Member

You could clone the project and then use https://github.com/go-delve/delve to debug it, but that assumes some familiarity with the Go language and toolkit!

@jamesyu558
Author

Thanks, I can figure this out. I'll let you know soon what value of "quorumToolOutput" is passed to this function. Thank you again.

@stefanotorresi
Member

> Or modify the source code to print out the variable "quorumToolOutput" from "parseNodeId" when it gets called?

Yes, you could also do that by adding

```go
log.Debug(string(quorumToolOutput))
```

after line 85
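The suggested `log.Debug` call only shows up if the exporter's logger is set to debug level; with a leveled logger, debug messages are typically suppressed otherwise. As an alternative sketch that uses only Go's standard `log` package (an assumption, not the exporter's own logging setup), printing the input with `%q` makes an empty string or stray newlines visible, which plain `string(...)` output would hide. `parseNodeIdDebug` is a hypothetical name.

```go
package main

import (
	"errors"
	"log"
	"regexp"
)

var nodeRe = regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)

// parseNodeIdDebug is a hypothetical debugging copy of parseNodeId that
// logs the raw input before matching. %q quotes and escapes the bytes, so
// an empty input shows up as "" and newlines or tabs appear as \n or \t.
func parseNodeIdDebug(quorumToolOutput []byte) (string, error) {
	log.Printf("quorumtool output (%d bytes): %q", len(quorumToolOutput), quorumToolOutput)
	matches := nodeRe.FindSubmatch(quorumToolOutput)
	if matches == nil {
		return "", errors.New("could not find Node ID line")
	}
	return string(matches[1]), nil
}

func main() {
	// With empty input the log line ends in: quorumtool output (0 bytes): ""
	if _, err := parseNodeIdDebug(nil); err != nil {
		log.Println("error:", err)
	}
}
```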

@jamesyu558
Author

Even better, thanks!

@jamesyu558
Author

I'll get back to you tomorrow morning.

@jamesyu558
Author

We modified that function like this:
```go
import (
	"errors"
	"regexp"
)

func parseNodeId(quorumToolOutput []byte) (string, error) {
	nodeRe := regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)
	matches := nodeRe.FindSubmatch(quorumToolOutput)
	var x = string(quorumToolOutput) // raw input, appended to the error for debugging
	if matches == nil {
		return "", errors.New("could NOT find Node ID line :" + x)
	}
	return string(matches[1]), nil
}
```

Then in the log, we see this:
could not parse node id in corosync-quorumtool output: could NOT find Node ID line :"

Note that we changed "not" to "NOT" on purpose, to check that our modified code was actually running. It looks like the x variable is an empty string.

Any more ideas?
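Feeding empty input to the modified function reproduces the logged message exactly, with nothing after the colon. That supports the reading that the exporter received no output at all from corosync-quorumtool, rather than output in an unexpected format. A minimal Go check, assuming the modified `parseNodeId` posted above:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// parseNodeId is the modified version posted above: it appends the raw
// input to the error message.
func parseNodeId(quorumToolOutput []byte) (string, error) {
	nodeRe := regexp.MustCompile(`(?m)Node ID:\s+(\w+)`)
	matches := nodeRe.FindSubmatch(quorumToolOutput)
	var x = string(quorumToolOutput)
	if matches == nil {
		return "", errors.New("could NOT find Node ID line :" + x)
	}
	return string(matches[1]), nil
}

func main() {
	// Empty input, e.g. the stdout of a command that printed nothing,
	// yields the same message seen in the log.
	_, err := parseNodeId(nil)
	fmt.Println(err) // prints: could NOT find Node ID line :
}
```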

@borisjacquot

Hello, is there any update on this issue?

@stefanotorresi
Member

I need an example output from corosync-quorumtool to reproduce the issue. That is, an output that doesn't correctly match the (?m)Node ID:\s+(\w+) regular expression. You can verify that yourself at https://regex101.com/r/riyToT/1.
As you can see, the example provided by OP matches correctly, so I don't know what's up there.

Until I get an actual example, there is not much I can do.

@adr1enb

adr1enb commented May 2, 2023

Hello @stefanotorresi, I have the same issue. Here is the output:

ha_cluster_exporter time="2023-05-02T18:37:54Z" level=warning msg="Corosync Collector scrape failed: could not parse ring id and seq number in corosync-quorumtool output: could not find Ring ID line"

```
Quorum information
------------------
Date:             Tue May  2 18:38:03 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          2.4a46b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate LastManStanding

Membership information
----------------------
    Nodeid      Votes Name
         2          1 lb-int01.xxx.yyy.zzz (local)
         3          1 lb-int02.xxx.yyy.zzz
```

Issue on Debian 11
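The error in this report is about the Ring ID line, not the Node ID line. Here the Ring ID is printed with a dot (`2.4a46b`), whereas the corosync 2.4.3 output in the first report uses a slash (`1/568`). One plausible explanation, consistent with the later report that upgrading to 1.3.2 fixed it, is that the older exporter only accepted the slash form. The exporter's actual Ring ID regex is not quoted in this thread, so both patterns below are hypothetical and only illustrate the difference:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for illustration only; not the exporter's code.
var (
	// Accepts only the slash-separated form, e.g. "1/568".
	slashOnly = regexp.MustCompile(`(?m)Ring ID:\s+(\w+)/(\w+)`)
	// Accepts either "1/568" or the dot-separated "2.4a46b".
	slashOrDot = regexp.MustCompile(`(?m)Ring ID:\s+(\w+)[./](\w+)`)
)

func main() {
	for _, line := range []string{
		"Ring ID:          1/568",
		"Ring ID:          2.4a46b",
	} {
		fmt.Printf("%q slashOnly=%v slashOrDot=%v\n",
			line, slashOnly.MatchString(line), slashOrDot.MatchString(line))
	}
}
```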

@stefanotorresi
Member

stefanotorresi commented May 3, 2023

Hmm, OK, that does match the regex, so it doesn't help me either: https://regex101.com/r/JuhDCK/1

@stefanotorresi
Member

Oh, by the way, please always report the versions of the exporter and corosync you're using.

@adr1enb

adr1enb commented May 10, 2023

Here they are:

corosync 3.1.2-2
ha_cluster_exporter-1.0.1

I've just updated to 1.3.2, and it seems to be fixed 🤔

@Frazew

Frazew commented Oct 27, 2023

tl;dr, in case it helps anyone: make sure you run corosync-quorumtool as the same user that your ha_cluster_exporter process runs under, and check that it actually works for that user.


```
./ha-cluster-exporter --version
ha_cluster_exporter, version 1.3.3+git.1683650163.1000ba6 (branch: HEAD, revision: 1000ba696a5ef85737f70808a12e5a01bee5c281)
  build user:       runner@fv-az1100-952
  build date:       20230529-08:55:18
  go version:       go1.20.4
  platform:         linux/amd64
  tags:             netgo
$ corosync-quorumtool
Cannot initialize CMAP service
```

In this case (an unprivileged user), and I guess in other cases too, corosync-quorumtool exits with exit code 1, which is ignored as per this comment. stdout is empty, hence the failure to find a node ID, while stderr contains the error above. The fix here was to make sure the user has the permissions corosync-quorumtool needs so that it doesn't fail.

I guess a possible improvement would be ignoring the return code as is currently done but also failing when stdout is empty and stderr is not, since that might indicate failure of the command itself?
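The heuristic suggested above can be sketched in Go as follows. `runQuorumTool` and `checkOutput` are hypothetical names, not the exporter's code. The sketch keeps ignoring a non-zero exit status, as the exporter is described as doing, but returns an error when stdout is empty (after trimming whitespace) and stderr is not.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// checkOutput applies the suggested heuristic: empty stdout together with
// non-empty stderr is treated as a failure of the command itself.
func checkOutput(stdout, stderr []byte) error {
	if len(bytes.TrimSpace(stdout)) == 0 && len(bytes.TrimSpace(stderr)) > 0 {
		return fmt.Errorf("command produced no output: %s", strings.TrimSpace(string(stderr)))
	}
	return nil
}

// runQuorumTool is a hypothetical helper: it runs the command, tolerates a
// non-zero exit status, and then applies checkOutput.
func runQuorumTool(name string, args ...string) ([]byte, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	err := cmd.Run()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		// The command could not be run at all (e.g. binary not found).
		return nil, err
	}
	if err := checkOutput(stdout.Bytes(), stderr.Bytes()); err != nil {
		return nil, err
	}
	return stdout.Bytes(), nil
}

func main() {
	out, err := runQuorumTool("corosync-quorumtool")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s", out)
}
```

With this check in place, the unprivileged case above would surface "Cannot initialize CMAP service" instead of the less informative "could not find Node ID line".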

@stefanotorresi
Member

stefanotorresi commented Feb 19, 2024

> failing when stdout is empty and stderr is not, since that might indicate failure of the command itself

That's a good suggestion! We'll try to implement this tweak in the next iteration.


5 participants