
Error collecting status: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033] #246

Open
stefanbates opened this issue Aug 19, 2023 · 2 comments

Comments

@stefanbates

  • mq-metric-samples version(s) affected by this issue:
    v5.2.5

I'm using IBM MQ 9.2.0.0 and AWS CloudWatch metrics collection stops at random with:

time="2023-08-19T12:30:27Z" level=trace msg="> [getMessage]"
time="2023-08-19T12:30:27Z" level=trace msg="> [getMessageWithHObj]"
time="2023-08-19T12:30:27Z" level=trace msg="< [getMessageWithHObj] rp: 0 Error: nil"
time="2023-08-19T12:30:27Z" level=trace msg="< [getMessage] rp: 0 Error: nil"
time="2023-08-19T12:30:27Z" level=trace msg="> [parsePCFResponse]"
time="2023-08-19T12:30:27Z" level=trace msg="< [parsePCFResponse] rp: 0"
time="2023-08-19T12:30:27Z" level=trace msg="> [getMessage]"
time="2023-08-19T12:30:27Z" level=trace msg="> [getMessageWithHObj]"
time="2023-08-19T12:30:27Z" level=trace msg="< [getMessageWithHObj] rp: 0 Error: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033]"
time="2023-08-19T12:30:27Z" level=trace msg="< [getMessage] rp: 0 Error: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033]"
time="2023-08-19T12:30:27Z" level=trace msg="< [ProcessPublications] rp: 0"
time="2023-08-19T12:30:27Z" level=debug msg="Polling for object status"
time="2023-08-19T12:30:27Z" level=trace msg="> [CollectQueueManagerStatus]"
time="2023-08-19T12:30:27Z" level=trace msg="> [QueueManagerInitAttributes]"
time="2023-08-19T12:30:27Z" level=trace msg="< [QueueManagerInitAttributes] rp: 1"
time="2023-08-19T12:30:27Z" level=trace msg="> [collectQueueManagerStatus]"
time="2023-08-19T12:30:27Z" level=trace msg="> [statusClearReplyQ]"
time="2023-08-19T12:30:27Z" level=trace msg="< [statusClearReplyQ] rp: 0"
time="2023-08-19T12:30:27Z" level=trace msg="> [statusSetCommandHeaders]"
time="2023-08-19T12:30:27Z" level=trace msg="< [statusSetCommandHeaders] rp: 0"
time="2023-08-19T12:30:27Z" level=trace msg="> [statusGetReply]"
time="2023-08-19T12:30:30Z" level=trace msg="< [statusGetReply] rp: 3 Error: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033]"
time="2023-08-19T12:30:30Z" level=trace msg="< [collectQueueManagerStatus] rp: 0 Error: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033]"
time="2023-08-19T12:30:30Z" level=trace msg="< [CollectQueueManagerStatus] rp: 0 Error: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033]"
time="2023-08-19T12:30:30Z" level=error msg="Error collecting queue manager status: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033]"
time="2023-08-19T12:30:30Z" level=fatal msg="Error collecting status: MQGET: MQCC = MQCC_FAILED [2] MQRC = MQRC_NO_MSG_AVAILABLE [2033]"

A response on another issue suggested this could be caused by a 3-second timeout, and the log does show a 3-second gap between the [statusGetReply] entry and the [statusGetReply] rp: 3 reply.

However, the MQ server is under very little load so I don't understand why such a timeout might occur.

Once this occurs the metric collection simply stops. Is there a way to ignore MQCC_FAILED if a timeout occasionally occurs?
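For illustration, the behaviour I'm hoping for is something like the following retry wrapper. This is only a sketch with made-up names (mqError, collectWithRetry); the real exporter would inspect the MQRC reason code carried on ibmmq.MQReturn:

```go
package main

import (
	"errors"
	"fmt"
)

// MQRC_NO_MSG_AVAILABLE (2033) means the GET wait interval expired with no
// reply message; it is usually transient rather than a hard failure.
const mqrcNoMsgAvailable = 2033

// mqError is a hypothetical stand-in for the reason code that the real
// library reports via ibmmq.MQReturn.
type mqError struct{ reason int32 }

func (e *mqError) Error() string {
	return fmt.Sprintf("MQGET: MQCC = MQCC_FAILED [2] MQRC = [%d]", e.reason)
}

// collectWithRetry runs collect, retrying up to maxRetries extra times when
// the failure is the transient "no message available" reason code, instead
// of treating the first occurrence as fatal.
func collectWithRetry(collect func() error, maxRetries int) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		err = collect()
		if err == nil {
			return nil
		}
		var mqe *mqError
		if !errors.As(err, &mqe) || mqe.reason != mqrcNoMsgAvailable {
			return err // non-retryable error: fail immediately
		}
	}
	return err // retries exhausted
}

func main() {
	failures := 2
	collect := func() error {
		if failures > 0 {
			failures--
			return &mqError{reason: mqrcNoMsgAvailable}
		}
		return nil
	}
	// Succeeds after two transient 2033s instead of exiting with level=fatal.
	fmt.Println(collectWithRetry(collect, 3))
}
```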

@ibmmqmet
Collaborator

Upgrade to v5.5.0. That level allows the 3-second interval to be configured, and permits some retries around these errors.

But it might be worth trying to find out why the timeout is expiring - my guess would be that "something" is stalling the qmgr if it's not actually busy. That's the kind of thing that can happen with cloud-provided instances - perhaps the image being moved transparently to another piece of hardware, or simply higher-priority workloads running alongside it.

@stefanbates
Author

stefanbates commented Aug 22, 2023

Thank you very much. We'll upgrade to v5.5.0 and use the waitInterval config in the short term. Having reviewed the EC2 instance deployments, it also appears the instances have only 1 vCPU, which perhaps explains the stalling.
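For reference, this is the kind of configuration fragment we expect to use. The section and key names here are assumptions on my part; the exact placement of waitInterval should be checked against the sample config shipped with v5.5.0:

```yaml
connection:
  queueManager: QM1
  # Extend the MQGET reply wait beyond the previously hard-coded 3 seconds
  # (key name and units assumed; verify against the v5.5.0 sample config)
  waitInterval: 15
```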
