After seeing messages stuck in the NotVisible state in our production queues, I set out to verify that the warm shutdown mechanism works properly. My test setup:
Simple task: sleeps for a given number of seconds, then retries with a countdown of 10 seconds.
Max retries: 3
Single worker, threads pool, concurrency 5
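For reference, the task described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used in the test: the function name `sleep_then_retry` is hypothetical, and the function is written as a plain callable that expects a bound Celery task as `self`, so the retry logic can be read without the rest of the app.

```python
import time


def sleep_then_retry(self, seconds):
    """Body of the test task (hypothetical name).

    `self` is assumed to be a bound Celery task instance, i.e. the
    function is meant to be registered with bind=True.
    """
    # Simulate work for the requested number of seconds.
    time.sleep(seconds)
    # Always ask for a retry 10 seconds later, up to 3 retries.
    # Task.retry() returns an exception to raise when called outside
    # of an exception handler, so raising its result is the usual form.
    raise self.retry(countdown=10, max_retries=3)
```

Registering it would look something like `app.task(bind=True)(sleep_then_retry)` on an existing Celery app, with the worker started as `celery -A <module> worker --pool threads --concurrency 5`, where `<module>` is a placeholder for the application module.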
What I observe:
A task is run
It raises a retry, which puts a new message in the queue
The message is consumed by the same worker directly (message received)
During the countdown period, the message is in NotVisible state
During the countdown period, I shut down the worker with SIGTERM
The worker successfully starts a Warm shutdown
The worker notes that it is restoring 1 unacked message
The message stays in NotVisible state
Expected behavior: the message should move to Visible state.
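One way to check the Visible / NotVisible state from a script rather than the console is to read the approximate queue counters. The sketch below assumes that "NotVisible" corresponds to the SQS attribute `ApproximateNumberOfMessagesNotVisible`. The helper name `visibility_counts` is hypothetical, and `sqs_client` is assumed to be a boto3 SQS client (for example `boto3.client("sqs")`, optionally with `endpoint_url` pointing at Localstack).

```python
def visibility_counts(sqs_client, queue_url):
    """Return approximate visible / not-visible message counts.

    Hypothetical helper. `sqs_client` is assumed to expose
    get_queue_attributes() the way a boto3 SQS client does.
    """
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    # SQS returns attribute values as strings.
    attrs = response["Attributes"]
    return {
        "visible": int(attrs["ApproximateNumberOfMessages"]),
        "not_visible": int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    }
```

With the expected behavior, calling this shortly after the warm shutdown should show the message counted under `visible`. With the behavior described above, it stays under `not_visible`. The counters are approximate, so a short delay before reading them may be needed.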
A possible explanation I've found:
In kombu/transport/SQS.py, in the _put function, I manually added a duplicate call to change_message_visibility in the message.get('redelivered') branch, and the problem seemed to go away. I've tested this against live SQS and against LocalStack. In both cases the problem reproduced consistently and was resolved once I added the duplicate call.
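To make the workaround concrete, here is a rough sketch of the idea in isolation. It is not kombu's code: the helper name `restore_visibility`, its parameters, and the `attempts` default are all hypothetical. The only real API it relies on is the boto3 SQS client method `change_message_visibility`, called with `VisibilityTimeout=0` to make the message visible again.

```python
def restore_visibility(sqs_client, queue_url, receipt_handle, attempts=2):
    """Reset a message's visibility timeout to 0, repeating the call.

    Hypothetical helper illustrating the workaround: the same
    change_message_visibility request is issued `attempts` times
    (2 mirrors the duplicate call described above).
    """
    for _ in range(attempts):
        sqs_client.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            # A timeout of 0 should make the message visible immediately.
            VisibilityTimeout=0,
        )
```

Why a second identical call would change the outcome is exactly the open question here, so this should be read as a description of the experiment rather than a proposed fix.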
To sum up, the problem might be on boto3's side and not in kombu, but I'm unsure about this.
I'd be happy to assist by making a PR or providing a simple test scenario.
Thanks
Versions: