Improve Triaging of failing messages in Services / Components #1365
Labels: enhancement, epic, general, message oriented middleware
Problem
A message can fail in its consuming service for a variety of reasons. Sometimes the cause is transient, or lies in the underlying systems the service relies on to do its work. In other cases, the message itself is incorrect or lacks information that downstream systems require, so it is unlikely ever to succeed.
Examples:
- reject() or nack() is called for many reasons, including when the requested action does not exist. These messages will be requeued, even though they have little chance of ever succeeding.
- nack() is called with requeue explicitly set to true, meaning the message will be re-processed indefinitely and prioritized above subsequent requests.
Proposal
Recently, rebound and reject queues have been implemented in the framework. We now have multiple options for dealing with a failing message:
- The rebound queue
- The reject queue
- The nack() or reject() function to immediately requeue and prioritize messages
Each queue message handler in the framework should be reviewed, and a decision made on how to handle different types of failures. Priority should be given to ensuring that messages are never requeued indefinitely. Once these decisions have been made and implemented, a short document describing the reasoning, along with an implementation guide for future usage, should be written.
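As a starting point for that review, the decision for a single failure could be sketched roughly as follows. Every name here (FailureKind, Action, chooseAction, MAX_ATTEMPTS) is hypothetical, and the sketch assumes that the rebound queue is used for later retries and the reject queue for messages that will not be retried. The framework's actual semantics may differ.

```typescript
// Hypothetical sketch: none of these names come from the framework itself.
type FailureKind = "transient" | "permanent";
type Action = "requeue" | "rebound" | "reject";

interface Failure {
  kind: FailureKind;
  // Number of delivery attempts so far, including the one that just failed.
  attempts: number;
}

// Assumed cap on attempts; a real value would be chosen per handler.
const MAX_ATTEMPTS = 5;

function chooseAction(failure: Failure): Action {
  // A malformed or unsupported message is not expected to ever succeed.
  if (failure.kind === "permanent") {
    return "reject";
  }
  // Bound the retries so a message can never be requeued indefinitely.
  if (failure.attempts >= MAX_ATTEMPTS) {
    return "reject";
  }
  // Allow one immediate, prioritized retry, then fall back to the rebound queue.
  return failure.attempts === 1 ? "requeue" : "rebound";
}
```

The key property is that every path eventually ends in "reject", so no message can loop forever.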
In some cases, we may wish to make the behavior configurable, and the different actions could be built into wrapper functions and delivered via npm, like the event-bus.
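One possible shape for such a configurable wrapper is sketched below. It is only an illustration: Message, Sinks, FailurePolicy and withFailurePolicy are invented names, the attempts counter is assumed to be available on the message, and the handler is kept synchronous for brevity.

```typescript
// Hypothetical sketch: a shape for a shared wrapper, not the framework's API.
interface Message {
  body: string;
  attempts: number; // assumed: prior delivery attempts, tracked elsewhere
}

interface Sinks {
  ack(msg: Message): void;
  rebound(msg: Message): void; // assumed: retry later via the rebound queue
  reject(msg: Message): void; // assumed: park on the reject queue, no retry
}

interface FailurePolicy {
  maxAttempts: number;
  // Decides whether an error means the message can never succeed.
  isPermanent(error: unknown): boolean;
}

function withFailurePolicy(
  handler: (msg: Message) => void,
  policy: FailurePolicy,
  sinks: Sinks,
): (msg: Message) => void {
  return (msg: Message): void => {
    try {
      handler(msg);
      sinks.ack(msg);
    } catch (error) {
      // Count the attempt that just failed against the configured limit.
      const exhausted = msg.attempts + 1 >= policy.maxAttempts;
      if (policy.isPermanent(error) || exhausted) {
        sinks.reject(msg);
      } else {
        sinks.rebound(msg);
      }
    }
  };
}
```

Each service could then supply its own policy (for example, a different maxAttempts or isPermanent check) while sharing the same failure-handling path.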