[FEATURE REQ] Service Bus abandon message with custom delay #454

Open
jsquire opened this issue Nov 1, 2021 · 59 comments

Comments

@jsquire
Member

jsquire commented Nov 1, 2021

Issue Transfer

This issue has been transferred from the Azure SDK for .NET repository, #9473.

Please be aware that @Bnjmn83 is the author of the original issue and include them for any questions or replies.

Details

This is still a desired feature and totally makes sense in many scenarios. Is there any effort to implement this in the future?

Sometimes business logic decides that it would be good to retry a message at some later time.
For this reason it would be very helpful, if Abandon(IDictionary<>) or similar, would be able to set ScheduledEnqueueTimeUtc.

msg.Abandon(new Dictionary<string, object>()
{
{ "ScheduledEnqueueTimeUtc", DateTime.Now.ToUniversalTime().AddMinutes(2) }
});
Right now, this does not work, because only custom properties can be manipulated this way.
I’m also happy to hear if this kind of retry can be achieved in some other, easier way.
Right now, I typically set LockDuration to some reasonable retry time and avoid invoking abandon in PeekLock mode. Another way is deferring, but I don’t like it, because it requires me to track the deferred message, which makes things more complicated than necessary.

To recap, wouldn’t it be good to have something between Defer() and Abandon()?
For example Defer(TimeSpan), Defer(ScheduledTimeUtc), or Abandon(TimeSpan)?
The only difference would be that, in the case of Defer, the DeliveryCount property wouldn’t be incremented.
Original Discussion


@msftbot[bot] commented on Tue Jan 14 2020

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl


@jsquire commented on Tue Jan 14 2020

@nemakam and @binzywu: Would you be so kind as to offer your thoughts?


@nemakam commented on Tue Jan 14 2020

@Bnjmn83,
This is a feature ask that we could work on in the future, but we don't have an ETA right now.
As an alternate solution, you can implement this yourself on the client using the transaction feature. Essentially, complete() the message and send a new message with appropriate "scheduleTime" within the same transaction. That should behave similarly.
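
For readers weighing this workaround, here is a minimal in-memory sketch of what "complete and resend as a scheduled message" amounts to. It is plain Python and does not call any Service Bus SDK: `ToyQueue`, `Message`, and `complete_and_reschedule` are hypothetical names, and a single method stands in for the transaction. The point it illustrates is that the retry is a new message with a new sequence number and a fresh delivery count.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Message:
    body: str
    sequence_number: int
    delivery_count: int = 0
    visible_at: float = 0.0  # earliest time (in seconds) a receiver may see it


@dataclass
class ToyQueue:
    """Hypothetical in-memory stand-in for a queue; not the Service Bus API."""
    messages: list = field(default_factory=list)
    _seq: count = field(default_factory=lambda: count(1))

    def send(self, body: str, visible_at: float = 0.0) -> Message:
        msg = Message(body, next(self._seq), visible_at=visible_at)
        self.messages.append(msg)
        return msg

    def receive(self, now: float):
        """Return the first visible message and bump its delivery count."""
        for msg in self.messages:
            if msg.visible_at <= now:
                msg.delivery_count += 1
                return msg
        return None

    def complete_and_reschedule(self, msg: Message, now: float, delay: float) -> Message:
        """Settle the original and enqueue a copy that becomes visible after
        `delay` seconds. A real broker needs a transaction to make these two
        steps atomic; here they simply run back to back."""
        self.messages.remove(msg)                            # complete()
        return self.send(msg.body, visible_at=now + delay)   # scheduled send


queue = ToyQueue()
queue.send("order-42")
first = queue.receive(now=0.0)
copy = queue.complete_and_reschedule(first, now=0.0, delay=120.0)

print(queue.receive(now=60.0))                       # None: the copy is not visible yet
retry = queue.receive(now=120.0)
print(retry.sequence_number, retry.delivery_count)   # 2 1
```

Because the delivery count restarts with every copy, any retry limit would have to be carried by the application itself, for example in a custom property.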


@axisc commented on Thu Aug 13 2020

I think @nemakam's recommendation of completing the message and sending a scheduled message is a better approach.

Service Bus (or any message/command broker) is a server/sender side cursor. When a receiver/client wants to control when the message is visible again (i.e. custom delay/retry) it must take over the cursor from the sender. This can be achieved with the below options -

  • Completing the message and then resending with a scheduled message.
  • Deferring message and receiving explicitly.

Do let me know if this approach is too cumbersome and we can revisit. If not, I can close this issue.


@mack0196 commented on Wed Mar 31 2021

If the subscription\queue has messages in there, will the scheduled message 'jump to the front of the line' at its scheduled time?


@ramya-rao-a commented on Mon Nov 01 2021

@shankarsama Please consider moving this issue to https://github.com/Azure/azure-service-bus/issues where you track feature requests for the service

@triynko

triynko commented Feb 2, 2022

This is such a critical, overdue feature. Having 'delivery counts' is useless without this, because any time something fails, it just retries N times in rapid succession and dead-letters anyway. This extra processing just makes a bad situation N times worse. We need to be able to 'update' the message properties AND (more importantly) reschedule the original message to run with exponential backoff or whatever algorithm we want. We can control this by storing the original message time in the user properties collection, for example, and computing the next delay using the current delivery count. Using transactions to re-enqueue a new message while completing the existing one is not a good option. We should not have to resend the entire message payload.
I would recommend just updating the AbandonAsync method to include overloads that accept an updated scheduled enqueue time in addition to the updated user properties.

A scheduled message really can't 'be in line' when it's scheduled. It's just at a theoretical point in time. When that point in time elapses, the message should just 'get in line' at that point in time (end of the line). The 'delay' is the more important functionality, not the specific time. Queued messages are queued and are delayed by nature.
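
The delay computation described in this comment might look like the following sketch. The base delay, the cap, the exponent limit, and the function names are assumptions chosen for illustration; only the idea of deriving the delay from the current delivery count comes from the comment.

```python
from datetime import datetime, timedelta, timezone

MAX_EXPONENT = 30  # assumed guard so very large delivery counts cannot overflow


def backoff_delay(delivery_count: int,
                  base: timedelta = timedelta(seconds=5),
                  cap: timedelta = timedelta(minutes=15)) -> timedelta:
    """Delay before the next attempt: base * 2**(delivery_count - 1), capped."""
    if delivery_count < 1:
        raise ValueError("delivery_count starts at 1 for a delivered message")
    exponent = min(delivery_count - 1, MAX_EXPONENT)
    return min(base * 2 ** exponent, cap)


def next_attempt_time(now: datetime, delivery_count: int) -> datetime:
    """Point in time at which the message should become visible again."""
    return now + backoff_delay(delivery_count)


for attempt in range(1, 6):
    print(attempt, backoff_delay(attempt))
# 1 0:00:05
# 2 0:00:10
# 3 0:00:20
# 4 0:00:40
# 5 0:01:20
```

The value returned by `next_attempt_time` is what a caller would pass as the scheduled enqueue time if an abandon overload of the kind proposed above existed.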

@brian-duffy

+1 for this please. Not much use in DDOS'ing our own services. Exponential back off policy would be a fantastic feature to add.

@EldertGrootenboer
Contributor

One way to accomplish this already today is to use message deferral combined with a scheduled message.
For this, you would defer the message, and place its sequence number in a scheduled message.
When the scheduled message comes in, use the sequence number to retrieve the deferred message and process it.
Please let us know if this works for you.

@SeanFeldman
Contributor

@EldertGrootenboer I'm interpreting the original as asking for built-in functionality to reduce the code complexity required for something that should be a simple message disposition. Whenever a workaround involves several operations, it not only incurs an additional cost at the service level but also adds a cognitive tax and complexity to codebases. Walking through the workaround, here's what needs to happen:

  1. To ensure two different operations are atomic, message deferral and scheduling have to be done in a transaction.
  2. Sending a message out requires access to a message sender. If you're in a scenario/codebase where you don't have access to the object, you either have to create a sender (expensive) or rely on luck and skip atomicity altogether. Example: Azure Functions or a custom messaging framework.
  3. When receiving a scheduled message, you now have to proactively fetch an additional message, requiring a message receiver. Again, in quite a few scenarios, this is not possible. Think Azure Functions. You don't want to create a receiver to read the deferred message for each invocation.

To sum it up, there are scenarios where abandoning with a custom delay is necessary and workarounds cannot provide the same value a feature would. I hope this helps.

@EldertGrootenboer
Contributor

EldertGrootenboer commented Feb 25, 2022

Thank you for your feedback! Although this is not something that should be done with either Abandon or Defer, as it would change the semantics of those actions, it is something we want to look into putting on the backlog. I would like to align with you for this, to get the details for your scenario. @Bnjmn83, @triynko and @SeanFeldman could you drop me a message on egrootenboer@microsoft.com, and we can take it from there.

@SeanFeldman
Contributor

SeanFeldman commented Apr 4, 2022

@RichardGaoF, abandoning is never about deferring. With regular abandon operation the message goes back to the queue and is available right away. With this feature, the ask is for the message to be delayed for the provided time span upon abandoning, and then become available automatically.

@RichardGaoF

> One way to accomplish this already today is to use message deferral combined with a scheduled message.
>
> For this, you would defer the message, and place its sequence number in a scheduled message.
>
> When the scheduled message comes in, use the sequence number to retrieve the deferred message and process it.
>
> Please let us know if this works for you.

Thanks @EldertGrootenboer. I have just one question: it seems the deferral time must be a fixed timespan, set when scheduling the message?
In other words, supposing the timespan is set to 10 minutes, does that mean
the message will be enqueued in 10 minutes (scheduled) and then deferred for another 10 minutes each time the consumer retrieves it and checks some custom conditions,
OR
the message will be enqueued in 10 minutes (scheduled) and then the consumer will not retrieve it UNTIL some custom conditions are met (working like an event trigger)?

I am expecting the latter, but it looks like it's actually the former (only a fixed timespan can be set for a deferred message)?

@EldertGrootenboer
Contributor

@RichardGaoF You don't set the timespan on the deferred message, but on the scheduled message instead. The deferred message will stay on the queue until it is explicitly retrieved using its sequence number.

The scheduled message will be enqueued after the timespan has elapsed, and will be placed at the back of the queue. Once it is picked up by a consumer, that consumer will then use the sequence number which was added to the scheduled message to retrieve the deferred message.
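
The flow described here can be modelled in a few lines. This is an in-memory sketch in plain Python, not SDK code: `DeferralQueue` and its methods are hypothetical stand-ins for deferring a message, scheduling a small pointer message, and later fetching the deferred message by its sequence number.

```python
from dataclasses import dataclass, field


@dataclass
class DeferralQueue:
    """Hypothetical in-memory model; not the Service Bus client API."""
    active: list = field(default_factory=list)    # (visible_at, payload) pairs
    deferred: dict = field(default_factory=dict)  # sequence_number -> body

    def defer(self, sequence_number: int, body: str) -> None:
        """Set the message aside; it is only reachable by sequence number."""
        self.deferred[sequence_number] = body

    def schedule(self, payload: dict, visible_at: float) -> None:
        """Enqueue a small pointer message that appears at `visible_at`."""
        self.active.append((visible_at, payload))

    def receive(self, now: float):
        """Pop the first pointer message whose scheduled time has passed."""
        for item in self.active:
            if item[0] <= now:
                self.active.remove(item)
                return item[1]
        return None

    def receive_deferred(self, sequence_number: int) -> str:
        """Fetch (and remove) the deferred message for this sequence number."""
        return self.deferred.pop(sequence_number)


queue = DeferralQueue()
# Processing failed: defer the original (sequence number 7) and schedule a
# pointer to it 10 minutes (600 s) from now.
queue.defer(7, "large original payload")
queue.schedule({"deferred_sequence_number": 7}, visible_at=600.0)

print(queue.receive(now=300.0))   # None: the pointer is not visible yet
pointer = queue.receive(now=600.0)
original = queue.receive_deferred(pointer["deferred_sequence_number"])
print(original)                   # large original payload
```

If the pointer message is lost or never processed, the entry in `deferred` stays there indefinitely, which is the operational risk of this pattern.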

@RichardGaoF

> @RichardGaoF You don't set the timespan on the deferred message, but on the scheduled message instead. The deferred message will stay on the queue until it is explicitly retrieved using its sequence number.
>
> The scheduled message will be enqueued after the timespan has elapsed, and will be placed at the back of the queue. Once it is picked up by a consumer, that consumer will then use the sequence number which was added to the scheduled message to retrieve the deferred message.

Thank you @EldertGrootenboer. So if I implement it in a loop, the deferred message always stays set aside in the queue while looping, until it is received, handled, and completed, and each iteration needs a newly created scheduled message with two properties: its timespan works like the loop interval, and its message ID should always be the sequence number of the deferred message. Correct?

@SeanFeldman
Contributor

@RichardGaoF you’re confusing message deferral and abandoning with a time-out. With this feature you don’t need to use message’s sequence number. The message won’t change its ID or anything else besides DeliveryCount because it will be the same message. Have a look at how abandoning works and add to that a back-off time that would be added. That’s it.

@RichardGaoF

RichardGaoF commented Apr 7, 2022

@SeanFeldman @EldertGrootenboer Thanks. I think I understand each concept individually (peek-lock, abandon, lock expiry, DLQ, TTL expiry, scheduled messages, deferred messages, ...), but I have not found an article anywhere (including MSDN) that clearly describes how they all work together. Perhaps it is implied that some of them never work together, but unless that is stated, readers don't know, or at least cannot be sure, which is my current situation. Anyway, please allow me to try to describe the following typical scenario using all of these concepts together.

First, without the scheduled message and deferred message concepts

We have just a "general" queue. The queue itself has a TTL, which means a message will be moved to the DLQ if it has not been consumed before the TTL expires.
In peek-lock mode, when a consumer polls, the queue locks the next message and sends it to the consumer. If the consumer cannot process the message and abandons it, or the processing time exceeds the lock timeout, the queue unlocks the message so that it becomes visible to all consumers again. There is also a max delivery count, and the message will be moved to the DLQ if it exceeds that count.

Now let's bring in the scheduled message and deferred message concepts

scheduled and deferred messages

  1. There is a to-be-scheduled message and a to-be-deferred message, and the to-be-scheduled message's ID is the to-be-deferred message's sequence number. Schedule the to-be-scheduled message with a timespan and defer the to-be-deferred message.

  2. The scheduled message will not be enqueued until the timespan has elapsed.

  3. A consumer polls, then the queue locks and delivers the scheduled message to the consumer. The consumer uses the scheduled message's ID (which is just the sequence number of the deferred message) to retrieve the deferred message and TRIES to process it.

  4. Here are the QUESTIONS: If the deferred message is NOT ready to be processed (or, say, 'failed to be processed'), will the consumer
    1) directly create a new scheduled message; 2) schedule/enqueue the new message; 3) defer a new copy of the deferred message; 4) complete the original deferred message; 5) complete the original scheduled message?
    OR
    1) abandon the deferred message so that it is visible in the queue again?

  5. If the former (4-1), the deferred message will never exceed the max delivery count and be moved to the DLQ (actually, each deferred message will be delivered only once). If the latter (4-2), once the number of abandons exceeds the max delivery count, the deferred message will be moved to the DLQ, but there will never be new scheduled messages or new copies of the deferred message.

Which of the above is the real behavior of message deferral?

I personally lean toward the former, but I am not sure, because the MSDN doc lacks details and examples, and this article, which has an example, seems to confuse scheduled messages with deferred messages.

@SeanFeldman
Contributor

@RichardGaoF, there's no deferral for this feature. Plain and simple.
This issue is talking about the ability to abandon a message and specify a timeout.
When a message is abandoned today, it goes back to the queue and is available for processing right away if there are no other messages in the queue. What this issue is about is adding a delay to an abandoned message, so that rather than appearing immediately, it would be delayed. It's the same message. No need to create a new message, no need to defer and hold on to a message sequence number, none of that.

The delivery count and dead-lettering would continue to work exactly the same way because the message is the same message.
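
As a thought experiment only, the requested semantics can be modelled as below. Nothing here is an existing Service Bus API: `abandon(..., delay=...)` is the hypothetical feature, `MAX_DELIVERY_COUNT = 3` is an arbitrary value, and dead-lettering is simplified to happen on the next receive attempt. The sketch shows that the same message keeps its delivery count and still dead-letters, only more slowly.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    delivery_count: int = 0
    visible_at: float = 0.0
    dead_lettered: bool = False


MAX_DELIVERY_COUNT = 3  # illustrative value, not a recommendation


def receive(msg: Message, now: float):
    """Deliver the message if it is visible; dead-letter it once the max is hit."""
    if msg.dead_lettered or msg.visible_at > now:
        return None
    if msg.delivery_count >= MAX_DELIVERY_COUNT:
        msg.dead_lettered = True
        return None
    msg.delivery_count += 1
    return msg


def abandon(msg: Message, now: float, delay: float = 0.0) -> None:
    """Release the lock. delay=0 models today's behaviour; delay>0 models the
    requested feature (hypothetical, not an existing API)."""
    msg.visible_at = now + delay


msg = Message("order-42")
now = 0.0
while (got := receive(msg, now)) is not None:
    delay = 30.0 * got.delivery_count   # 30 s, 60 s, 90 s
    abandon(got, now, delay)
    now += delay                        # jump the clock to the next visible time

print(msg.delivery_count, msg.dead_lettered, now)   # 3 True 180.0
```

With `delay=0.0` the same loop would burn through all three deliveries at `now == 0.0`, which is the behaviour the thread is complaining about.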

If this still doesn't answer your question, I suggest moving a discussion to an email.

@RichardGaoF

@SeanFeldman I re-read the whole conversation thread to better understand the context of the issue.

Yes, delaying an abandoned message before it becomes visible again in the queue is not provided by any Azure SB out-of-the-box (OOB) feature today, so the method @EldertGrootenboer recommended (message scheduling + deferral) can be understood as a workaround while no OOB feature exists, but with the shortcoming that it is a one-time deferral rather than a "do-deferral-while" operation. So with this one-time deferral, just as you said, the delivery count and DLQ work normally if we abandon the deferred message in our consumer.

On the other hand, as in the business logic I am currently facing, a typical scenario is to keep deferring a message until some condition(s) are met (do-deferral-while), instead of deferring it only once. Therefore, some people have implemented such do-deferral-while behavior by creating a new scheduled message and a new deferred message in a loop and re-enqueueing them, for example, the one I found on the Internet.

In short, referring to my last post: with 4-1, there is no abandon and no exceeding of the max delivery count at all; new scheduled and deferred messages are created in a loop and re-enqueued to realize the do-deferral-while logic. With 4-2, after a one-time deferral using a scheduled message and a deferred message, the abandoned message becomes visible in the queue again immediately.

Thanks for the invitation. I might join your email discussion if I run into more problems while implementing my business logic.

@nzthiago
Member

nzthiago commented Jul 1, 2022

One scenario where this request from @SeanFeldman would be useful is when you have sessions enabled and want to implement a circuit breaker on top. For example, I have multiple projects that send messages to a queue, the session id is the customer id, and messages for the same customer need to be processed in order. But if there's a failure in processing one of the messages for a customer, requiring some manual intervention, a separate notification/workflow can be kicked off for manual investigation (say, a product is missing and needs to be created), and then reprocessing can continue. Being able to Abandon with a delay would be helpful so that the specific session/customer 'pauses' processing while the issue is addressed, and the in-order requirement is not broken. It would at least make the solution to that requirement simpler, I suspect.

@skastberg

A feature like @SeanFeldman proposes would definitely simplify many solutions.
I would propose to additionally have a delay on deferral to let the message go back to normal queue after delay time. This way we don't need to keep track of sequence number in all cases. Of course there are some things to think about like TTL of the message when returned to the queue.

The reason to have both is that I would like to differentiate between an exception (e.g. a resource not available) and "I want to handle this later" (e.g. ordered processing). Abandon would raise the delivery count while defer would not.

@EldertGrootenboer
Contributor

We have put this on our backlog, thank you for everyone who gave their input. There are no implementation details or timelines to share yet, but we will update this thread as we progress.

@abhishek-msft

It's been more than a year without any updates on this. Just checking: is it still part of the backlog?

@EldertGrootenboer
Contributor

This is indeed in the backlog, and we are currently creating a design for this. There are no timelines yet to share, but we will update this issue when we have more information.

@triynko

triynko commented Jan 19, 2023

So happy this is in the backlog and in design phase. Basically, when we fail to process a message, it's because of some transient error. Maybe a database is unavailable, or some async action it depends on having completed hasn't yet completed. So we abandon the message.

The problem is that it gets picked up right away, fails again, we abandon it, then we repeat this N times based on max delivery count. Because there's virtually no delay between retries, there's no time for the transient error condition to resolve itself, so we're basically DOSing our own system unnecessarily, and after N deliveries, the message deadletters anyway. All is lost.

All we want is the ability to abandon a message and specify some delay (or scheduled future date) before it will be picked up by subscribed processors again, which we can compute ourselves as some exponential back-off based on the current delivery count. Semantically, introducing this delay in the Abandon call makes the most sense to us. We want to release the lock on the message, but we want a delay introduced before it gets picked up for processing again automatically.

The workaround of just rescheduling the message is a bad idea for a few reasons. A completely new message resets the delivery count to zero, so we'd have to create/track our own delivery count. This also increases delivery size. We also need metadata for the retry, like 'which pieces of processing failed'. For example, we have 'subscriptions' attached to handlers (these subscribers represent downstream systems that need to be notified that the message has arrived), so if 2 of 3 subscribers fail to be notified about the message, we have to embed these failed subscriptions in the rescheduled message and retry processing. That's a problem because we risk increasing the original message size with this property, and risk failing to reschedule. Tracking delivery count on our own is also a bad idea, because if we fail to update our internal count and the lock times out, we lose a count. So we'd have to take the SUM of the Azure-managed DeliveryCount + our InternalDeliveryCount. So there are 3 problems there.

  • Entire messages being re-sent to a queue for each processing retry, which puts extra data load on a queue where we're already struggling to meet send SLAs.
  • We also must track delivery count on our own, because re-scheduling a message creates a new message that resets the count to zero.
  • We are also risking increasing size and overflowing max message size on retry by adding these extra properties after the fact, leaving us no option but to dead letter.

Now, the workaround that EldertGrootenboer came up with to defer the original message and submit a scheduled message with just the sequence number of the original message solves most, but not all, of those problems:

  • We no longer have to re-send the message data; deferring the original message leaves the original message intact
  • The smaller scheduled message with the sequence number allows us to implement the processing delay
  • We no longer need to track delivery count on our own; the original message is intact
  • We no longer have to embed metadata in the original message on abandon; the smaller scheduled message with the sequence number only can also carry the failed subscription identifiers for the next processing retry
  • We no longer have to worry about overflowing the message size; we're not adding any new metadata to it on abandon; it's all stored in the smaller schedule message that holds only identifiers

It also creates a new problem. We now have these 'deferred' messages, which are harder to work with, plus these extra smaller scheduled messages, which artificially increase the message counts in our queues and mess with alarm thresholds. There's also a risk of leaving a message deferred indefinitely if something goes wrong processing the scheduled message that holds the identifier. It's all just unnecessary complexity that wouldn't be necessary if this simple and obvious feature were implemented.

Of course ALL of this would be solved by the requested feature here. When we pick up a message and processing fails because of a transient error, we can call Abandon and just supply a delay so the message is scheduled in a way where it's not picked up by subscribed processors until after some delay, rather than immediately. (Note the use of the term 'subscribed processors' here is different from the 'subscribers' I mentioned earlier; our 'subscribers' represent downstream systems that need to be notified about a message being processed).

@ilya-scale

I have been using some workarounds for this issue, and I just figured out there is an undesired side effect: if one is using topics/subscriptions, then sending a new message to the topic when there is a failure in processing means that all of the subscriptions will receive it, which is quite unfortunate.

I really hope that this feature will be implemented soon as it is long overdue. Is there some estimate on when we can expect a possibility to delay the message processing without re-sending?

@maxandriani

maxandriani commented Feb 27, 2023

@ilya-scale I've used the same workaround and hit the same side effect. My workaround adds an additional header to the new message with the name of the subscription that triggered the "deferral", so the other subscriptions look at this header and just ignore the message if it is not addressed to them... something like a wireless protocol.
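
A sketch of that addressing scheme, assuming the header is carried as an application property on the resent message. The property name `retry-target-subscription` and both function names are made up for illustration.

```python
TARGET_HEADER = "retry-target-subscription"  # hypothetical property name


def build_retry_properties(original: dict, failed_subscription: str) -> dict:
    """Copy the application properties and mark which subscription asked for the retry."""
    return {**original, TARGET_HEADER: failed_subscription}


def should_process(properties: dict, my_subscription: str) -> bool:
    """Ordinary messages (no header) go to everyone; retries only to the addressee."""
    target = properties.get(TARGET_HEADER)
    return target is None or target == my_subscription


retry_props = build_retry_properties({"customer": "c-1"}, "billing")
print(should_process(retry_props, "billing"))           # True
print(should_process(retry_props, "shipping"))          # False
print(should_process({"customer": "c-1"}, "shipping"))  # True
```

Every subscription still receives and settles the resent message under this scheme; the header only lets the uninvolved ones complete it without doing any work.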

@Zenuka

Zenuka commented Mar 14, 2023

> This is indeed in the backlog, and we are currently creating a design for this. There are no timelines yet to share, but we will update this issue when we have more information.

How did the designing go? Did you run into any issues? Curious for an update!

@jatinpuri-microsoft

jatinpuri-microsoft commented Apr 13, 2023

Well put by @triynko:

> The problem is that it gets picked up right away, fails again, we abandon it, then we repeat this N times based on max delivery count. Because there's virtually no delay between retries, there's no time for the transient error condition to resolve itself, so we're basically DOSing our own system unnecessarily, and after N deliveries, the message deadletters anyway. All is lost.

This is even more important with the Azure Function trigger. In case of failure, the same function is retriggered immediately.

Looking forward to a fix for this soon :)

@wouter-b

> Well put by @triynko:
>
> > The problem is that it gets picked up right away, fails again, we abandon it, then we repeat this N times based on max delivery count. Because there's virtually no delay between retries, there's no time for the transient error condition to resolve itself, so we're basically DOSing our own system unnecessarily, and after N deliveries, the message deadletters anyway. All is lost.
>
> This is even more important with the Azure Function trigger. In case of failure, the same function is retriggered immediately.

That's a problem we are facing right now, there are some possible workarounds like catching the exception and throwing it after a delay but that's far from ideal. Hopefully this will soon be included in the servicebus ;)

@andrewkittredge

this will make my life easier.

@EldertGrootenboer
Contributor

Thank you for your feedback on this item. We are currently doing active development on this feature, and expect to have more to share around its release in the next couple of months.

@kimberlyyong

When is this feature going to be available? Our business has multiple subscriptions and none of the strategies will work except this feature... I got the latest Azure.Messaging.ServiceBus NuGet package, 7.16.2, and still don't see this feature.

@SeanFeldman
Contributor

@kimberlyyong, the ETA was provided here .

@kimberlyyong

oh i did not see this message this morning lol thank you! @SeanFeldman , bummer, this sucks so much.

@SeanFeldman
Contributor

On the contrary, it's great that the feature is being worked on.

@kimberlyyong

Why would you have an "abandon" method without specifying when it should be retried? This is an unthoughtful design to begin with. Every retry must consider a retry cadence strategy. Can you imagine Polly not having this kind of retry policy? Abandon is also a bad name for this method. Also, there is no way I can solve my own problem; no one can do any workaround other than coming up with another Service Bus topic/queue to maintain their subscription retry. How ridiculous is this? Why is the message retry/visible time a property that the user cannot modify? It's people defaulting everything to "get" instead of thinking about what should be "get/set", another unthoughtful coding standard. Sorry, I'm ranting, but I expect better from this team.

@kimberlyyong

Also it's been 2 years since an idea of a fix is suggested and who knows how long it's been a problem before that. Probably another 5 years.

@ultrabstrong

@kimberlyyong While you are correct that the lack of an "abandon" feature seems like a significant oversight, it is also important to remember that a team of highly intelligent people worked hard on this and likely have a good reason for designing it the way they did. I think we can all benefit a lot from trying to understand the initial intent and working together to drive a better solution. I don't think berating a team on a public forum fosters a good culture or community for programmers. I would encourage everyone to be understanding and constructive with our comments; especially since each and every programmer out there has made similarly flawed design/implementation decisions at some point in their career.

@chrisflem

Any updates on this ?

@ArturAdam

+1

@adearriba

+1

@AlexEngblom

Waiting anxiously to rid ourselves from current custom solution which essentially generates new messages instead of actual re-delivery through abandoning.

@KimberlyPhan

@ultrabstrong so commenting on bad design is discouraged, how are we ever going to improve anything?

@ultrabstrong

@kimberlyyong I don't think (nor did I say) commenting on bad design should be discouraged. Being disrespectful and berating people is not a good ingredient for making progress. There are a lot of ways to respectfully suggest improvement.

@KimberlyPhan

@ultrabstrong I did not think I was being disrespectful or berating. Maybe we grew up in different cultures; let's agree to strongly disagree.

As I said, I was just ranting, and in my opinion not enough thought went into the design, as I have specifically spelled out why.

Do you know the product well enough to say that my calling these design decisions "unthoughtful" is untrue? I would love to know more about the background and details of these design decisions.

Also, there are a lot of highly intelligent people, and I expect more from Microsoft (as I wrote at the end of my comment), and everyone is highly intelligent in their own way, so calm down.

PS: my partner told me the word "unthoughtful" could be viewed as a personal attack even though it was used about design/coding decisions. I will consider this in the future.

@KimberlyPhan

I think we should just forget the "AbandonMessage" method and replace it with "DeferMessage" taking a TimeSpan or DateTime parameter; if people want to do whatever Abandon did, they just call Defer with a zero TimeSpan or the current DateTime.
Or, at a minimum, let people change the visible-time property when they Abandon the message (again, this is a really bad method name).

@akozmic

akozmic commented Feb 1, 2024

At the risk of being redundant with others, I wanted to add a comment expressing my desire for this feature, but will give my own context. I was very surprised to find out that the retry logic didn't work at all how I thought it would, and that even when handling the message completion myself there was no way to Abandon the message with a delay.

Our use case is to receive a message from our system that some data was added/updated/deleted and then keep an Azure AI Search index as in-sync as possible.

We already have our own retry logic built into handling transient failures with the AI Search Index so network/connectivity blips should be handled reasonably. The point of us trying to use Service Bus was to increase reliability in case connectivity (self-caused or otherwise) was lost between our "AI Search Index Updater" Azure Function and the AI Search service. If messages failed to be processed, they could be delayed and processed again later, only dead-lettering in the case of long outages, in which case we'd be alerted and could perform a re-drive.

As it stands, the message just retries immediately over and over until it hits its max delivery count and then bombs out to the deadletter queue, which somewhat negates our reason for wanting to use Service Bus to begin with. It's still better than just failing and losing the message on the initial API call but a deadletter redrive should be the exception, not the rule. I don't want Ops folks getting alerted in the middle of the night because our function has no ability to self-heal.

Now I need to reconsider whether we want to move forward with this approach, and honestly it's a pretty big let-down for me given how excited I was to use the feature to begin with.

@KimberlyPhan

KimberlyPhan commented Feb 2, 2024

@akozmic not sure if this will help, but I don't want to, or cannot, use the suggested workarounds. Requeueing to the topic is not possible because I own only one of many subscriptions on that topic, and Defer requires some kind of persistent storage to pick the deferred messages back up, and I don't want to build another solution for this solution.
So I have a somewhat hacky approach that works for me.

Option 1:
If you don't have any state management, you can simply catch the exception and not COMPLETE the message. This SHOULD NOT tie up any threads (I'm not 100% sure, but will test), and the message lock will expire on its own before the retry happens. I find the behavior strange, though: the message lock duration is set to 5 minutes (the maximum), but I consistently see a retry every 2.5 minutes.

Option 2:
In my case, state management is needed across retries. For example, each unit of work has to do step A, then B, then C.

  1. When the state changes (for example, I need to remember that step A is already done so the next retry jumps to step B), I Abandon the message and record the change in the message's properties. In this case the message is retried immediately.
  2. If there is no change, meaning the retry made NO progress (say, step B still could not complete), I simply do NOT complete the message, and it is retried every 2.5 minutes.
  3. If you don't want messages piling up in the dead-letter queue, read the retry count or the message's original enqueue time and complete the message once a limit is reached; otherwise let them sit in the dead-letter queue.

Cons of this hack:

  • Have to live with a retry interval of at most 2.5 minutes
  • No exponential backoff between retries
  • If progress IS made (yay), Abandon() causes an immediate retry (I can live with it)
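
The decision flow described in Option 2 can be sketched as a pure function, kept separate from the Service Bus calls. This is only an illustration under assumptions: `RetryAction`, `RetryPolicy.Decide`, and the parameter names are hypothetical and not part of any SDK. It also assumes auto-complete is disabled on the processor, so that returning from the handler without settling leaves the lock to expire, and it chooses dead-lettering (rather than completing) once the time-in-queue limit is exceeded.

```csharp
using System;

// Hypothetical names for illustration only; not part of the Service Bus SDK.
public enum RetryAction
{
    Complete,       // all steps finished: settle the message
    Abandon,        // progress was made: abandon with updated properties (immediate redelivery)
    LetLockExpire,  // no progress: do not settle; redelivery happens after the lock expires
    DeadLetter      // time-in-queue limit exceeded: stop retrying
}

public static class RetryPolicy
{
    // Maps the outcome of one processing attempt to a settlement decision.
    public static RetryAction Decide(
        bool workFinished,
        bool progressMade,
        TimeSpan timeInQueue,
        TimeSpan maxTimeInQueue)
    {
        if (workFinished)
        {
            return RetryAction.Complete;
        }

        if (timeInQueue > maxTimeInQueue)
        {
            return RetryAction.DeadLetter;
        }

        // Progress is persisted via Abandon; otherwise wait out the lock.
        return progressMade ? RetryAction.Abandon : RetryAction.LetLockExpire;
    }
}
```

In a message handler, the caller would translate the result into the matching settlement call (for example abandoning with modified properties, or dead-lettering), and simply return without settling for `LetLockExpire`.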

@akozmic

akozmic commented Feb 2, 2024

@KimberlyPhan thank you for the suggestions. I actually tried Option 1 already and I found if I would just catch the exception and swallow it, it didn't look like it was updating the DeliveryCount on the message. The lock on the message would expire and then retry after the designated amount of time, but with DeliveryCount=0. It really seemed like it was entering an infinite loop.

Also, I do apologize: I did not see from a previous message that this feature is actively being worked on and should launch soon based on the estimate. I look forward to it.

@KimberlyPhan

@akozmic I see. We use the criterion below to move the message to the DLQ, so the time limit is just a matter of max retry count * 2.5 minutes.

// Dead-letter the message once it has been in the queue longer than the allowed time.
var timeInQueue = DateTimeOffset.UtcNow.Subtract(args.Message.EnqueuedTime).TotalMinutes;
if (timeInQueue > maxTimeToProcessInMinutes)
{
    await args.DeadLetterMessageAsync(args.Message, $"TimeInQueue exceeds limit {maxTimeToProcessInMinutes}");
}
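
As a rough worked example of the "max retry count * 2.5 minutes" arithmetic, the time-in-queue limit can be derived from the number of lock-expiry redeliveries you are willing to allow. The helper name `RetryBudget.TimeLimitFor` is hypothetical, and the 2.5-minute interval is only the value observed in this thread, not a documented guarantee.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Service Bus SDK.
public static class RetryBudget
{
    // Time-in-queue limit that allows roughly `maxRetries` redeliveries,
    // assuming each redelivery happens after `redeliveryInterval`.
    public static TimeSpan TimeLimitFor(int maxRetries, TimeSpan redeliveryInterval)
    {
        if (maxRetries < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxRetries));
        }

        return TimeSpan.FromTicks(redeliveryInterval.Ticks * maxRetries);
    }
}
```

For instance, `RetryBudget.TimeLimitFor(10, TimeSpan.FromMinutes(2.5))` gives a 25-minute limit, whose `TotalMinutes` could then serve as `maxTimeToProcessInMinutes`.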

@g-stone7

g-stone7 commented Feb 7, 2024

Is this feature going to be implemented in the java sdk too? Would highly appreciate it

@jsquire
Member Author

jsquire commented Feb 7, 2024

> Is this feature going to be implemented in the java sdk too? Would highly appreciate it

When the service adds the operation, the official Azure SDK packages will support it, including our Java libraries.

@vicqueen

vicqueen commented Mar 4, 2024

> Thank you for your feedback on this item. We are currently doing active development on this feature, and expect to have more to share around its release in the next couple of months.

@EldertGrootenboer it's been 5 months since your last update. How is this feature going, can we expect it anytime soon? 🙏🏻

@AlexEngblom

Yup, I hope we get this soon. 🙏🏻

Quite recently had to make stability and correlation fixes to our existing customized cob-web solution and would really appreciate this as a native feature.

@EldertGrootenboer
Contributor

We are currently doing active development on this feature, and expect to have more to share around its release in the next couple of months.

@wenleix

wenleix commented Jun 3, 2024

Thanks @EldertGrootenboer. It would also be great to make sure abandoning a message with a custom delay is supported in the Python client as well.
