Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allowing publishers to not be affected by the presence of slow consumers #238

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

harold-toledo
Copy link

While using NATS Streaming I came across a weird issue. I was getting from time to time a NATSSlowConsumerException when trying to acknowledge a message. I would like to state that we are using manual acknowledgement, meaning that processing a single message could take "some time". What was weird is that I thought that by using streaming I should be shielded from this type of errors, then I decided to take a look at the code of the underlying NATS client.

What I found was that if any subscription is being categorized as slow, the following method is called: NATS.Client.Connection.processSlowConsumer(Subscription s). And this is what that method was doing in its first line:

  • lastEx = new NATSSlowConsumerException();

The problem whit that is that if the same NATS client is used in an application for publishing and subscribing, the fact that some subscription is marked as slow prevents any message from being published since the following lines are found in the internal publish method of the connection class:

if (lastEx != null)
throw lastEx;

The problem we were having was caused by the fact that acknowledging a message means publishing a message using the internal publish method.

@sixlettervariables
Copy link
Collaborator

It would probably be better to simply ignore slow consumer exceptions on the publish path. At some point you need to be alerted to the fact you're slow.

@harold-toledo
Copy link
Author

And I agree with the idea of making the subscriber aware of the fact that he is a slow consumer, meaning he has lost at least one message, the one that caused it to be considered or marked as slow.
Making the subscriber aware of that is something that is related to the subscription, the connection plays no part on that. As of today, from the subscription the processSlowConsumer method is being called on the connection just to check if opts.AsyncErrorEventHandler is defined and we need to schedule a call to it.

The problem I see with that is that if I am only interested in slow consumer notifications and only for a particular subscription, I am forced to specify opts.AsyncErrorEventHandler and will receive the notifications of all the subscriptions that are being slow. I am not saying that having that option is a bad idea, but I do not have the flexibility to only specify interest in a particular subscription. I think we could perfectly have both, and let the subscriber decide. A similar approach could be used but at the subscription level, giving the subscriber, at subscription time, the ability to register a callback to be notified whenever it is being slow.

Also, the opts.AsyncErrorEventHandler is only being called when a slow consumer is detected, then, why not to give a more meaningful name for that particular case, having a generic name like that can only cause confusion.

@ColinSullivan1 ColinSullivan1 self-assigned this Jul 23, 2019
@danielwertheim
Copy link
Collaborator

@ColinSullivan1 @watfordgnf what's left on this one? Just resolving the conflicts?

@ColinSullivan1
Copy link
Member

@harold-toledo , we're finally getting to this.

Adding a subscription error event handler is a good idea imo and should be investigated further. It'd have to be well documented that the opts.AsyncErrorEventHandler will be invoked as well, unless simply having a subscription level callback prevents the AsyncErrorEventHandler from being invoked for those errors on that particular subscription. @harold-toledo, @danielwertheim, @sixlettervariables (or anyone) wdyt?

Ignoring the slow consumer on the publish path is the best way to go. There are other errors (e.g. auth errors) we'd want to check to prevent messages being unnecessarily sent to the server.

@kleferbe
Copy link
Contributor

I've stumbled on this also. Setting lastEx = new NATSSlowConsumerException(); breaks the connection because you have no option to clear that error but to reconnect.

I had a look at the Java client. It marks the subscription as slow and counts the dropped messages on the subscription.
The subscription provides getDroppedCount() and clearDroppedCount() methods.
And the connection provides a separate slowConsumerDetected event that passes the affected subscription in its arguments.

So, why not do it the Java way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants