producer: connection backoff #96

Open
mreiferson opened this issue Nov 22, 2014 · 5 comments

@mreiferson
Member

In #95 we documented a case where nsq_to_nsq (which uses both Consumer and Producer) would panic due to a bug in the exit timeout code path.

The bug is exacerbated by the fact that, in rare cases, the producer can block serially for 1s per PUB if the remote address is black-holing connection attempts.

An alternative strategy might be to back off connection attempts and return errors instantly during backoff windows.
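Roughly something like this (just a sketch of the idea; the type, fields, and the 32s cap are made up for illustration and are not the go-nsq API):

```go
// Sketch: fail fast while a producer connection is in a backoff window,
// instead of blocking on a fresh dial for every PUB.
package producer

import (
	"errors"
	"net"
	"sync"
	"time"
)

var errBackingOff = errors.New("producer: in connection backoff, not dialing")

type backoffConn struct {
	mu      sync.Mutex
	addr    string
	conn    net.Conn
	backoff time.Duration // current backoff window; doubles on each failed dial
	retryAt time.Time     // earliest time we'll dial again
}

func (c *backoffConn) publish(body []byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.conn == nil {
		// Inside the backoff window: return an error immediately rather
		// than blocking the caller for the full dial timeout.
		if time.Now().Before(c.retryAt) {
			return errBackingOff
		}
		conn, err := net.DialTimeout("tcp", c.addr, time.Second)
		if err != nil {
			// Exponential backoff, arbitrarily capped for the sketch.
			if c.backoff == 0 {
				c.backoff = time.Second
			} else if c.backoff < 32*time.Second {
				c.backoff *= 2
			}
			c.retryAt = time.Now().Add(c.backoff)
			return err
		}
		c.conn = conn
		c.backoff = 0
	}

	// The real producer frames a PUB command; this just writes the body.
	if _, err := c.conn.Write(body); err != nil {
		c.conn.Close()
		c.conn = nil
		return err
	}
	return nil
}
```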

Thoughts?

@twmb
Contributor

twmb commented Mar 23, 2015

This sounds good. For the case of nsq_to_nsq, the first error returned would also back off the consumer side, which would be decent. The only thing I don't like about returning an error immediately for all messages is that message.Attempts would be incremented for all messages in flight when, in reality, only the first message was truly attempted.

@mreiferson
Member Author

This gets back to my comment on nsqio/nsq#380 about the semantics of an attempt - I would argue it is a failed attempt if you're in a backoff window!

@twmb
Contributor

twmb commented Mar 24, 2015

I like to think of it as "oh hey, while consuming I got in a bad state, I'm going to set my RDY to 0 and also not even look at anything that was in flight".

Also, with the current behavior, an nsq_to_nsq would only attempt, say, 8 messages before the backoff window is over, meaning only those 8 have a higher attempt count. If we fast-track returned errors during backoff windows, then up to MaxInFlight messages would get an attempt incremented, which is different behavior and could mean potentially thousands of messages are one step closer to death.
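(For reference, the counter in question is the Attempts value a relay handler sees on each redelivery; a rough sketch of that shape, with made-up handler and topic names:)

```go
// Sketch of an nsq_to_nsq-style relay handler; relayHandler and
// "dest_topic" are hypothetical names for illustration.
package relay

import (
	"log"

	"github.com/nsqio/go-nsq"
)

type relayHandler struct {
	producer *nsq.Producer
}

func (h *relayHandler) HandleMessage(m *nsq.Message) error {
	// If publishes fail fast for everything in flight during a producer
	// backoff window, each of those messages gets requeued and comes back
	// with m.Attempts incremented, moving it closer to its max-attempts limit.
	if err := h.producer.Publish("dest_topic", m.Body); err != nil {
		log.Printf("publish failed (attempt %d): %v", m.Attempts, err)
		return err // returning an error requeues the message
	}
	return nil
}
```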

@mreiferson
Member Author

For cases where you want max attempts, I don't think counting this edge case as an attempt is a practical concern.

For other cases (where you want "infinite" attempts), it doesn't matter anyway.

@twmb
Contributor

twmb commented Mar 24, 2015

I suppose that's true, which means this issue can go forward, but I still think that NoAttempt would be a useful addition :).
