producer: connection backoff #96

Open
mreiferson opened this issue Nov 22, 2014 · 5 comments

@mreiferson
Member

In #95 we documented a case where nsq_to_nsq (which uses both Consumer and Producer) would panic due to a bug in the exit timeout code path.

The bug is exacerbated by the fact that, in rare cases, the producer can block serially for 1s per PUB if the remote address is black-holing connection attempts.

An alternative strategy might be to back off connection attempts and return errors instantly during backoff windows.
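Roughly something like this (just a sketch of the idea; the type, fields, and the 32s cap are made up for illustration and are not the go-nsq API):

```go
// Sketch: fail fast while a producer connection is in a backoff window,
// instead of blocking on a fresh dial for every PUB.
package producer

import (
	"errors"
	"net"
	"sync"
	"time"
)

var errBackingOff = errors.New("producer: in connection backoff, not dialing")

type backoffConn struct {
	mu      sync.Mutex
	addr    string
	conn    net.Conn
	backoff time.Duration // current backoff window; doubles on each failed dial
	retryAt time.Time     // earliest time we'll dial again
}

func (c *backoffConn) publish(body []byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.conn == nil {
		// Inside the backoff window: return an error immediately rather
		// than blocking the caller for the full dial timeout.
		if time.Now().Before(c.retryAt) {
			return errBackingOff
		}
		conn, err := net.DialTimeout("tcp", c.addr, time.Second)
		if err != nil {
			// Exponential backoff, arbitrarily capped for the sketch.
			if c.backoff == 0 {
				c.backoff = time.Second
			} else if c.backoff < 32*time.Second {
				c.backoff *= 2
			}
			c.retryAt = time.Now().Add(c.backoff)
			return err
		}
		c.conn = conn
		c.backoff = 0
	}

	// The real producer frames a PUB command; this just writes the body.
	if _, err := c.conn.Write(body); err != nil {
		c.conn.Close()
		c.conn = nil
		return err
	}
	return nil
}
```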

Thoughts?

@twmb
Contributor

twmb commented Mar 23, 2015

This sounds good. For the case of nsq_to_nsq, the first error returned would also back off the consumer side, which would be decent. The only thing I don't like about returning an error immediately for all messages is that message.Attempts would be incremented for all messages in flight when, in reality, only the first message was truly attempted.

@mreiferson
Member Author

This gets back to my comment on nsqio/nsq#380 about the semantics of an attempt - I would argue it is a failed attempt if you're in a backoff window!

@twmb
Contributor

twmb commented Mar 24, 2015

I like to think of it as "oh hey, while consuming I got in a bad state, I'm going to set my RDY to 0 and also not even look at anything that was in flight".

Also, with the current behavior, an nsq_to_nsq would only attempt, say, 8 messages before the backoff window is over, meaning only those 8 have a higher attempt count. If we fast-track returned errors during backoff windows, then up to MaxInFlight messages would get an attempt incremented, which is different behavior and could mean potentially thousands of messages are one step closer to death.
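(For reference, the counter in question is the Attempts value a relay handler sees on each redelivery; a rough sketch of that shape, with made-up handler and topic names:)

```go
// Sketch of an nsq_to_nsq-style relay handler; relayHandler and
// "dest_topic" are hypothetical names for illustration.
package relay

import (
	"log"

	"github.com/nsqio/go-nsq"
)

type relayHandler struct {
	producer *nsq.Producer
}

func (h *relayHandler) HandleMessage(m *nsq.Message) error {
	// If publishes fail fast for everything in flight during a producer
	// backoff window, each of those messages gets requeued and comes back
	// with m.Attempts incremented, moving it closer to its max-attempts limit.
	if err := h.producer.Publish("dest_topic", m.Body); err != nil {
		log.Printf("publish failed (attempt %d): %v", m.Attempts, err)
		return err // returning an error requeues the message
	}
	return nil
}
```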

@mreiferson
Member Author

For cases where you want max attempts, I don't think counting this edge case as an attempt is a practical concern.

For other cases (where you want "infinite" attempts), it doesn't matter anyway.

@twmb
Contributor

twmb commented Mar 24, 2015

I suppose that's true, which means this issue can go forward, but I still think that NoAttempt would be a useful addition :).
