consumer: FullJitterStrategy may increase backoff counter unexpectedly #195

judwhite · 2016-10-03T09:30:46Z

Lines 781 to 786 in 3c0c5ed

    
           case backoffFlag: 
        
           	nextBackoff := r.config.BackoffStrategy.Calculate(int(backoffCounter) + 1) 
        
           	if nextBackoff <= r.config.MaxBackoffDuration { 
        
           		backoffCounter++ 
        
           		backoffUpdated = true 
        
           	}

If FullJitterStrategy calculates a value less than config.MaxBackoffDuration then backoffCounter will increment where ExponentialStrategy wouldn't, potentially causing inBackoff() and RDY 1 to persist longer than intended.

This probably matters very little for most use cases, but during a prolonged outage and long-running handlers it could take a while to recover.

The text was updated successfully, but these errors were encountered:

judwhite · 2016-10-03T10:04:52Z

Ideas on how to approach this:

Add a max backoff counter which defaults to log2(cfg.MaxBackoffDuration / cfg.BackoffMultiplier). I doubt anyone wants to turn this knob and it assumes an implementation by default.
Break the signature of the BackoffStrategy interface's Calculate method to indicate if the backoff counter should increment.
Add another interface which adds this behavior, use type assertions to see if it's implemented? Requires internal knowledge and type magic.

I think I'd favor breaking the signature rather than a silent change in behavior. I'd also be surprised if many people are using custom backoff strategies. Thoughts?

mreiferson · 2016-10-05T00:45:59Z

I see what you mean.

Break the signature of the BackoffStrategy interface's Calculate method to indicate if the backoff counter should increment.

How would FullJitterStrategy determine when to return true/false in this case?

judwhite · 2016-10-05T00:50:58Z

@mreiferson Something like this

// Calculate returns a random duration of time [0, 2 ^ attempt) and a value
// indicating whether to increase the backoff counter.
func (s *FullJitterStrategy) Calculate(attempt int) (time.Duration, bool) {
    // lazily initialize the RNG
    s.rngOnce.Do(func() {
        if s.rng != nil {
            return
        }
        s.rng = rand.New(rand.NewSource(time.Now().UnixNano()))
    })

    backoffDuration := s.cfg.BackoffMultiplier *
        time.Duration(math.Pow(2, float64(attempt)))

    return time.Duration(s.rng.Intn(int(backoffDuration))),
           backoffDuration <= s.cfg.MaxBackoffDuration
}

mreiferson · 2016-10-05T00:56:15Z

Yea, that makes sense.

I think I'd favor breaking the signature rather than a silent change in behavior. I'd also be surprised if many people are using custom backoff strategies. Thoughts?

We simply don't know, although I tend to agree. The other "right" thing to do would be to bump the major version number. But given Go's pathetic dependency management situation, I don't think that's all that important. What do you think @jehiah?

judwhite · 2016-10-09T00:17:31Z

@mreiferson @jehiah I thought about the pros and cons of a MaxBackoffLevel or changing the BackoffStrategy interface some more. I think the added complexity of another option, and that a new option would overlap in meaning with existing options, makes changing the interface a better approach.

Thoughts on these two approaches:

A MaxBackoffLevel is probably something consumers care about - how many success levels do we need to get through before we're back at full throttle?
The BackoffStrategy interface takes a parameter named attempt; the way it's called with backoffCounter this value is (supposed to be) bounded (which is a good thing for full jitter, otherwise it'd often calculate higher values which would get capped at MaxBackoffDuration, and go against the purpose of full jitter). This supports that a MaxBackoffLevel is important to recovery and also backoff calculation in the strategy (to state the obvious).
Adding a MaxBackoffLevel would add complexity, and in a way overlap MaxBackoffDuration; currently MaxBackoffLevel implicitly exists (or should) by a combination of MaxBackoffDuration, BackoffMultiplier, and the strategy.

What do you think about changing the interface? Any other ideas?

If changing the interface is the way to go, we could:

Change the signature of Calculate to Calculate(attempt int) (time.Duration, bool), like above.
Add a MaxBackoffLevel() int method to the interface. That method might look something like this:

func (s *FullJitterStrategy) MaxBackoffLevel() int {
    x := float64(s.cfg.MaxBackoffDuration / s.cfg.BackoffMultiplier)
    return int(math.Ceil(math.Log2(x + 1)))
}

mreiferson · 2016-10-10T04:53:34Z

I agree with your analysis.

It's not great, and we should try to avoid it when possible, but I ultimately think that behavior of the library is more important than backwards compatibility and furthermore, given that compilation will break, end users will be notified before it can affect a production system.

Let's modify the interface. When we go to stamp a stable release and write up migration docs, we can decide about version numbers.

judwhite · 2016-10-10T07:09:07Z

@mreiferson Thanks for taking the time to read through this. I'll raise a PR to change Calculate, I think it's more understandable than a method calculating the max backoff level.

I noticed the code starts the backoff counter at 1, which means the lowest backoff will be (2^1)*cfg.BackoffMultiplier. I think passing 1 to Calculate is expected, and the strategy should multiply by 2^(n-1). There are tests for this but they start attempt at 0, which it'll never be.

It's been like this for a while but the tests are more recent.

mreiferson · 2016-10-10T17:14:59Z

SGTM re: off-by-one 😜

judwhite mentioned this issue Oct 3, 2016

Consumer: Backoff counter may increase beyond expected bounds when using FullJitterStrategy judwhite/NsqSharp#64

Closed

mreiferson added bug question labels Oct 5, 2016

mreiferson changed the title ~~Consumer: Backoff counter may increase beyond expected bounds when using FullJitterStrategy~~ consumer: FullJitterStrategy may increase backoff counter unexpectedly Oct 10, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consumer: FullJitterStrategy may increase backoff counter unexpectedly #195

consumer: FullJitterStrategy may increase backoff counter unexpectedly #195

judwhite commented Oct 3, 2016

judwhite commented Oct 3, 2016 •

edited

mreiferson commented Oct 5, 2016

judwhite commented Oct 5, 2016 •

edited

mreiferson commented Oct 5, 2016

judwhite commented Oct 9, 2016 •

edited by mreiferson

mreiferson commented Oct 10, 2016

judwhite commented Oct 10, 2016

mreiferson commented Oct 10, 2016

consumer: FullJitterStrategy may increase backoff counter unexpectedly #195

consumer: FullJitterStrategy may increase backoff counter unexpectedly #195

Comments

judwhite commented Oct 3, 2016

judwhite commented Oct 3, 2016 • edited

mreiferson commented Oct 5, 2016

judwhite commented Oct 5, 2016 • edited

mreiferson commented Oct 5, 2016

judwhite commented Oct 9, 2016 • edited by mreiferson

mreiferson commented Oct 10, 2016

judwhite commented Oct 10, 2016

mreiferson commented Oct 10, 2016

judwhite commented Oct 3, 2016 •

edited

judwhite commented Oct 5, 2016 •

edited

judwhite commented Oct 9, 2016 •

edited by mreiferson