
Fix test flakes #275

Open
ferglor wants to merge 3 commits into main
Conversation

ferglor (Collaborator) commented Sep 14, 2023

In this PR, we're trying to fix the test flakes in the recoverer.

From testing the old code, I wasn't sure of the expected behaviour of the recoverer, specifically:

  • When the underlying service wrapped by the recoverer errors, should we recover and restart it, or should we log the error and stop attempting to re-run it?
  • When the underlying service wrapped by the recoverer completes successfully (returns a nil error), presumably we want to stop the recoverer at this point?

In the previous implementation of the recoverer, if the wrapped service completed execution without error, the recoverer would block until it was explicitly closed by a client. In this version, the recoverer shuts down once the wrapped service completes without error.

Also in the previous implementation of the recoverer, if the wrapped service errored, the recoverer would do nothing and block until it was explicitly closed by a client. In this version, the recoverer tries to recover the wrapped service when the wrapped service errors.
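
To make the new behaviour concrete, here's a minimal sketch (in Go) of the run loop described above. The names here (recoverer, start, restartWait) are illustrative assumptions, not the identifiers used in the actual implementation:

package recoverer

import (
	"context"
	"log"
	"time"
)

// recoverer is an illustrative stand-in for the wrapper discussed in this PR;
// the names and fields are assumptions rather than the real implementation.
type recoverer struct {
	start       func(ctx context.Context) error // runs the wrapped service to completion
	restartWait time.Duration                   // pause before restarting after a failure
	log         *log.Logger
}

// run loops until the wrapped service completes without error or the context
// is cancelled; a non-nil error from the wrapped service triggers a restart.
func (r *recoverer) run(ctx context.Context) {
	for {
		err := r.start(ctx)
		if err == nil {
			// The wrapped service finished cleanly, so shut the recoverer down.
			return
		}
		r.log.Printf("wrapped service errored, restarting: %v", err)

		select {
		case <-ctx.Done():
			return
		case <-time.After(r.restartWait):
			// Brief back-off before the next restart attempt.
		}
	}
}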

The reason I wasn't clear on the expected behaviour is that the old tests seemed to imply similar handling for the panic and error scenarios, but the behaviour defined in the implementation didn't really align with that.

Either way, it's easy to have this recover on panics and errors, or on just one or the other. I've run these new tests thousands of times locally and I'm not seeing the deadlock anymore; previously, I saw it in roughly 1 in 500 runs.
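
For the panic case, here's a sketch of how a panic in the wrapped service could be folded into the same restart path, again with illustrative names only: the helper below converts a panic into an ordinary error, which the loop above could then handle exactly like any other failure. Dropping the recover() gives the fail-hard alternative instead.

package recoverer

import (
	"context"
	"fmt"
)

// runRecovering performs a single attempt of the wrapped service and turns a
// panic into an ordinary error, so a restart loop can treat panics and errors
// the same way. Illustrative sketch only, not the code in this PR.
func runRecovering(ctx context.Context, start func(context.Context) error) (err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("wrapped service panicked: %v", rec)
		}
	}()
	return start(ctx)
}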


ferglor marked this pull request as ready for review September 15, 2023 11:05
)

const (
	panicRestartWait = 10 * time.Second
Collaborator

Can we discuss whether we really want to recover from panics instead of failing hard? Is this a pattern used in other places in the core node?
