Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(storage): try to reopen for failed Reads #4226

Merged
merged 7 commits into from Jun 23, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
12 changes: 11 additions & 1 deletion storage/reader.go
Expand Up @@ -389,7 +389,17 @@ func shouldRetryRead(err error) bool {
if err == nil {
return false
}
return strings.HasSuffix(err.Error(), "INTERNAL_ERROR") && strings.Contains(reflect.TypeOf(err).String(), "http2")
// Certain HTTP/2 errors are retryable, but the types are not exported.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What errors are not retryable here?

This path (shouldRetryRead) is only consulted when a read has returned a successful response header, but something went wrong while reading the body. Is there a reason not to unconditionally retry the read in this case? (Preferably with backoff.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of this call, "retry" means to call the reader's reopen at the updated offset (which involves a new GET request to the GCS API) and then attempt to continue reading bytes from that point.

I'm a little bit hesitant to retry unconditionally:

  1. If the error from Read is a context timeout/cancellation, we would probably want to return to the caller immediately and not fire off another request.
  2. If there is some error which could indicate problems with the data received, we would definitely want to surface that to the caller (so that they can retry the whole Read from the start of the object if desired). I'm not sure if this is actually something that can occur, though (do you have thoughts?).

I could see having a retry for any http2 error though, if you think it's safe.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to stop retrying when the context expires/is canceled, of course. (I wouldn't test the error for this, though; test the context directly to see if it is done.)

For any other error--once you've read a response header, I don't think there is any error condition other than variations on "the stream broke". The only point where the remote end can provide a useful error is when sending the response header. I wouldn't limit retries to HTTP/2 errors, because the same applies to HTTP/1 (if someone disables HTTP/2 for some reason) or HTTP/3 (someday).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying, much appreciated!

I tried to implement this but I'm a bit stuck on what to do with the context. What I'm seeing is that we don't have access to the context via the Read method; it's passed in via NewRangeReader and then embedded in reopen(). Any thoughts on what to do with this? I guess I could have reopen() itself check for context.Done before doing any calls? What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I wrote an implementation doing what I suggested above-- does this seem right to you? I think I should write a test that cancels the context as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an integration test that makes sure cancelation actually works as expected as well (did this because I wanted to make sure there's no possibility of an infinite loop of calls to reopen at this point)

if strings.Contains(reflect.TypeOf(err).String(), "http2") {
retryableHTTP2Strings := []string{"INTERNAL_ERROR", "GOAWAY"}
e := err.Error()
for _, s := range retryableHTTP2Strings {
if strings.Contains(e, s) {
return true
}
}
}
return false
}

// Size returns the size of the object in bytes.
Expand Down
30 changes: 20 additions & 10 deletions storage/reader_test.go
Expand Up @@ -131,7 +131,8 @@ func (h http2Error) Error() string {
}

func TestRangeReaderRetry(t *testing.T) {
retryErr := http2Error("blah blah INTERNAL_ERROR")
internalErr := http2Error("blah blah INTERNAL_ERROR")
goawayErr := http2Error("http2: server sent GOAWAY and closed the connection; LastStreamID=15, ErrCode=NO_ERROR, debug=\"load_shed\"")
readBytes := []byte(readData)
hc, close := newTestServer(handleRangeRead)
defer close()
Expand Down Expand Up @@ -159,7 +160,7 @@ func TestRangeReaderRetry(t *testing.T) {
offset: 0,
length: -1,
bodies: []fakeReadCloser{
{data: readBytes, counts: []int{3}, err: retryErr},
{data: readBytes, counts: []int{3}, err: internalErr},
{data: readBytes[3:], counts: []int{5, 2}, err: io.EOF},
},
want: readData,
Expand All @@ -168,8 +169,8 @@ func TestRangeReaderRetry(t *testing.T) {
offset: 0,
length: -1,
bodies: []fakeReadCloser{
{data: readBytes, counts: []int{5}, err: retryErr},
{data: readBytes[5:], counts: []int{1, 3}, err: retryErr},
{data: readBytes, counts: []int{5}, err: internalErr},
{data: readBytes[5:], counts: []int{1, 3}, err: goawayErr},
{data: readBytes[9:], counts: []int{1}, err: io.EOF},
},
want: readData,
Expand All @@ -178,7 +179,16 @@ func TestRangeReaderRetry(t *testing.T) {
offset: 0,
length: 5,
bodies: []fakeReadCloser{
{data: readBytes, counts: []int{3}, err: retryErr},
{data: readBytes, counts: []int{3}, err: internalErr},
{data: readBytes[3:], counts: []int{2}, err: io.EOF},
},
want: readData[:5],
},
{
offset: 0,
length: 5,
bodies: []fakeReadCloser{
{data: readBytes, counts: []int{3}, err: goawayErr},
{data: readBytes[3:], counts: []int{2}, err: io.EOF},
},
want: readData[:5],
Expand All @@ -187,7 +197,7 @@ func TestRangeReaderRetry(t *testing.T) {
offset: 1,
length: 5,
bodies: []fakeReadCloser{
{data: readBytes, counts: []int{3}, err: retryErr},
{data: readBytes, counts: []int{3}, err: internalErr},
{data: readBytes[3:], counts: []int{2}, err: io.EOF},
},
want: readData[:5],
Expand All @@ -196,7 +206,7 @@ func TestRangeReaderRetry(t *testing.T) {
offset: 1,
length: 3,
bodies: []fakeReadCloser{
{data: readBytes[1:], counts: []int{1}, err: retryErr},
{data: readBytes[1:], counts: []int{1}, err: internalErr},
{data: readBytes[2:], counts: []int{2}, err: io.EOF},
},
want: readData[1:4],
Expand All @@ -205,8 +215,8 @@ func TestRangeReaderRetry(t *testing.T) {
offset: 4,
length: -1,
bodies: []fakeReadCloser{
{data: readBytes[4:], counts: []int{1}, err: retryErr},
{data: readBytes[5:], counts: []int{4}, err: retryErr},
{data: readBytes[4:], counts: []int{1}, err: internalErr},
{data: readBytes[5:], counts: []int{4}, err: internalErr},
{data: readBytes[9:], counts: []int{1}, err: io.EOF},
},
want: readData[4:],
Expand All @@ -215,7 +225,7 @@ func TestRangeReaderRetry(t *testing.T) {
offset: -4,
length: -1,
bodies: []fakeReadCloser{
{data: readBytes[6:], counts: []int{1}, err: retryErr},
{data: readBytes[6:], counts: []int{1}, err: internalErr},
{data: readBytes[7:], counts: []int{3}, err: io.EOF},
},
want: readData[6:],
Expand Down