storage: Uninformative error messages after errors cause timeout #4197

rocurley · 2021-06-02T17:54:17Z

Client

Storage

Environment

Local Ubuntu 18.04 machine

Go Environment

$ go version
go version go1.15.5 linux/amd64
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/roger.curley/.cache/go-build"
GOENV="/home/roger.curley/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/roger.curley/go/pkg/mod"
GOOS="linux"
GOPATH="/home/roger.curley/go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build136821508=/tmp/go-build -gno-record-gcc-switches"

(Some lines deleted that reference info internal to my company. If needed I can reproduce this on a non-work machine, but I don't expect my local environment to be very relevant here)
Code

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"cloud.google.com/go/storage"
	"google.golang.org/api/option"
)

func main() {
	err := mainWrapped()
	if err != nil {
		fmt.Printf("%+v\n", err)
	}
}
func mainWrapped() error {
	ctx := context.Background()
	httpClient :=
		http.Client{
			Transport: transport{},
		}
	client, err := storage.NewClient(ctx, option.WithHTTPClient(&httpClient))
	if err != nil {
		return err
	}
	bucket := client.Bucket("foo")
	obj := bucket.Object("bar")
	ctx, _ = context.WithTimeout(ctx, time.Second)
	_, err = obj.Attrs(ctx)
	if err != nil {
		return err
	}
	return nil
}

type transport struct{}

func (t transport) RoundTrip(req *http.Request) (*http.Response, error) {
	fmt.Println("Returning 500 error")
	return &http.Response{
		Status:     "500 Internal Server Error",
		StatusCode: 500,
	}, nil
}

Expected behavior

It should be possible to recover the underlying error that led to the timeout.

Actual behavior

The only error information returned is that the context timed out:

Returning 500 error
Returning 500 error
context deadline exceeded

Or using spew.Dump instead of fmt.Printf:

Returning 500 error
Returning 500 error
(context.deadlineExceededError) {
}

Additional context

I care about the specific error because some retryable errors can be mitigated on my end: for example, this issue masked an issue caused by attempting to delete the same object many times at once.

I considered filing this as a feature request, but it looks like there's some attempt to do this in the code:

google-cloud-go/internal/retry.go

Lines 42 to 45 in f97bdfa

    
           // Remember the last "real" error from f. 
        
           if err != nil && err != context.Canceled && err != context.DeadlineExceeded { 
        
           	lastErr = err 
        
           }

The text was updated successfully, but these errors were encountered:

tritone · 2021-06-07T03:35:48Z

Thanks for filing this. I actually did not think we attempted to preserve the non-context error currently, so I'll have to do some research to understand what is going on there in internal/retry.go.

In any case, we're hoping to add Go 1.13 error wrapping soon, so hopefully that should be enough to resolve the issue (we'd wrap the previous error in the context error in that case).

This piggybacks on googleapis#4797 to allow storage to expose wrapped service errors when a call retries without success until timeout or cancellation. I also updated all checks for context sentinel errors in storage to use xerrors.Is instead of strict equality. Users of this package should do the same. I'll update the documentation on errors from this package in a subsequent PR. We will have to bump the dependency on the root module before merging this change I believe. Fixes googleapis#4197

This piggybacks on #4797 to allow storage to expose wrapped service errors when a call retries without success until timeout or cancellation. I also updated all checks for context sentinel errors in storage to use xerrors.Is instead of strict equality. Users of this package should do the same. I'll update the documentation on errors from this package in a subsequent PR. We will have to bump the dependency on the root module before merging this change I believe. Fixes #4197

tritone · 2021-10-06T06:00:12Z

I added wrapping for retried errors that eventually time out. Most errors will now look like this:

object.Attrs: retry failed with context deadline exceeded; last error: googleapi: got HTTP response code 503 with body: {"code":"503","message":{"error":{"message":"<message from server>"}}}

To introspect, any of the following should work:

_, err = client.Bucket(bucketName).Object(objName).Attrs(timeoutCtx)
if err != nil {
	if e, ok := err.(interface{ Unwrap() error }); ok {
		wrappedErr := e.Unwrap()  // yields underlying googleapi.Error.
	}

	// errors.As allows unwrapping with access to googleapi.Error fields.
	var errAsGoogleapi *googleapi.Error
	if errors.As(err, &errAsGoogleapi) {
		log.Printf("error code: %v", errAsGoogleapi.Code)
	}

	isDeadlineExceeded := errors.Is(err, context.DeadlineExceeded)  // use errors.Is to compare with sentinels
}

Hope this helps!

rocurley · 2021-10-06T15:12:50Z

LGTM, thanks!

rocurley added the triage me I really want to be triaged. label Jun 2, 2021

product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Jun 2, 2021

blunderbuss-gcf bot assigned tritone Jun 2, 2021

tritone added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Jun 7, 2021

tritone mentioned this issue Sep 23, 2021

feat(storage): returned wrapped error for timeouts #4802

Merged

gcf-merge-on-green bot closed this as completed in #4802 Oct 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: Uninformative error messages after errors cause timeout #4197

storage: Uninformative error messages after errors cause timeout #4197

rocurley commented Jun 2, 2021 •

edited

tritone commented Jun 7, 2021

tritone commented Oct 6, 2021

rocurley commented Oct 6, 2021

storage: Uninformative error messages after errors cause timeout #4197

storage: Uninformative error messages after errors cause timeout #4197

Comments

rocurley commented Jun 2, 2021 • edited

tritone commented Jun 7, 2021

tritone commented Oct 6, 2021

rocurley commented Oct 6, 2021

rocurley commented Jun 2, 2021 •

edited