New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
firestore: QuerySnapshotIterator.Next()
can return iterator.Done
even if Stop()
hasn't been called
#2601
Comments
Thanks for your report! @BenWhitehead any thoughts on this? We can definitely make the proposed docs fix in the immediate term. |
@tritone I think getting the docs updated sounds good. Then we can look at investigating what code change would need to happen to correct the misinformed |
It's been reported that QuerySnapshotIterator may return iterator.Done before Stop, possibly due to EOF caused by networking issues. Add a caveat about this to the docs. Updates googleapis#2601
It's been reported that QuerySnapshotIterator may return iterator.Done before Stop, possibly due to EOF caused by networking issues. Add a caveat about this to the docs. Updates #2601 Co-authored-by: Tyler Bui-Palsulich <26876514+tbpg@users.noreply.github.com>
It's been reported that QuerySnapshotIterator may return iterator.Done before Stop, possibly due to EOF caused by networking issues. Add a caveat about this to the docs. Updates googleapis#2601 Co-authored-by: Tyler Bui-Palsulich <26876514+tbpg@users.noreply.github.com>
Hey @cpick your hypothesis is correct. I think the following portion of this is reasonable.
The current docstring for Next() does state this:
Though it seems query.go has this where document.go doesn't. Perhaps we need to ensure that it is mentioned on all watch types. Regarding a suggested change to behavior:
@jba I know this is likely out of your mind, but could you possibly share why |
The docs are due to the earlier PR I made on this issue. I haven't done any further work to dig into why or at what level this occurs. |
#2601 Upon inspection, it seems we added a disclaimer to one of the two watch docstrs.
Best I can guess, I assumed I only saw io.EOF when the underlying connection was closed legitimately. From a distance, it seems to me that a reasonable solution is for stop to indicate it was called using some other mechanism, like setting a bool. Then the code that transforms io.EOF into Done could first check that bool. |
…r locations of EOF that might be impacting how Next() states iterator.Done before end is reached, googleapis#2601
FYI: #3643 fixed a small error in how watch.stop was done. I believe this fix is tangential to this bug. I also have a branch, https://github.com/googleapis/google-cloud-go/compare/master...crwilcox:refine-EOF?expand=1#diff-fbf8020e8989d7c99881c746391dda1da4706a1ce1216d16ca1dfce4f62b9284R453, where I have marked areas we might be returning done incorrectly. Though more debugging is needed. |
I have had a watch running for ~3 weeks trying to find this issue. Unfortunately I haven't observed this error. |
While I am going to keep digging into this, as I can't manage to reproduce there isn't much we can do at this time. Please reopen (or create a new issue) if you come across this again. |
Client
Firestore
Environment
alpine:3.11
Docker image running on GKEGo Environment
Built within docker image:
golang:1.14-alpine3.11
$ go version
go version go1.14.4 linux/amd64
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build686252092=/tmp/go-build -gno-record-gcc-switches"
Code
$ grep firestore go.mod
cloud.google.com/go/firestore v1.2.0
Expected behavior
The provided
watchSnapshots()
method should never generate a "stop called" error because the documentation says:And the
watchSnapshots()
method only callsStop()
fromdefer
.Actual behavior
We see occasional instances where
iterator.Done
is returned fromQuerySnapshotIterator.Next()
.My hypothesis is something like:
Under the hood, the
firestore
package turns anyio.EOF
into aniterator.Done
.It assumes this
io.EOF
came from thewatchStream.stop()
method, but I believe they can come from underlying network issues/disconnects and the library misinterprets it.For example, perhaps
watchStream.recv()
s call towatchStream.open()
failed withio.EOF
(either in it’s call toListen()
orSend()
). The latter eventually callsgrpc.ClienltStream.SendMsg()
which does say that it can returnio.EOF
. There's some effort to retry on "non-permanent watch errors", but that's only done on errors from the underlying gRPC connection'sRecv()
, but apparently intentionally not during the initial/re open (which makes sense since if the connection can't even be established that may be a permanent networking issue).As a stopgap the documentation could be changed to indicate
iterator.Done
s can be produced due to networking issues, but I think a better overall solution would be to propagate theio.EOF
(or similar) error if a connection cannot be (re)opened so the caller can decide how to react to that situation.Screenshots
NA
Additional context
NA
The text was updated successfully, but these errors were encountered: