etcd(ticdc): retry on goaway to be more robust to pd restart #6798

Merged: 3 commits on Aug 19, 2022
6 changes: 5 additions & 1 deletion pkg/errorutil/ignore.go
@@ -73,12 +73,16 @@ func IsRetryableEtcdError(err error) bool {
return true
default:
}
// when the PD instance was deleted from the PD cluster, it may meet error with `raft:stopped`,
// when the PD instance was deleted from the PD cluster, it may meet different errors.
// retry on such error make cdc robust to PD / ETCD cluster member removal.
// we should tolerant such case to make cdc robust to PD / ETCD cluster member change.
// see: https://github.com/etcd-io/etcd/blob/ae36a577d7be/raft/node.go#L35
if strings.Contains(etcdErr.Error(), "raft: stopped") {
return true
}
// see: https://github.com/pingcap/tiflow/issues/6720
if strings.Contains(etcdErr.Error(), "received prior goaway: code: NO_ERROR") {
return true
}
return false
}
3 changes: 3 additions & 0 deletions pkg/errorutil/ignore_test.go
@@ -60,6 +60,9 @@ func TestIsRetryableEtcdError(t *testing.T) {
{v3rpc.ErrTimeoutDueToLeaderFail, true},
{v3rpc.ErrNoSpace, true},
{raft.ErrStopped, true},
{errors.New("rpc error: code = Unavailable desc = closing transport due to: " +
"connection error: desc = \\\"error reading from server: EOF\\\", " +
"received prior goaway: code: NO_ERROR\""), true},
}

for _, item := range cases {