
Panic observed when a node gets deleted #9

Open
ottoyiu opened this issue Mar 7, 2018 · 9 comments

Comments

@ottoyiu
Owner

ottoyiu commented Mar 7, 2018

A panic occurs when a node gets deleted and the informer hands the delete handler a cache.DeletedFinalStateUnknown tombstone instead of a *v1.Node.

I0305 12:49:48.849075       1 main.go:42] k8s-ec2-srcdst: v0.2.1
E0305 12:56:57.201434       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E0305 12:56:58.202361       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E0305 12:57:29.203208       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
E0305 12:58:00.204087       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
E0305 12:58:31.205268       1 reflector.go:205] github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48: Failed to list *v1.Node: Get https://100.64.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 100.64.0.1:443: i/o timeout
I0305 12:58:32.427858       1 srcdst_controller.go:96] Marking node ip-10-63-163-245.us-west-2.compute.internal with SrcDstCheckDisabledAnnotation
E0305 12:58:32.448368       1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"cache.DeletedFinalStateUnknown", assertedString:"*v1.Node", missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node)
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/asm_amd64.s:509
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/panic.go:491
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/iface.go:172
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/pkg/controller/srcdst_controller.go:64
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/pkg/controller/srcdst_controller.go:51
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:209
<autogenerated>:1
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:320
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:451
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:150
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:124
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/vendor/k8s.io/client-go/tools/cache/controller.go:124
/home/travis/gopath/src/github.com/ottoyiu/k8s-ec2-srcdst/cmd/k8s-ec2-srcdst/main.go:48
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/proc.go:185
/home/travis/.gimme/versions/go1.9.linux.amd64/src/runtime/asm_amd64.s:2337
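
For reference, the standard client-go idiom for a delete handler is to check the type assertion and unwrap the tombstone instead of asserting blindly. A minimal sketch of that pattern (the helper name is illustrative, not code from this repo):

```go
package controller

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// nodeFromDeleteEvent safely extracts a *v1.Node from a DeleteFunc argument.
// When the watch misses the actual delete event, client-go hands the handler
// a cache.DeletedFinalStateUnknown tombstone wrapping the last known state
// of the object, which is exactly what triggers the panic above.
func nodeFromDeleteEvent(obj interface{}) (*v1.Node, bool) {
	if node, ok := obj.(*v1.Node); ok {
		return node, true
	}
	tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
	if !ok {
		return nil, false // neither a Node nor a tombstone; ignore it
	}
	node, ok := tombstone.Obj.(*v1.Node)
	return node, ok
}
```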

A rewrite is in order, following the new style of writing these controllers in client-go...

Related to: kubernetes/kops#4466

@blakebarnett
Contributor

I'm seeing this quite frequently as well; is there a good readinessProbe we can define for now?

ottoyiu added a commit that referenced this issue Mar 28, 2018
Fixes issue reported in #9 regarding object not being a Node object.
Removed DeleteFunc callback as it is not needed.
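
Concretely, the shape of that fix is to drop DeleteFunc and guard the assertion in the handlers that remain. A sketch under those assumptions (informer and handleNode are hypothetical placeholders, not the literal patch):

```go
package controller

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// registerHandlers wires up guarded event handlers; informer and handleNode
// are stand-ins for this controller's actual wiring.
func registerHandlers(informer cache.SharedInformer, handleNode func(*v1.Node)) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if node, ok := obj.(*v1.Node); ok {
				handleNode(node)
			} // anything else (e.g. a tombstone) is skipped, not asserted
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			if node, ok := newObj.(*v1.Node); ok {
				handleNode(node)
			}
		},
		// No DeleteFunc: there is nothing to clean up when a node goes away.
	})
}
```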
@ottoyiu
Owner Author

ottoyiu commented Mar 28, 2018

@blakebarnett I'm still trying to find some time in my schedule to rewrite this controller; the rewrite will include health checks and metrics (e.g. sync duration).

For now, I created a quick patch that should solve the immediate problem with these panics:
#11

I'm not going to be able to test this until early next week; can you give this patch a try and see if this will alleviate your issues?
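
(On the readinessProbe question: the current binary exposes no health endpoint, so there is nothing good to probe yet. A purely illustrative sketch of the kind of /healthz endpoint the planned rewrite could add:)

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical /healthz endpoint a readinessProbe could target; the
	// current controller does not serve this.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```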

@ottoyiu
Owner Author

ottoyiu commented Mar 28, 2018

@blakebarnett Forgot to link the built image:

The Docker image for that branch is:
ottoyiu/k8s-ec2-srcdst:cast-panic-patch

https://hub.docker.com/r/ottoyiu/k8s-ec2-srcdst/tags/

@blakebarnett
Contributor

I'll do some testing with it, thanks!

@blakebarnett
Contributor

Have you approached the Calico team about adding this to calico-kube-controllers? It would also solve the problem of not being able to do a conditional deploy on upgrades via kops...

@blakebarnett
Contributor

My testing looks good btw...

@ottoyiu
Owner Author

ottoyiu commented Apr 3, 2018

@blakebarnett I merged the patch; going to roll a release.

I have not approached the Calico team about this, but putting this logic in kube-controllers sounds like a good idea, if they're OK with having cloud-specific implementation details there.

@blakebarnett
Contributor

Great timing! I'm upgrading our prod cluster tonight.

@so0k

so0k commented May 9, 2018

I guess this needs to be upstreamed: https://sourcegraph.com/github.com/kubernetes/kops@release-1.9/-/blob/upup/models/cloudup/resources/addons/networking.projectcalico.org/k8s-1.7.yaml.template#L515:47

I'll open a PR if nobody else does (but it's going to take me a few days to get around to it).
