
Only Able to Reach NodePort Services Externally On Node Pod Schedules On #485

Closed
frankhinek opened this issue Apr 6, 2018 · 4 comments


frankhinek commented Apr 6, 2018

I've deployed a new K8s cluster using RKE and Calico networking. I've discovered that I can only access NodePort services on the node the pod is scheduled on. If I try to access the service from any other node, it fails. I've confirmed that I am able to access the service while SSH'd into any one of the nodes, so the problem only appears when attempting to access services from outside the cluster.

I tried asking on the #rke and #general channels on Slack but didn't get a response, so I'm posting here in hopes that it might be some bug or mistake on my part that can be identified.

RKE version:
v0.1.5-rc2

Docker version: (docker version, docker info preferred)

$ docker version
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:21:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:21:36 2017
 OS/Arch:      linux/amd64
 Experimental: false
$ docker info
Containers: 40
 Running: 22
 Paused: 0
 Stopped: 18
Images: 15
Server Version: 17.03.2-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.64 GiB
Name: HIC005323
ID: HF73:TUQO:M6UI:K4A5:J5O3:MOJV:7ZRR:7Q4H:N6WY:EIDX:BXAU:YNGV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

$ cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.3 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.3"
$ uname -r
3.10.0-514.el7.x86_64

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Virtual machines running on a VMware vSphere cluster

cluster.yml file:

nodes:
- address: 10.224.88.38
  role:
  - controlplane
  - etcd
  user: frank
  docker_socket: /var/run/docker.sock

- address: 10.224.88.86
  role:
  - controlplane
  - etcd
  user: frank
  docker_socket: /var/run/docker.sock

- address: 10.224.88.30
  role:
  - controlplane
  - etcd
  user: frank
  docker_socket: /var/run/docker.sock

- address: 10.224.88.81
  role:
  - worker
  user: frank
  docker_socket: /var/run/docker.sock
  labels:
    node-role.rook.io/storage: true
    node-role.rook.io/rgw: true

- address: 10.224.88.85
  role:
  - worker
  user: frank
  docker_socket: /var/run/docker.sock
  labels:
    node-role.rook.io/storage: true
    node-role.rook.io/rgw: true

- address: 10.224.88.178
  role:
  - worker
  user: frank
  docker_socket: /var/run/docker.sock
  labels:
    node-role.rook.io/storage: true
    node-role.rook.io/rgw: true

services:
  etcd:
    image: rancher/coreos-etcd:v3.0.17
  kube-api:
    image: rancher/k8s:v1.8.10-rancher1-1
    service_cluster_ip_range: 10.43.0.0/16
    pod_security_policy: false
  kube-controller:
    image: rancher/k8s:v1.8.10-rancher1-1
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: rancher/k8s:v1.8.10-rancher1-1
  kubelet:
    image: rancher/k8s:v1.8.10-rancher1-1
    cluster_domain: cluster.local
    infra_container_image: rancher/pause-amd64:3.0
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
  kubeproxy:
    image: rancher/k8s:v1.8.10-rancher1-1
network:
  plugin: calico
authentication:
  strategy: x509
#ssh_key_path: ~/.ssh/id_gpg.pub
ssh_agent_auth: true
authorization:
  mode: rbac
ignore_docker_version: false

ingress:
  provider: none

Steps to Reproduce:

  1. Deploy a k8s cluster with RKE and the above cluster.yml.
  2. Create a simple HTTP pod and service using:
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
spec:
  ports:
  - name: http
    port: 8000
  selector:
    app: httpbin
  type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: httpbin
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
    spec:
      containers:
      - image: docker.io/citizenstig/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
          - containerPort: 8000
  3. Identify which node the pod is scheduled on:
$ kubectl get pods -o wide  -l app=httpbin
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
httpbin-7d77cd44fb-mcfhs   1/1       Running   0          12m       10.42.5.15   10.224.88.178
  4. Identify which port the service was assigned:
kubectl get svc -l app=httpbin
NAME      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
httpbin   NodePort   10.43.158.122   <none>        8000:32745/TCP   13m
  5. Execute the curl 10.224.88.178:32745/status/418 command against the node the pod is scheduled on and observe the response.
  6. Execute the curl 10.224.88.85:32745/status/418 command against a different node from the one the pod is scheduled on and observe the response.
  7. SSH to the 10.224.88.85 node and execute the curl localhost:32745/status/418 command.
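The external checks above can be collapsed into a small loop run from outside the cluster (just a sketch: the node IPs and nodePort 32745 are taken from this cluster, and /status/200 is an arbitrary httpbin probe path):

```shell
#!/usr/bin/env bash
# Probe the httpbin NodePort on every worker node from outside the
# cluster; on this cluster only the node hosting the pod should answer.
NODES="10.224.88.81 10.224.88.85 10.224.88.178"
PORT=32745
for node in $NODES; do
  if curl -s --connect-timeout 2 "http://$node:$PORT/status/200" > /dev/null; then
    echo "$node: reachable"
  else
    echo "$node: unreachable (connection refused or timed out)"
  fi
done
```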

Results:
The response to curl against the node the pod is scheduled on is:

$ curl 10.224.88.178:32745/status/418

    -=[ teapot ]=-

       _...._
     .'  _ _ `.
    | ."` ^ `". _,
    \_;`"---"`|//
      |       ;/
      \_     _/
        `"""`

whereas for the other node it is:

$ curl 10.224.88.85:32745/status/418
curl: (7) Failed to connect to 10.224.88.85 port 32745: Connection refused

but when logged in via SSH to the 10.224.88.85 node the curl succeeds:

$ curl localhost:32745/status/418

    -=[ teapot ]=-

       _...._
     .'  _ _ `.
    | ."` ^ `". _,
    \_;`"---"`|//
      |       ;/
      \_     _/
        `"""`

frankhinek commented Apr 6, 2018

I've done a bit more digging, and as best I can tell, the culprit is Docker setting the FORWARD chain policy to DROP as a fix for this security vulnerability: moby/moby#14041

I've confirmed that each time I attempt curl 10.224.88.85:32745, the policy DROP packet counter increments:

$ sudo iptables -t filter -v --line-numbers -L FORWARD
Chain FORWARD (policy DROP 3 packets, 188 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1    76887  119M cali-FORWARD  all  --  any    any     anywhere             anywhere             /* cali:wUHhoiAYhphO9Mso */
2       15   936 KUBE-FORWARD  all  --  any    any     anywhere             anywhere             /* kubernetes forward rules */
3       13   808 DOCKER-ISOLATION  all  --  any    any     anywhere             anywhere
4        0     0 DOCKER     all  --  any    docker0  anywhere             anywhere
5        0     0 ACCEPT     all  --  any    docker0  anywhere             anywhere             ctstate RELATED,ESTABLISHED
6        0     0 ACCEPT     all  --  docker0 !docker0  anywhere             anywhere
7        0     0 ACCEPT     all  --  docker0 docker0  anywhere             anywhere

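Until the upstream fix is merged, the usual stopgap (an assumption based on the linked issues, not anything RKE configures; it must be applied on every node and is lost if the iptables rules are reloaded) is to set the FORWARD policy back to ACCEPT, or more narrowly to accept forwarded traffic for the pod CIDR declared in cluster.yml:

```shell
# Broad workaround: restore the filter-table FORWARD policy so that
# kube-proxy's DNAT'd NodePort traffic can be forwarded between nodes.
sudo iptables -P FORWARD ACCEPT

# Narrower alternative: only accept forwarded traffic to/from the pod
# CIDR from cluster.yml (10.42.0.0/16 on this cluster).
sudo iptables -A FORWARD -s 10.42.0.0/16 -j ACCEPT
sudo iptables -A FORWARD -d 10.42.0.0/16 -j ACCEPT
```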
I found an issue opened just 2 days ago in Project Calico that confirms this is a known problem: projectcalico/calico#1840

This is an upstream bug in Kubernetes: kubernetes/kubernetes#59656

Fix waiting to be merged: kubernetes/kubernetes#62007

Since this isn't an RKE bug, feel free to close, but I've left the issue open for now in case anyone at Rancher wants to update the docs or at least be aware of it if people ask on Slack.

moelsayed (Contributor) commented:

@frankhinek Thank you for the detailed report. We will keep the issue open to track the upstream issues until they are resolved.


Luke035 commented May 11, 2018

Same problem here!

deniseschannon commented:

As this was a k8s issue that's now fixed, I'll close this for now.
