Builds hang in RUNNING state, TLS handshake error #1506

Closed
ralfschimmel opened this issue Feb 25, 2016 · 14 comments

@ralfschimmel

Hi,

For the past day or two, a lot of our builds have been hanging in the running state.
They complete, and all steps of the build/deploy finish, but then the build just hangs.

These are the only errors in the logs:

http: TLS handshake error from 10.104.1.60:52088: EOF
http: TLS handshake error from 10.104.1.60:52089: EOF
http: TLS handshake error from 10.104.1.60:54854: EOF
http: TLS handshake error from 10.104.1.60:54855: EOF
http: TLS handshake error from 10.104.1.60:57682: EOF
http: TLS handshake error from 10.104.1.60:57683: EOF
http: TLS handshake error from 10.104.1.60:60444: EOF
http: TLS handshake error from 10.104.1.60:60445: EOF
http: TLS handshake error from 10.104.1.60:34929: EOF
http: TLS handshake error from 10.104.1.60:34930: EOF
http: TLS handshake error from 10.104.1.60:37695: EOF
http: TLS handshake error from 10.104.1.60:37698: EOF
http: TLS handshake error from 10.104.1.60:40437: EOF
http: TLS handshake error from 10.104.1.60:40443: EOF
http: TLS handshake error from 10.104.1.60:43158: EOF
http: TLS handshake error from 10.104.1.60:43160: EOF
http: TLS handshake error from 10.104.1.60:45923: EOF
http: TLS handshake error from 10.104.1.60:45924: EOF
http: TLS handshake error from 10.104.1.60:48645: EOF
http: TLS handshake error from 10.104.1.60:48646: EOF
http: TLS handshake error from 10.104.1.60:51422: EOF
http: TLS handshake error from 10.104.1.60:51424: EOF
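
For anyone triaging a log like the one above, a minimal Python sketch that groups the handshake errors by client address and reason can show whether they all come from a single peer (here every line is from 10.104.1.60). The function name and the IPv4-only regular expression are assumptions made for illustration; nothing here is part of Drone.

```python
import re
from collections import Counter

# Matches lines such as:
#   http: TLS handshake error from 10.104.1.60:52088: EOF
# Assumption: client addresses are IPv4, as in the log above.
HANDSHAKE_RE = re.compile(
    r"TLS handshake error from "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+): (?P<reason>.+)$"
)


def summarize_handshake_errors(lines):
    """Count handshake errors per (client IP, reason) pair."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line.strip())
        if match:
            counts[(match.group("ip"), match.group("reason"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http: TLS handshake error from 10.104.1.60:52088: EOF",
        "http: TLS handshake error from 10.104.1.60:52089: EOF",
        "http: TLS handshake error from 10.104.1.60:54854: EOF",
    ]
    for (ip, reason), n in summarize_handshake_errors(sample).most_common():
        print(f"{ip}  {reason}  x{n}")
```

If a single address accounts for every error, the next step would be to work out which machine or service owns that address.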

Drone runs in docker on Kubernetes (GKE).
All workers run on GCE and were added with docker-machine and the drone CLI.

Please advise.

Thanks, Ralf

build:
  image: quay.io/gynzy/docker-drone-node:test
  auth_config:
    username: $$QUAY_USER
    password: $$QUAY_PASS
    email: drone@gke.nl
  commands:
    - chmod 777 ./ci/*.sh
    - bash ./ci/modify-git-checkout.sh
    - bash ./ci/build.sh
    - bash ./ci/deploy.sh

[Screenshot: 2016-02-25 at 18:06:04]

@gtaylor

gtaylor commented Feb 25, 2016

How about a look at your k8s manifests, with the sensitive stuff masked out?

@ralfschimmel
Author

Sure, think I borrowed it from you ;-)

apiVersion: v1
kind: ReplicationController
metadata:
  name: droneio
  labels:
    name: droneio
spec:
  replicas: 1
  selector:
    name: droneio
  template:
    metadata:
      labels:
        name: droneio
    spec:
      containers:
        - image: drone/drone:0.4
          name: droneio
          env:
            # HTTP Listen Address
            - name: SERVER_ADDR
              value: ":443"
            # HTTPS private key
            - name: SERVER_KEY
              value: /etc/secrets/key
            - name: SERVER_CERT
              value: /etc/secrets/cert
            # DB stuff
            - name: DATABASE_DRIVER
              value: sqlite3
            # This ends up being a mounted Google Cloud Disk.
            - name: DATABASE_CONFIG
              value: /var/lib/drone/drone.sqlite
            # RCS stuff
            - name: REMOTE_DRIVER
              value: github
            # TODO: Move these into secrets when supported.
            - name: REMOTE_CONFIG
              value: https://github.com?client_id=xxxx&client_secret=xxxx
          ports:
            - containerPort: 443
              protocol: TCP
          volumeMounts:
            # Contains all config
            - mountPath: /etc/secrets
              name: droneio-secrets
              readOnly: true
            # Persist our configs in an SQLite DB in here
            - mountPath: /var/lib/drone
              name: droneio-sqlite-db
            # Enables Docker in Docker
            - mountPath: /var/run/docker.sock
              name: docker-socket
            - mountPath: /var/lib/docker
              name: docker-lib
      volumes:
        - name: droneio-secrets
          secret:
            secretName: droneio
        - name: droneio-sqlite-db
          gcePersistentDisk:
            pdName: drone-04-data
            fsType: ext4
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock
        - name: docker-lib
          hostPath:
            path: /var/lib/docker
---
apiVersion: v1
kind: Service
metadata:
  name: droneio
  labels:
    name: droneio
spec:
  ports:
    - port: 443
      protocol: TCP
      targetPort: 443
  selector:
    name: droneio
  type: LoadBalancer
---
apiVersion: v1
kind: Secret
metadata:
  name: droneio
type: Opaque
data:
  key: BIGKEYBLOB
  cert: BIGKEYBLOB

@gtaylor

gtaylor commented Feb 25, 2016

Can you verify that you can use the docker client on one of your GKE nodes to communicate with your worker VM's docker daemon? At a glance, this sort of error looks like it could be a key/cert mismatch.
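
One way to rule out a key/cert mismatch locally is to try loading the pair with Python's standard `ssl` module, which refuses a private key that does not belong to the certificate. This is only a sketch: the helper name `cert_and_key_match` is hypothetical, it assumes PEM files and an unencrypted key, and it does not talk to the Docker daemon itself.

```python
import ssl


def cert_and_key_match(cert_path, key_path):
    """Return True if the PEM certificate and private key load as a pair.

    Returns False when a file is missing, is not valid PEM, or when the
    private key does not belong to the certificate.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError:
        # ssl.SSLError and FileNotFoundError are both subclasses of OSError.
        return False
    return True
```

It could be pointed at the files mounted at /etc/secrets/cert and /etc/secrets/key in the manifest above, or at the client certificate pair that docker-machine generated for a worker.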

@bradrydzewski

Also, are we sure this is a Drone issue? See moby/moby#18599

@ralfschimmel
Author

@gtaylor if it can start builds on the worker nodes, then it can communicate with Docker, right? It is only when the build is finished that it does not 'complete'.

I will clear my drone.yml to see if it can finish builds at all.

@ralfschimmel
Author

Ok, when only a simple command is executed in one of the scripts, the build returns green.
But when running bigger, longer-running commands, it is never marked as completed.
The build takes ~20 minutes.

@ralfschimmel
Author

Tried matching up Docker versions between worker and client (1.8.3), splitting the build process into small pieces, and getting faster machines. No luck.

This message looks a lot like mine, but I'm not a gopher...
golang/go#10685

@ralfschimmel
Author

Ok, switched to HTTP and the above errors are gone.
However, the build still hangs in the running state, with no further errors.

@bradrydzewski

My recommendation is to create a simple project that can be used to reproduce this error on your infrastructure. Then post it to GitHub so that we can try to build it and reproduce the error on our infrastructure. This is the first time this error has been reported, so we need a repeatable example to debug internally.

In addition, I recommend trying to understand the error reported against Docker in moby/moby#18599 to see whether this is a bug in Drone or in Docker. That individual is reporting the exact same issue as you, but outside the context of Drone.
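
A reproduction project along these lines could swap the real build scripts for a task that does nothing except stay busy for about as long as the failing builds (~20 minutes, per the report above). The following is a rough Python sketch under several assumptions: the name `run_for` and the heartbeat interval are hypothetical, and the build image would need a Python interpreter, which the image in this thread may not have.

```python
import sys
import time


def run_for(total_seconds, heartbeat_seconds=30, out=sys.stdout, sleep=time.sleep):
    """Stay busy for total_seconds, writing a heartbeat line after each interval.

    Returns the number of heartbeat lines written. The sleep function is a
    parameter so the loop can be exercised without actually waiting.
    """
    elapsed = 0
    beats = 0
    while elapsed < total_seconds:
        # Never sleep past the requested total duration.
        step = min(heartbeat_seconds, total_seconds - elapsed)
        sleep(step)
        elapsed += step
        beats += 1
        out.write(f"still running: {elapsed}s of {total_seconds}s\n")
        out.flush()
    return beats
```

Saved to a hypothetical ci/long_task.py with a final line such as `run_for(20 * 60)`, it could stand in for the real commands in the build section. Varying the duration would then show whether the hang depends on how long the build runs rather than on what it does.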

@ralfschimmel
Author

@bradrydzewski the TLS errors are gone, but the problem remains, so the TLS errors had to do with the combination of Drone and my certificate. I'll look into the Docker issue.

I will see if I can make a project which consistently displays this behaviour.

@bradrydzewski

@ralfschimmel understood that the problem remains, but this is the first time the issue has been reported, and unless we can find a way to reproduce it, it will be very hard (and frustrating) for anyone to diagnose, debug and fix.

@ralfschimmel
Author

👍 I'll get to it.

@bradrydzewski

Closing due to inactivity / inability to reproduce. Also, the move to build agents in 0.5 may resolve any over-the-wire issues we have with Docker, since it uses the local Docker socket.

@talon-vonneudeck

talon-vonneudeck commented Jul 19, 2016

@ralfschimmel did you ever figure out what the actual cause was?
I have a very similar issue (outside of Drone, though).
