This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

Deploying helloworld dockerfile example not working on digitalocean cluster #2438

Closed
loungerider opened this issue Nov 9, 2014 · 18 comments

Comments

@loungerider

I believe a change made over the last few days has broken Dockerfile deployment with Deis. The same deploy worked on Thursday night. I've been testing Deis by removing and recreating the cluster. This morning I cloned a fresh deis repo and followed the cluster install as documented. The cluster comes up, all services are running, and I can register. git push deis master appears to work at first: the Docker image build runs and pushes to the private registry, and then the app tries to launch. The launch fails: it takes a long time to respond and ends with an error.

To ssh://git@deis.poc.ottemo.io:2222/feline-duckling.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@deis.poc.ottemo.io:2222/feline-duckling.git'

The CoreOS cluster instances show the container trying to activate while toggling between start-pre and auto-restart. I first noticed this when deploying my own app. I then removed the DigitalOcean droplets, recreated the cluster, and was able to reproduce the issue. I'm using the Deis DigitalOcean cloud-config and the helloworld Dockerfile example.

Let me know if you need more log info.

core@deis-1 ~ $ fleetctl list-units
UNIT                              MACHINE                     ACTIVE      SUB
deis-builder.service              f3812d3d.../10.132.187.45   active      running
deis-cache.service                114f773e.../10.132.236.205  active      running
deis-controller.service           75b48154.../10.132.187.44   active      running
deis-database.service             f3812d3d.../10.132.187.45   active      running
deis-logger.service               75b48154.../10.132.187.44   active      running
deis-logspout.service             114f773e.../10.132.236.205  active      running
deis-logspout.service             75b48154.../10.132.187.44   active      running
deis-logspout.service             f3812d3d.../10.132.187.45   active      running
deis-publisher.service            114f773e.../10.132.236.205  active      running
deis-publisher.service            75b48154.../10.132.187.44   active      running
deis-publisher.service            f3812d3d.../10.132.187.45   active      running
deis-registry.service             114f773e.../10.132.236.205  active      running
deis-router@1.service             114f773e.../10.132.236.205  active      running
deis-router@2.service             75b48154.../10.132.187.44   active      running
deis-router@3.service             f3812d3d.../10.132.187.45   active      running
deis-store-daemon.service         114f773e.../10.132.236.205  active      running
deis-store-daemon.service         75b48154.../10.132.187.44   active      running
deis-store-daemon.service         f3812d3d.../10.132.187.45   active      running
deis-store-gateway.service        114f773e.../10.132.236.205  active      running
deis-store-metadata.service       114f773e.../10.132.236.205  active      running
deis-store-metadata.service       75b48154.../10.132.187.44   active      running
deis-store-metadata.service       f3812d3d.../10.132.187.45   active      running
deis-store-monitor.service        114f773e.../10.132.236.205  active      running
deis-store-monitor.service        75b48154.../10.132.187.44   active      running
deis-store-monitor.service        f3812d3d.../10.132.187.45   active      running
deis-store-volume.service         114f773e.../10.132.236.205  active      running
deis-store-volume.service         75b48154.../10.132.187.44   active      running
deis-store-volume.service         f3812d3d.../10.132.187.45   active      running
feline-duckling_v2.cmd.1.service  75b48154.../10.132.187.44   activating  auto-restart

deisctl journal controller

Nov 09 02:48:13 deis-2 sh[2264]: deis-controller running...
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [155] [INFO] Starting gunicorn 19.1.1
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [155] [INFO] Listening at: http://0.0.0.0:8000 (155)
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [155] [INFO] Using worker: sync
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [161] [INFO] Booting worker with pid: 161
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [162] [INFO] Booting worker with pid: 162
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [163] [INFO] Booting worker with pid: 163
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [164] [INFO] Booting worker with pid: 164
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [165] [INFO] Booting worker with pid: 165
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [166] [INFO] Booting worker with pid: 166
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [167] [INFO] Booting worker with pid: 167
Nov 09 02:48:13 deis-2 sh[2264]: [2014-11-09 02:48:13 +0000] [168] [INFO] Booting worker with pid: 168
Nov 09 03:02:32 deis-2 sh[2264]: 172.17.42.1 "POST /v1/auth/login/ HTTP/1.0" 400 - "python-requests/2.4.3 CPython/2.7.6 Darwin/14.0.0"
Nov 09 03:02:59 deis-2 sh[2264]: 10.132.187.45 "POST /v1/auth/login/ HTTP/1.0" 400 - "python-requests/2.4.3 CPython/2.7.6 Darwin/14.0.0"
Nov 09 03:03:57 deis-2 sh[2264]: 172.17.42.1 "POST /v1/auth/register HTTP/1.0" 201 - "python-requests/2.4.3 CPython/2.7.6 Darwin/14.0.0"
Nov 09 03:03:58 deis-2 sh[2264]: 172.17.42.1 "POST /v1/auth/login/ HTTP/1.0" 200 - "python-requests/2.4.3 CPython/2.7.6 Darwin/14.0.0"
Nov 09 03:04:27 deis-2 sh[2264]: 10.132.236.205 "POST /v1/keys HTTP/1.0" 201 - "python-requests/2.4.3 CPython/2.7.6 Darwin/14.0.0"
Nov 09 03:06:50 deis-2 sh[2264]: 10.132.187.45 "POST /v1/auth/login/ HTTP/1.0" 200 - "python-requests/2.4.3 CPython/2.7.6 Darwin/14.0.0"
Nov 09 03:08:40 deis-2 sh[2264]: INFO feline-duckling: config feline-duckling-e52c5fb updated
Nov 09 03:08:41 deis-2 sh[2264]: INFO feline-duckling: release feline-duckling-v1 created
Nov 09 03:08:41 deis-2 sh[2264]: 172.17.42.1 "POST /v1/apps HTTP/1.0" 201 - "python-requests/2.4.3 CPython/2.7.6 Darwin/14.0.0"
Nov 09 03:08:52 deis-2 sh[2264]: 10.132.187.45 "POST /v1/hooks/push HTTP/1.1" 201 - "curl/7.35.0"
Nov 09 03:08:53 deis-2 sh[2264]: 10.132.187.45 "POST /v1/hooks/config HTTP/1.1" 200 - "curl/7.35.0"
Nov 09 03:12:03 deis-2 sh[2264]: INFO feline-duckling: build feline-duckling-d901cbf created
Nov 09 03:12:03 deis-2 sh[2264]: INFO feline-duckling: release feline-duckling-v2 created
Nov 09 03:12:04 deis-2 sh[2264]: INFO feline-duckling: ottemo scaled containers cmd=1
Nov 09 03:32:03 deis-2 sh[2264]: [2014-11-09 03:32:03 +0000] [155] [CRITICAL] WORKER TIMEOUT (pid:168)
Nov 09 03:32:04 deis-2 sh[2264]: [2014-11-09 03:32:04 +0000] [7020] [INFO] Booting worker with pid: 7020
@bacongobbler
Member

@loungerider what's the output of fleetctl status feline-duckling_v2.cmd.1.service?

@loungerider
Author

core@deis-1 ~ $ fleetctl status feline-duckling_v2.cmd.1.service
Error running remote command: SSH_AUTH_SOCK environment variable is not set. Verify ssh-agent is running. See https://github.com/coreos/fleet/blob/master/Documentation/using-the-client.md for help.

@bacongobbler
Member

@loungerider if you're inside one of the boxes, you need to make sure you forward your key to the SSH agent. fleetctl is telling you that it doesn't know how to SSH into the other box in your cluster.

Open a shell on a machine and forward the authentication agent connection:
        fleetctl ssh --forward-agent 2444264c-eac2-4eff-a490-32d5e5e4af24

If you ssh'd manually into the box via something like ssh core@deis-1.domain.com, you can accomplish this via ssh -A ...
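
The condition fleetctl complains about can be checked before running fleetctl status. The following is only an illustrative sketch: check_agent is a hypothetical helper name, not part of fleetctl, and it only tests whether SSH_AUTH_SOCK is set, not whether the agent actually holds a usable key.

```shell
# check_agent is a hypothetical helper, not part of fleetctl.
# It only checks that SSH_AUTH_SOCK is set and non-empty.
check_agent() {
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    echo "SSH_AUTH_SOCK is not set: reconnect with ssh -A or start ssh-agent"
    return 1
  fi
  echo "agent socket: ${SSH_AUTH_SOCK}"
}
```

On a cluster node it could be used as a guard, for example:
        check_agent && fleetctl status feline-duckling_v2.cmd.1.service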

@loungerider
Author

Got it, logged into deis-2 and ran the command.

core@deis-2 ~ $ fleetctl status feline-duckling_v2.cmd.1.service
● feline-duckling_v2.cmd.1.service - feline-duckling_v2.cmd.1
Loaded: loaded (/run/fleet/units/feline-duckling_v2.cmd.1.service; linked-runtime)
Active: activating (auto-restart) (Result: exit-code) since Sun 2014-11-09 04:05:35 UTC; 4s ago
Process: 25457 ExecStartPre=/bin/sh -c IMAGE=$(etcdctl get /deis/registry/host 2>&1):$(etcdctl get /deis/registry/port 2>&1)/feline-duckling:v2; docker pull $IMAGE (code=exited, status=1/FAILURE)

Nov 09 04:05:35 deis-2 systemd[1]: Failed to start feline-duckling_v2.cmd.1.
Nov 09 04:05:35 deis-2 systemd[1]: Unit feline-duckling_v2.cmd.1.service entered failed state.

@loungerider
Author

@bacongobbler

core@deis-2 ~ $ etcdctl get /deis/registry/host
10.132.236.205
core@deis-2 ~ $ etcdctl get /deis/registry/port
5000
core@deis-2 ~ $ docker pull 10.132.236.205:5000/feline-duckling:v2
2014/11/09 04:20:40 Error: Invalid registry endpoint https://10.132.236.205:5000/v1/: Get https://10.132.236.205:5000/v1/_ping: EOF. If this private registry supports only HTTP or HTTPS with an unknown CA certificate, please add --insecure-registry 10.132.236.205:5000 to the daemon's arguments. In the case of HTTPS, if you have access to the registry's CA certificate, no need for the flag; simply place the CA certificate at /etc/docker/certs.d/10.132.236.205:5000/ca.crt
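
The failing ExecStartPre step can be reproduced by hand in two parts: building the image reference from the etcd keys, and probing the registry's /v1/_ping endpoint over plain HTTP and HTTPS. This is a rough sketch under assumptions: image_ref and ping_registry are hypothetical helper names, curl is assumed to be available on the node, and the probe only reports which scheme answers; it does not change Docker's behavior.

```shell
# image_ref HOST PORT APP VERSION
# Builds the reference in the same shape the unit file uses:
#   HOST:PORT/APP:VERSION
image_ref() {
  printf '%s:%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

# ping_registry HOST PORT
# Probes /v1/_ping over both schemes and prints which ones respond.
# Only defined here; call it from a cluster node.
ping_registry() {
  for scheme in http https; do
    if curl -fsk --max-time 5 "${scheme}://$1:$2/v1/_ping" >/dev/null; then
      echo "${scheme}: ok"
    else
      echo "${scheme}: failed"
    fi
  done
}
```

With the values from this thread, the calls might look like:
        image_ref "$(etcdctl get /deis/registry/host)" "$(etcdctl get /deis/registry/port)" feline-duckling v2
        ping_registry 10.132.236.205 5000

If only http answers, that would be consistent with the error above, where Docker asks for --insecure-registry for a registry that supports only HTTP.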

@bacongobbler
Member

@loungerider what version of CoreOS is this on? Could you also please post docker --version? We're working with docker right now as v1.3.1 brought a breaking change with registry authentication due to some CVEs. See https://groups.google.com/forum/#!msg/docker-announce/aQoVmQlcE0A/smPuBNYf8VwJ

@loungerider
Author

@bacongobbler see below. Yes, probably related.

core@deis-2 ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=493.0.0
VERSION_ID=493.0.0
BUILD_ID=
PRETTY_NAME="CoreOS 493.0.0"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

core@deis-2 ~ $ docker --version
Docker version 1.3.1, build 4e9bbfa

@loungerider
Author

@bacongobbler moby/moby#8887 - definitely a known issue. Looks like updating the Deis registry service will probably make the most sense.

@bacongobbler
Member

I just ran into this. It turns out that DigitalOcean is still on CoreOS 493.0.0. I opened a ticket with them and will update here when I get more information.

@bacongobbler
Member

response from DigitalOcean:

Hello,

Thank you for contacting Digital Ocean Support.

It does appear that we are working on that new image. It should be up sometime in the future. Unfortunately we do not have an ETA as of this moment :)

We appreciate you being a Digital Ocean customer and please let us know if we can be of further assistance!

Regards,

Daniel 
DigitalOcean Support 
Check out our community for helpful articles and tutorials. 
https://digitalocean.com/community

@loungerider
Author

@bacongobbler I just opened a ticket as well. Hopefully they push 494.0.0, with the Docker revert, to the alpha channel soon. It's obvious the 1.3.1 implementation was a fail. What are the Deis team's thoughts on changing the registry service? Can I use the CoreOS stable channel in the interim?

@bacongobbler
Member

@loungerider in the interim, you can update to v494 via http://docs.deis.io/en/latest/managing_deis/upgrading-deis/#upgrading-coreos. We want to get to the beta channel, but we require a minimum of CoreOS v471 for the Ceph FS kernel fixes. Docker fixes are coming upstream.
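
The minimum-version requirement can be checked against the VERSION_ID field that /etc/os-release reports. A minimal sketch, assuming the first dot-separated component is the CoreOS release number; coreos_at_least is a hypothetical helper name, and 471 is the minimum quoted above.

```shell
# coreos_at_least VERSION_ID MINIMUM
# Succeeds when the leading component of VERSION_ID (e.g. 493 in 493.0.0)
# is greater than or equal to MINIMUM.
coreos_at_least() {
  major=${1%%.*}
  [ "$major" -ge "$2" ]
}

# Example with values from this thread.
if coreos_at_least 494.0.0 471; then
  echo "494.0.0 meets the v471 minimum"
fi
```

On a node, the live value could be read with:
        . /etc/os-release && coreos_at_least "$VERSION_ID" 471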

@loungerider
Author

Thanks, I'm good until the channel is updated.

@loungerider
Author

@bacongobbler A couple of things I noticed after upgrading to 494.0.0. Not a show stopper.

I still get this error on every git push deis master, both from the helloworld example and from the app I'm trying to deploy.

 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@deis.poc.ottemo.io:2222/loving-radiator.git'

When running deisctl start platform I get the following errors:

deis-store-daemon.service: activating/start-pre
INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: cancelled
timeout reached
INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: EOF
ERROR client.go:200: Unable to get result for {Set /_coreos.com/fleet/job/deis-store-metadata.service/target-state}, retrying in 100ms
deis-builder.service: activating/start-pre
INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: cancelled
timeout reached

I can press Ctrl-C and run deisctl start platform again, and all services eventually start and become active.

@carmstrong
Contributor

@loungerider Those are errors that bubble up from fleet. They typically mean that fleet is getting overloaded on one of your machines. If you run journalctl -fu fleet, I'm betting you'll see heartbeat timeouts.

@bacongobbler
Member

DigitalOcean has updated to 494 now, so we should be good again. We'll keep tracking the docker 1.3.1 issue in #2395. If you have any other issues that are not related to DigitalOcean provisioning v493.0.0, please feel free to open another issue. Thanks!

@mboersma
Member

I just provisioned a new cluster on DigitalOcean with Deis v1.0.0. I deployed example-dockerfile-python and everything seems happy: http://casual-ziggurat.mattboersma.com/

@loungerider
Author

👍
