
Proposal: Containers should not be considered started until the TCP ports they expose are open #7445

Closed
bfirsh opened this issue Aug 6, 2014 · 40 comments
Labels: kind/feature


@bfirsh (Contributor) commented Aug 6, 2014

Consider the following situation: you are booting up a database container and then a web container which links to the database container. The database container may take a little time to start up, but you don't want to start the web container until the database has started.

A similar situation is where you are starting up a web server and want to wait until it has started before adding it to the load balancer.

Current solutions typically involve an entrypoint script that waits for the port to be open (see docker/compose#374 or aanand/docker-wait), or even just a fixed /bin/sleep command.

I propose that a container should not be considered started, or be linkable, until its exposed TCP ports are ready to accept connections. The docker run -d and docker start commands should also not return until the ports are ready. This will allow you to safely do things like:

$ docker run -d --name db postgres
$ docker run -d --link db:db mywebapp

I am not fully familiar with the internals of container state, so I'm not sure how this should be implemented. I expect it'll be something like a Ready attribute on a container's state, perhaps with a modification to the /container/(id)/wait endpoint that will wait for a container to be ready.
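For illustration, a sketch against the HTTP API over the daemon socket: the first call is the existing wait endpoint, which blocks until the container exits; the second shows what a readiness variant might look like (the condition=ready parameter is purely hypothetical, and "db" is a placeholder container name):

# existing: blocks until the container stops
curl --unix-socket /var/run/docker.sock -X POST http://localhost/containers/db/wait
# hypothetical: would block until the container's Ready state flips
curl --unix-socket /var/run/docker.sock -X POST "http://localhost/containers/db/wait?condition=ready"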

Backwards compatibility

Existing containers may never open their exposed ports. Perhaps there could be a timeout, or it could be an opt-in feature in the next version, switching to the default behaviour in the version after that.

Future

This could be expanded into some kind of health check system. E.g. don't consider a container started and ready to be linked to until it is responding to HTTP requests with code 200.
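As a rough sketch of what such an HTTP readiness gate looks like when scripted by hand today (assuming curl is available in the image and the app serves on port 80):

#!/bin/sh
# poll until the app answers an HTTP request successfully;
# -f makes curl treat 4xx/5xx responses as failures
until curl -sf -o /dev/null http://localhost:80/; do
  echo "$(date) - waiting for HTTP readiness..."
  sleep 1
done
echo "app is serving HTTP"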

@pnasrat (Contributor) commented Aug 6, 2014

I could see this being generalized: containers should support health checking, of which a port check is just one kind. E.g. you might want a custom URL to hit for, say, an app server which may come up quickly but take some time initializing its backends.

@vieux (Contributor) commented Aug 11, 2014

+1, I'd love to see some sort of health check system.

@duglin (Contributor) commented Sep 22, 2014

A couple of things come to mind with this...

1 - It's hard to know when a container should be considered ready simply based on ports. As you mentioned, Docker really doesn't know if anything will EVER be listening on an exposed port. Perhaps a port is exposed because at some point in the future the app will dynamically open it for some reason, while during normal operations nothing listens on it. Think of a dynamically created debug port/listener. And just sleeping/timing out might annoy people.

2 - Dependencies between components are tricky, but in general I think it's better to assume that people will need to write code to deal with failure instead of relying on order/timing dependencies. For example, where an app needs to wait for its DB to be ready, I would claim the app needs to be written in such a way that it recovers when the DB goes away at times. If it handles that situation nicely, there's no issue when the DB isn't started before the app, because the app deals with it automatically (see the sketch below).

So, overall, I'm not sure Docker can (or should) do a lot for people here. The health check system might be interesting, though; I'm just not sure it can be based on listening ports - or at least not without the admin specifically asking it to check certain ports, like @pnasrat was suggesting.
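One way to approximate this today, without new daemon features, is to let the app fail fast when its dependency is unreachable and lean on Docker's existing restart policy; a sketch with placeholder image names:

# mywebapp exits non-zero if the DB is unreachable; Docker retries it up to 10 times
docker run -d --restart=on-failure:10 --link db:db mywebapp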

@gdm85 (Contributor) commented Dec 29, 2014

In my opinion this is in the realm of integration tests; I also have a use case for this (e.g. running specific tests while a container is starting).

However, I would prefer the problem be framed more generically as a "script that the container will run to assert it's healthy", so that it could be plugged in and pass a red/green light to the daemon.

@SvenDowideit (Contributor)

I find myself imagining an out-of-process Docker plugin which can poll (in the above image's specific fashion) and return starting, started, degraded, broken, or stopped - one the monitoring system could call periodically (or that is set to docker run --restart=always).

@bfirsh what do you think? important-service:server might have a monitoring image called important-service:monitoring whose logs will contain the info, and those can be polled by the monitoring system..

So rather than the daemon-specific value of started, there's an application service state too.

I guess it's just as easy to use a different health_check entrypoint, and to add image metadata to tell it what that is..

@mjsalinger

+1 on this. This would be extremely helpful functionality.

@lwcolton commented Apr 2, 2015

+1

@LightGuard

👍

@MitchK commented Apr 24, 2015

👍

@wansong commented May 1, 2015

+1, I also need this feature.

@tuscland commented May 4, 2015

+1

@nikicat commented May 5, 2015

👍

@jlopezr commented May 5, 2015

+1

@wernight

👍

@thefallentree

What does it mean for a "TCP port" to be "open"? Do you also need a healthcheck to return TRUE? What if it's not an HTTP-based protocol?

All in all, IMHO, implementing this in Docker is the wrong layer of abstraction. What you actually need is a service manager.

@artem-sidorenko

Agree with @thefallentree: if you need such behaviour, it should either be implemented in the application itself or via scripts which are invoked via CMD and wait in a loop until the relevant health check of the connection succeeds.

@lwcolton

A TCP port being open means it responds to TCP packets; generally, to test this you would send a packet with the SYN flag set and check for a SYN-ACK in response, but that's not the only way. It has nothing to do with HTTP.
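For instance, a minimal shell sketch of such a check, assuming a netcat build that supports -z (zero-I/O scan mode) and using a placeholder host/port:

# exits 0 as soon as db:5432 accepts a TCP connection; -w 2 caps the wait at 2 seconds
nc -z -w 2 db 5432 && echo "db:5432 is open"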


@artem-sidorenko

But what about UDP? What about applications which open the socket very fast but start to provide a service a bit later? (I'm pretty sure somebody will have such a case.)

What we are talking about here is more or less the functionality of an orchestrator which manages software deployments:

  • start SW deployment
  • wait till SW is deployed and services are started
  • continue with next steps (e.g. next SW deployment on the next instance)

The picture is a bit different with Docker and microservices: you instantiate containers from image(s) and have some kind of initialization phase (which takes a while, otherwise we wouldn't be having this discussion). But you don't have the classical SW deployment here, as that is usually part of building the images. So this issue is about having similar behaviour within Docker. It might be a feature for fig/compose (docker/compose#374) (with open questions regarding reliable health checks, not only on the TCP layer but on higher layers as well), which implements the abstraction layer around the hypervisor, but IMHO not one for the hypervisor itself.

I deal with this problem like this (simple TCP check):
Dockerfile:

...
ENV START_CMD="some daemon"
ENV MYSQL_HOST="mysql"
ENV MYSQL_PORT="3306"
...
CMD /etc/docker/start
...

/etc/docker/start:

#!/bin/sh
# Wait for the database to become available

MYSQL_LOOPS="10"

#wait for mysql
i=0
while ! nc $MYSQL_HOST $MYSQL_PORT >/dev/null 2>&1 < /dev/null; do
  i=`expr $i + 1`
  if [ $i -ge $MYSQL_LOOPS ]; then
    echo "$(date) - ${MYSQL_HOST}:${MYSQL_PORT} still not reachable, giving up"
    exit 1
  fi
  echo "$(date) - waiting for ${MYSQL_HOST}:${MYSQL_PORT}..."
  sleep 1
done

#continue with further steps

#start the daemon
exec $START_CMD

@gdm85 (Contributor) commented May 13, 2015

For the record, I have been using something similar to what @artem-sidorenko does, but more in the context of bringing up a cluster of containers, e.g. making sure that service X on container Y is healthy before continuing the orchestration.

@zoechi (Contributor) commented May 13, 2015

No idea if this can be done, but I think it would be nice to get an event through the GET /events endpoint when a TCP port is bound, with the need to opt in for each port when the container is created.
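For comparison, the consuming side could look like this with the CLI; the event filtering syntax is real, but the port-bound event type (and the "db" container name) is hypothetical:

# hypothetical: block until the daemon emits a port-bound event for container db
docker events --filter 'container=db' --filter 'event=port-bound' | head -n 1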

@lwcolton

Having compose allow you to define conditions that must be met before a service is considered started would be cool. It does sound like Docker itself may be the wrong place for this; should we create an issue in compose?


@tiborvass (Contributor)

Review with @crosbymichael @LK4D4 @duglin @diogomonica @calavera @cpuguy83 @vishh @ewindisch

It feels like this functionality should not be Docker's responsibility: there will never be enough use cases to support (TCP, UDP, why not HTTP?), and it would bring more technical complexity, when really it should be the caller's responsibility to make sure the container is "ready" as opposed to "started". This would rather be the orchestration tools' responsibility.

Health checks are a great use case, but again, there seems to be consensus on the fact that they are the orchestration tools' responsibility. Ping @mavenugo what do you think?

Also, there was an argument in favor, stating that docker users and docker image authors are separate people, and although one could expect image authors to handle a failed handshake gracefully, one cannot expect that of docker users who simply want to run existing applications that aren't as failsafe. The counterargument is that there is no way to make an application failsafe that is not. If a container is considered "started" only when its TCP endpoint is listening, just so that all another application has to do is docker wait on it, then if for some reason the container is never able to create the TCP socket, the application doing the docker wait will fail anyway. So if you need network guarantees, it makes sense to decouple that from Docker and put it in a separate tool.

@aanand @bfirsh @vieux @aluzzardi @mavenugo what do you guys think? Could compose or swarm provide health checks and some guarantees?

I'm closing this, but feel free to continue the discussion. We will reconsider if needed.

@ryneeverett

@tiborvass While the discussion has veered, I think the original proposal was primarily about the use case of the built-in linking functionality. The protocol is therefore not arbitrary -- TCP status is what determines whether a link will succeed.

@bfirsh (Contributor, Author) commented May 21, 2015

@tiborvass I think it would be enormously beneficial for a container to have a standardised way of determining its health, even if it's an orchestration system that actually runs the check. This would allow monitoring systems, etc., to determine whether the container is healthy without having to know what is inside it.

Perhaps this could be more generic than the proposal suggested here:

FROM python:2.7
ADD . /code
WORKDIR /code
RUN pip install -r requirements.txt
CMD python app.py
EXPOSE 80
HEALTHCHECK nc -z localhost 80

I'll open a new issue about this.

@diogomonica (Contributor)

@bfirsh I think that having a container determine its own health status, and having a standard way to report it, might make sense.

In particular, if the only thing that dockerd does is periodically run HEALTHCHECK and expose the result in the container INFO API call, that might work.

We should absolutely make it clear that the responsibility for acting on the output of that check lies with an external application, not Docker.
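If the daemon did expose such a result, an external tool could poll it with plain inspect calls; a sketch assuming the check's outcome lands under a State.Health field (hypothetical at the time of this discussion, with "db" a placeholder container name):

# wait until the daemon-reported health status becomes "healthy"
until [ "$(docker inspect --format '{{.State.Health.Status}}' db)" = "healthy" ]; do
  sleep 1
done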

@sanderboom

Great suggestion

@hoIIer commented Oct 20, 2015

@artem-sidorenko I am trying to do something similar but I have 3 separate services that each have a linked postgres db container...

should I do something like this?

contentservicedb:
  image: postgres:latest
  command: ./postgres.sh <------------- run health checks here? or in contentservice?
contentservice:
  extends:
    file: build/service-content/compose.yml
    service: web
  ports:
    - "8080"
  links:
    - contentservicedb

postgres.sh

#!/bin/sh
i=0
while ! nc "postgres" "5432" >/dev/null 2>&1 < /dev/null; do
  i=`expr $i + 1`
  if [ $i -ge 50 ]; then
    echo "$(date) - postgres:5432 still not reachable, giving up"
    exit 1
  fi
  echo "$(date) - waiting for postgres:5432..."
  sleep 1
done
echo "postgres connection established"

P.S. That doesn't work; I get Cannot start container a0e3979200e9a9615999c9cc90d0f65f14da658bfcb46f00bfe277dc23617189: Cannot link to a non running container: /overlord_contentservicedb_1 AS /overlord_contentservice_1/contentservicedb

@artem-sidorenko

@erichonkanen you should run the checks in contentservice, as contentservice is what should wait for the DB to be ready. Besides that, you should have a command that starts the service itself once the DB is ready.

@hoIIer commented Oct 20, 2015

@artem-sidorenko when you say I should have a command that starts the service after the DB is ready, do you mean just exit the loop and let compose spin it up (that's all I've done so far with docker-compose), or are you referring to the app-level stuff I need to run, e.g. "python manage.py runserver"? (That is indeed what runs inside my contentservice bootstrap.sh.)

@artem-sidorenko

@erichonkanen I mean something that gets called after the db is ready, something like:

#!/bin/sh
i=0
while ! nc "postgres" "5432" >/dev/null 2>&1 < /dev/null; do
  i=`expr $i + 1`
  if [ $i -ge 50 ]; then
    echo "$(date) - postgres:5432 still not reachable, giving up"
    exit 1
  fi
  echo "$(date) - waiting for postgres:5432..."
  sleep 1
done
echo "postgres connection established"
python manage.py runserver # <-- START APPLICATION

@hoIIer commented Oct 20, 2015

@artem-sidorenko ok great thanks a bunch!

@hoIIer commented Oct 22, 2015

In case anyone else arrives here looking for a way to ensure the postgres connection is open before proceeding with a webapp... here is what I landed on, and it works nicely. (Add postgresql-client to the webapp Dockerfile.)

#!/bin/sh
su -c "
    while ! psql --host=webappservicedb --username=postgres > /dev/null 2>&1; do
        echo 'Waiting for webappservicedb connection with postgres...'
        sleep 1;
    done;
    echo 'Connected to postgres...';"
su -c "python manage.py migrate"
su -c "python manage.py runserver 0.0.0.0:8080"

@aluzzardi (Member)

@tiborvass I think the Engine should provide a high level way of doing health checks.

This should include things like how to check (TCP connect, HTTP, custom command), how many retries, and the delay between retries.

I don't think Swarm should be doing that - every machine should be checking its own containers, and Swarm could then leverage such checks.
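As a strawman for the shape of that spec (how to check, how many retries, what delay), the same logic written out as a plain script, with all values illustrative:

#!/bin/sh
# generic health gate modelling the knobs an engine-level feature might expose
CHECK_CMD="nc -z localhost 80"   # how to check
RETRIES=5                        # how many attempts before giving up
DELAY=3                          # seconds between attempts
i=0
until $CHECK_CMD >/dev/null 2>&1; do
  i=$((i + 1))
  if [ "$i" -ge "$RETRIES" ]; then
    echo "unhealthy after $RETRIES attempts"
    exit 1
  fi
  sleep "$DELAY"
done
echo "healthy"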

@vikas027

decking has a "ready" option for doing this.

@carloscarcamo

+1

@aramalipoor

👍 for HEALTHCHECK instruction.

@wernight

I don't feel this should be part of Docker itself, especially as ports may or may not be open depending on arbitrary conditions. This should be part of whatever handles the startup of containers when they are meant to run as a web service (e.g. Kubernetes).

@crosbymichael (Contributor)

@aluzzardi Having the runtime health-check the things it's running sounds like a recipe for disaster if you really think about it ;)

It would be like giving a kid a credit card and letting them reconcile the bill: "yeah mom, I didn't spend all the money."

Having the runtime do healthchecks creates a SPOF. Healthchecking should be a 'third party' app making sure the runtime is held accountable for what it is supposed to run.

@Tails commented Dec 25, 2015

+1

@darkn3rd

👍 Sort of defeats the purpose of orchestration if containers are started before the containers they depend on are ready.
