
Proposal: Containers should not be considered started until the TCP ports they expose are open #7445

Closed
bfirsh opened this issue Aug 6, 2014 · 40 comments
Labels: kind/feature


@bfirsh (Contributor) commented Aug 6, 2014

Consider the following situation: you are booting up a database container and then a web container which links to the database container. The database container may take a little time to start up, but you don't want to start the web container until the database has started.

A similar situation is where you are starting up a web server and want to wait until it has started before adding it to the load balancer.

Current solutions typically involve an entrypoint script that waits for the port to be open (see docker/compose#374 or aanand/docker-wait), or even just a fixed /bin/sleep command.

I propose that a container should not be considered started, or be linkable, until its exposed TCP ports are ready to accept connections. The docker run -d and docker start commands should also not return until the ports are ready. This will allow you to safely do things like:

$ docker run -d --name db postgres
$ docker run -d --link db:db mywebapp

I am not fully familiar with the internals of container state, so I'm not sure how this should be implemented. I expect it'll be something like a Ready attribute on a container's state, perhaps with a modification to the /container/(id)/wait endpoint that will wait for a container to be ready.
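For illustration, a sketch against the HTTP API over the daemon socket: the first call is the existing wait endpoint, which blocks until the container exits; the second shows what a readiness variant might look like (the condition=ready parameter is purely hypothetical, and "db" is a placeholder container name):

# existing: blocks until the container stops
curl --unix-socket /var/run/docker.sock -X POST http://localhost/containers/db/wait
# hypothetical: would block until the container's Ready state flips
curl --unix-socket /var/run/docker.sock -X POST "http://localhost/containers/db/wait?condition=ready"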

Backwards compatibility

Existing containers may never open their exposed ports. Perhaps there could be a timeout, or it could be an opt-in feature in the next version, switching to the default behaviour in the version after that.

Future

This could be expanded into some kind of health check system. E.g. don't consider a container started and ready to be linked to until it is responding to HTTP requests with code 200.
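As a rough sketch of what such an HTTP readiness gate looks like when scripted by hand today (assuming curl is available in the image and the app serves on port 80):

#!/bin/sh
# poll until the app answers an HTTP request successfully;
# -f makes curl treat 4xx/5xx responses as failures
until curl -sf -o /dev/null http://localhost:80/; do
  echo "$(date) - waiting for HTTP readiness..."
  sleep 1
done
echo "app is serving HTTP"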

@pnasrat (Contributor) commented Aug 6, 2014

I could see this being generalized: containers should support health checking, of which a port check is just one kind. E.g. you might want a custom URL to hit for, say, an app server which may come up quickly but take some time initializing its backends.

@vieux (Contributor) commented Aug 11, 2014

+1, I'd love to see some sort of health check system.

@duglin (Contributor) commented Sep 22, 2014

A couple of things come to mind with this...

1 - It's hard to know when a container should be considered ready simply based on ports. As you mentioned, Docker really doesn't know if anything will EVER be listening on an exposed port. Perhaps a port is exposed because at some point in the future the app will dynamically open it for some reason, while during normal operations nothing listens on it. Think of a dynamically created debug port/listener. And just sleeping/timing out might annoy people.

2 - Dependencies between components are tricky, but in general I think it's better to assume that people will need to write code to deal with failure instead of relying on order/timing dependencies. For example, where an app needs to wait for its DB to be ready, I would claim the app needs to be written in such a way that it recovers when the DB goes away at times. If it handles that situation nicely, there's no issue when the DB isn't started before the app, because the app deals with it automatically (see the sketch below).

So, overall, I'm not sure Docker can (or should) do a lot for people here. The health check system might be interesting, though; I'm just not sure it can be based on listening ports - or at least not without the admin specifically asking it to check certain ports, like @pnasrat was suggesting.
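One way to approximate this today, without new daemon features, is to let the app fail fast when its dependency is unreachable and lean on Docker's existing restart policy; a sketch with placeholder image names:

# mywebapp exits non-zero if the DB is unreachable; Docker retries it up to 10 times
docker run -d --restart=on-failure:10 --link db:db mywebapp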

@gdm85 (Contributor) commented Dec 29, 2014

In my opinion this is in the realm of integration tests; I also have a use case for this (e.g. running specific tests while a container is starting).

However, I would prefer the problem be framed more generically as a "script that the container will run to assert it's healthy", so that it could be plugged in and pass a red/green light to the daemon.

@SvenDowideit (Contributor)

I find myself imagining an out-of-process Docker plugin which can poll (in the above image's specific fashion) and return starting, started, degraded, broken, or stopped - one the monitoring system could call periodically (or that is set to docker run --restart=always).

@bfirsh what do you think? important-service:server might have a monitoring image called important-service:monitoring whose logs will contain the info, and those can be polled by the monitoring system..

So rather than the daemon-specific value of started, there's an application service state too.

I guess it's just as easy to use a different health_check entrypoint, and to add image metadata to tell it what that is..

@mjsalinger

+1 on this. This would be extremely helpful functionality.

@lwcolton commented Apr 2, 2015

+1

@LightGuard

👍

@MitchK commented Apr 24, 2015

👍

@wansong commented May 1, 2015

+1, I also need this feature.

@tuscland commented May 4, 2015

+1

@nikicat commented May 5, 2015

👍

@jlopezr commented May 5, 2015

+1

@wernight

👍

@thefallentree

What does it mean for a "TCP port" to be "open"? Do you also need a healthcheck to return TRUE? What if it's not an HTTP-based protocol?

All in all, IMHO, implementing this in Docker is the wrong layer of abstraction. What you actually need is a service manager.

@artem-sidorenko

Agree with @thefallentree: if you need such behaviour, it should either be implemented in the application itself or via scripts which are invoked via CMD and wait in a loop until the relevant health check of the connection succeeds.

@lwcolton

A TCP port being open means it responds to TCP packets; generally, to test this you would send a packet with the SYN flag set and check for a SYN-ACK in response, but that's not the only way. It has nothing to do with HTTP.
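For instance, a minimal shell sketch of such a check, assuming a netcat build that supports -z (zero-I/O scan mode) and using a placeholder host/port:

# exits 0 as soon as db:5432 accepts a TCP connection; -w 2 caps the wait at 2 seconds
nc -z -w 2 db 5432 && echo "db:5432 is open"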


@artem-sidorenko

But what about UDP? What about applications which open the socket very fast but start to provide a service a bit later? (I'm pretty sure somebody will have such a case.)

What we are talking about here is more or less the functionality of an orchestrator which manages software deployments:

  • start SW deployment
  • wait till SW is deployed and services are started
  • continue with next steps (e.g. next SW deployment on the next instance)

The picture is a bit different with Docker and microservices: you instantiate containers from image(s) and have some kind of initialization phase (which takes a while, otherwise we wouldn't be having this discussion). But you don't have the classical SW deployment here, as that is usually part of building the images. So this issue is about having similar behaviour within Docker. It might be a feature for fig/compose (docker/compose#374) (with open questions regarding reliable health checks, not only on the TCP layer but on higher layers as well), which implements the abstraction layer around the hypervisor, but IMHO not one for the hypervisor itself.

I deal with this problem like this (simple TCP check):
Dockerfile:

...
ENV START_CMD="some daemon"
ENV MYSQL_HOST="mysql"
ENV MYSQL_PORT="3306"
...
CMD /etc/docker/start
...

/etc/docker/start:

#!/bin/sh
# Wait for the database to become available

MYSQL_LOOPS="10"

#wait for mysql
i=0
while ! nc $MYSQL_HOST $MYSQL_PORT >/dev/null 2>&1 < /dev/null; do
  i=`expr $i + 1`
  if [ $i -ge $MYSQL_LOOPS ]; then
    echo "$(date) - ${MYSQL_HOST}:${MYSQL_PORT} still not reachable, giving up"
    exit 1
  fi
  echo "$(date) - waiting for ${MYSQL_HOST}:${MYSQL_PORT}..."
  sleep 1
done

#continue with further steps

#start the daemon
exec $START_CMD

@gdm85 (Contributor) commented May 13, 2015

For the record, I have been using something similar to what @artem-sidorenko does, but more in the context of bringing up a cluster of containers, e.g. making sure that service X on container Y is healthy before continuing the orchestration.

@zoechi (Contributor) commented May 13, 2015

No idea if this can be done, but I think it would be nice to get an event through the GET /events endpoint when a TCP port is bound, with the need to opt in for each port when the container is created.
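For comparison, the consuming side could look like this with the CLI; the event filtering syntax is real, but the port-bound event type (and the "db" container name) is hypothetical:

# hypothetical: block until the daemon emits a port-bound event for container db
docker events --filter 'container=db' --filter 'event=port-bound' | head -n 1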

@lwcolton

Having compose allow you to define conditions that must be met before a service is considered started would be cool. It does sound like Docker itself may be the wrong place for this; should we create an issue in compose?


@tiborvass (Contributor)

Review with @crosbymichael @LK4D4 @duglin @diogomonica @calavera @cpuguy83 @vishh @ewindisch

It feels like this functionality should not be Docker's responsibility: there will never be enough use cases to support (TCP, UDP, why not HTTP?), and it would bring more technical complexity, when really it should be the caller's responsibility to make sure the container is "ready" as opposed to "started". This would rather be the orchestration tools' responsibility.

Health checks are a great use case, but again, there seems to be consensus on the fact that they are the orchestration tools' responsibility. Ping @mavenugo what do you think?

Also, there was an argument in favor, stating that docker users and docker image authors are separate people, and although one could expect image authors to handle a failed handshake gracefully, one cannot expect that of docker users who simply want to run existing applications that aren't as failsafe. The counterargument is that there is no way to make an application failsafe that is not. If a container is considered "started" only when its TCP endpoint is listening, just so that all another application has to do is docker wait on it, then if for some reason the container is never able to create the TCP socket, the application doing the docker wait will fail anyway. So if you need network guarantees, it makes sense to decouple that from Docker and put it in a separate tool.

@aanand @bfirsh @vieux @aluzzardi @mavenugo what do you guys think? Could compose or swarm provide health checks and some guarantees?

I'm closing this, but feel free to continue the discussion. We will reconsider if needed.

@ryneeverett

@tiborvass While the discussion has veered, I think the original proposal was primarily about the use case of the built-in linking functionality. The protocol is therefore not arbitrary -- TCP status is what determines whether a link will succeed.

@bfirsh (Contributor, Author) commented May 21, 2015

@tiborvass I think it would be enormously beneficial for a container to have a standardised way of determining its health, even if it's an orchestration system that actually runs the check. This would allow monitoring systems, etc., to determine whether the container is healthy without having to know what is inside it.

Perhaps this could be more generic than the proposal suggested here:

FROM python:2.7
ADD . /code
WORKDIR /code
RUN pip install -r requirements.txt
CMD python app.py
EXPOSE 80
HEALTHCHECK nc -z localhost 80

I'll open a new issue about this.

@diogomonica (Contributor)

@bfirsh I think that having a container determine its own health status, and having a standard way to report it, might make sense.

In particular, if the only thing that dockerd does is periodically run HEALTHCHECK and expose the result in the container INFO API call, that might work.

We should absolutely make it clear that the responsibility for acting on the output of that check lies with an external application, not Docker.
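If the daemon did expose such a result, an external tool could poll it with plain inspect calls; a sketch assuming the check's outcome lands under a State.Health field (hypothetical at the time of this discussion, with "db" a placeholder container name):

# wait until the daemon-reported health status becomes "healthy"
until [ "$(docker inspect --format '{{.State.Health.Status}}' db)" = "healthy" ]; do
  sleep 1
done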

@sanderboom

Great suggestion

@hoIIer commented Oct 20, 2015

@artem-sidorenko I am trying to do something similar but I have 3 separate services that each have a linked postgres db container...

should I do something like this?

contentservicedb:
  image: postgres:latest
  command: ./postgres.sh <------------- run health checks here? or in contentservice?
contentservice:
  extends:
    file: build/service-content/compose.yml
    service: web
  ports:
    - "8080"
  links:
    - contentservicedb

postgres.sh

#!/bin/sh
i=0
while ! nc "postgres" "5432" >/dev/null 2>&1 < /dev/null; do
  i=`expr $i + 1`
  if [ $i -ge 50 ]; then
    echo "$(date) - postgres:5432 still not reachable, giving up"
    exit 1
  fi
  echo "$(date) - waiting for postgres:5432..."
  sleep 1
done
echo "postgres connection established"

P.S. That doesn't work; I get Cannot start container a0e3979200e9a9615999c9cc90d0f65f14da658bfcb46f00bfe277dc23617189: Cannot link to a non running container: /overlord_contentservicedb_1 AS /overlord_contentservice_1/contentservicedb

@artem-sidorenko

@erichonkanen you should run the checks in contentservice, as contentservice is what should wait for the DB to be ready. Besides that, you should have a command that starts the service itself once the DB is ready.

@hoIIer commented Oct 20, 2015

@artem-sidorenko when you say I should have a command that starts the service after the DB is ready, do you mean just exit the loop and let compose spin it up (that's all I've done so far with docker-compose), or are you referring to the app-level stuff I need to run, e.g. "python manage.py runserver"? (That is indeed what runs inside my contentservice bootstrap.sh.)

@artem-sidorenko

@erichonkanen I mean something that gets called after the db is ready, something like:

#!/bin/sh
i=0
while ! nc "postgres" "5432" >/dev/null 2>&1 < /dev/null; do
  i=`expr $i + 1`
  if [ $i -ge 50 ]; then
    echo "$(date) - postgres:5432 still not reachable, giving up"
    exit 1
  fi
  echo "$(date) - waiting for postgres:5432..."
  sleep 1
done
echo "postgres connection established"
python manage.py runserver # <-- START APPLICATION

@hoIIer commented Oct 20, 2015

@artem-sidorenko ok great thanks a bunch!

@hoIIer commented Oct 22, 2015

In case anyone else arrives here looking for a way to ensure the postgres connection is open before proceeding with a webapp... here is what I landed on, and it works nicely. (Add postgresql-client to the webapp Dockerfile.)

#!/bin/sh
su -c "
    while ! psql --host=webappservicedb --username=postgres > /dev/null 2>&1; do
        echo 'Waiting for webappservicedb connection with postgres...'
        sleep 1;
    done;
    echo 'Connected to postgres...';"
su -c "python manage.py migrate"
su -c "python manage.py runserver 0.0.0.0:8080"

@aluzzardi (Member)

@tiborvass I think the Engine should provide a high level way of doing health checks.

This should include things like how to check (TCP connect, HTTP, custom command), how many retries, and the delay between retries.

I don't think Swarm should be doing that - every machine should be checking its own containers, and Swarm could then leverage such checks.
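As a strawman for the shape of that spec (how to check, how many retries, what delay), the same logic written out as a plain script, with all values illustrative:

#!/bin/sh
# generic health gate modelling the knobs an engine-level feature might expose
CHECK_CMD="nc -z localhost 80"   # how to check
RETRIES=5                        # how many attempts before giving up
DELAY=3                          # seconds between attempts
i=0
until $CHECK_CMD >/dev/null 2>&1; do
  i=$((i + 1))
  if [ "$i" -ge "$RETRIES" ]; then
    echo "unhealthy after $RETRIES attempts"
    exit 1
  fi
  sleep "$DELAY"
done
echo "healthy"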

@vikas027

decking has a "ready" option for doing this.

@carloscarcamo

+1

@aramalipoor

👍 for HEALTHCHECK instruction.

@wernight

I don't feel this should be part of Docker itself, especially as ports may or may not be open depending on arbitrary conditions. This should be part of whatever handles the startup of containers when they are meant to run as a web service (e.g. Kubernetes).

@crosbymichael (Contributor)

@aluzzardi Having the runtime health-check the things it's running sounds like a recipe for disaster if you really think about it ;)

It would be like giving a kid a credit card and letting them reconcile the bill: "yeah mom, I didn't spend all the money."

Having the runtime do healthchecks creates a SPOF. Healthchecking should be a 'third party' app making sure the runtime is held accountable for what it is supposed to run.

@Tails commented Dec 25, 2015

+1

@darkn3rd

👍 Sort of defeats the purpose of orchestration if containers are started before the containers they depend on are ready.
