What happens when restarting a Traefik instance on DC/OS? #363

Open
hbceylan opened this issue Jun 6, 2018 · 3 comments

Comments

@hbceylan

hbceylan commented Jun 6, 2018

What happens when Traefik instances are restarted on DC/OS? Our microservices become unreachable. How can I handle this?

[Screenshot 2018-06-06 at 21:52:09]

[Screenshot 2018-06-06 at 21:52:50]

@judithpatudith
Contributor

Hi! In order to get community help with this, would you mind posting on either the users mailing list users@dcos.io or Slack at chat.dcos.io? I don't know too much about Traefik, but you might find someone there who does 🙂

@ryadav88
Contributor

ryadav88 commented Jun 6, 2018

@deric ^

@deric
Contributor

deric commented Jun 7, 2018

@hbceylan Which Traefik package version do you use?

In the latest version, there's a health check configured on $PORT0:

  "healthChecks": [
    {
      "gracePeriodSeconds": 20,
      "intervalSeconds": 5,
      "maxConsecutiveFailures": 2,
      "portIndex": 0,
      "timeoutSeconds": 2,
      "delaySeconds": 15,
      "protocol": "MESOS_HTTP",
      "path": "/ping"
    }
  ],
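For anyone debugging the same symptom, the probe that this health check performs can be approximated from outside the cluster. The Python sketch below is a simplified illustration, not Mesos' or Marathon's actual implementation. It resolves `portIndex` against a list of the task's host ports, sends `GET /ping` with the configured 2-second timeout, counts HTTP 200 to 399 as healthy (the range Mesos HTTP checks accept), and gives up after `maxConsecutiveFailures` misses in a row. `gracePeriodSeconds` and `delaySeconds` are not modelled, and the helper names (`port_for_index`, `probe`, `run_health_checks`) and the example port list are hypothetical.

```python
import time
import urllib.error
import urllib.request


def port_for_index(host_ports: list[int], port_index: int) -> int:
    """Hypothetical helper: a health check's portIndex picks one of the task's ports."""
    return host_ports[port_index]


def probe(host: str, port: int, path: str = "/ping", timeout: float = 2.0) -> bool:
    """One HTTP probe (sketch): healthy if the status code is between 200 and 399."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, e.g. a 404 from an entrypoint without /ping
        return 200 <= err.code < 400
    except OSError:
        # connection refused, timeout, DNS failure, ...
        return False


def run_health_checks(host: str, port: int, path: str = "/ping",
                      max_consecutive_failures: int = 2,
                      interval_seconds: float = 5.0,
                      timeout: float = 2.0, max_checks: int = 10) -> bool:
    """Return False once max_consecutive_failures probes fail in a row
    (the point where Marathon would kill the task); True otherwise."""
    failures = 0
    for _ in range(max_checks):
        if probe(host, port, path, timeout):
            failures = 0  # any success resets the streak
        else:
            failures += 1
            if failures >= max_consecutive_failures:
                return False
        time.sleep(interval_seconds)
    return True
```

Pointing `probe` at the public entrypoint (port 80 here) should return `False` if that entrypoint does not serve `/ping`, while the entrypoint that does serve it should return `True`. That difference is what decides whether Marathon keeps the task or kills it.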

In your case, it appears that port 80 ("portIndex": 0) is used for public connections and does not respond to /ping (the health check request). Port 8080 is probably the "admin" interface entrypoint, which is configured to respond to health checks. Judging from the screenshot, you should probably use:

      "portIndex": 1,

or reorder the ports so that the health checks pass (check the error log). Also, when you use:

  "upgradeStrategy": {
    "minimumHealthCapacity": 0.5
  },

it means that you'll need at least 2 public nodes, because you're allocating the fixed ports 80, 443 and 8080, which can't be allocated to multiple instances on the same node at the same time. When restarting the task, Marathon will kill one instance, stage a replacement and wait until its health check passes, then restart the remaining instance(s).
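The restart behaviour described above can be reasoned about with a small, hypothetical model (not Marathon's code). It assumes that Marathon keeps at least `ceil(instances * minimumHealthCapacity)` instances healthy during an upgrade, that every instance binds the same fixed host ports so each public node can host at most one instance, and that restarted instances pass their health check immediately. `maximumOverCapacity`, placement constraints and timing are ignored, and the function name `rolling_restart_rounds` is made up for illustration.

```python
import math
from typing import Optional


def rolling_restart_rounds(instances: int, public_nodes: int,
                           min_health_capacity: float) -> Optional[int]:
    """Simplified model of a rolling restart where every instance binds the
    same fixed host ports, so one public node runs at most one instance.
    Returns the number of kill/start rounds, or None if it can never progress."""
    if public_nodes < instances:
        raise ValueError("fixed ports: need one public node per instance")
    min_healthy = math.ceil(instances * min_health_capacity)
    old, new = instances, 0  # old and restarted instances, all assumed healthy
    rounds = 0
    while old > 0 or new < instances:
        # Kill as many old instances as the health floor allows.
        kill = max(0, min(old, old + new - min_healthy))
        old -= kill
        # Start replacements only on nodes whose fixed ports are free.
        free_nodes = public_nodes - old - new
        start = min(free_nodes, instances - new)
        new += start
        if kill == 0 and start == 0:
            return None  # stuck: nothing may be killed, nowhere to start
        rounds += 1
    return rounds
```

With one instance on a single public node and `minimumHealthCapacity` 0.5, `rolling_restart_rounds(1, 1, 0.5)` returns `None`: no instance may be killed and there is no free node for a replacement. With two instances on two public nodes, `rolling_restart_rounds(2, 2, 0.5)` returns `2`, i.e. one instance is replaced per round, matching the sequence described above.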
