Deploy without service breakdown #654

Open
honibis opened this issue Feb 9, 2023 · 1 comment

honibis commented Feb 9, 2023

Description

When a service is deployed or redeployed, its running instances are killed. This ends in-progress operations and any long-running tasks, and it interrupts client connections. Is it possible to use connection draining in coordination with a load balancer (such as Traefik)?

Steps to reproduce the issue:

  1. Redeploy a service

What happens:
Clients receive error responses while tasks restart, and long-running tasks are killed.

What should happen:
1. Create new tasks and route new traffic only to them.
2. Mark the old tasks as draining so they accept no new connections.
3. Wait for a configurable period (for example, a couple of minutes).
4. Stop the old tasks.
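In Swarm terms, steps 3 and 4 map roughly onto the container stop grace period: when an old task is stopped, Docker sends the stop signal, waits up to stop_grace_period, and only then sends SIGKILL. Below is a minimal stack-file sketch, assuming a service named gsb and a hypothetical image example/gsb:latest. Step 2 is not something Swarm does by itself; this only helps if the application reacts to SIGTERM by refusing new requests and finishing in-flight ones, which is an assumption about the app.

```yaml
version: "3.8"
services:
  gsb:
    image: example/gsb:latest   # hypothetical image name
    # How long Docker waits after sending the stop signal before it kills the
    # container with SIGKILL (default 10s). This is the "wait" window of step 3;
    # the app must use it to drain (assumed to be implemented in the app).
    stop_grace_period: 2m
    # Signal sent to the old task when Swarm stops it (SIGTERM is the default).
    stop_signal: SIGTERM
```

The grace period is an upper bound: a task that finishes draining earlier exits sooner, so a generous value mostly costs time only when something hangs.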

Additional information (e.g. docker version, cluster setup,...):

honibis commented Feb 12, 2023

I set the update order to start-first and added a 30s delay between task updates, which helped a bit:

      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
        delay: 30s
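With start-first, Swarm starts the new task before stopping the old one, but "started" only means running. A container health check tightens this: as far as I understand Swarm's behaviour, a task with a health check stays in the starting state until the check passes, so the old task is not stopped before its replacement reports healthy. A sketch, assuming the app listens on port 8080, that curl exists in the image, and the hypothetical image name below; the health path is the one from the Traefik labels in this thread.

```yaml
version: "3.8"
services:
  gsb:
    image: example/gsb:latest   # hypothetical image name
    healthcheck:
      # Assumes curl is installed in the image and the app listens on port 8080.
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/api/service/health"]
      interval: 5s
      timeout: 2s
      retries: 3
      start_period: 30s   # failed probes during startup do not count as retries
    deploy:
      update_config:
        parallelism: 1
        order: start-first        # new task must be running (healthy) first
        failure_action: rollback
        delay: 30s
        monitor: 30s              # watch each updated task this long for failure
```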

I also tried switching load balancing to Swarm's built-in (VIP) load balancer with
traefik.docker.lbswarm=true
but this did not help: I could not get lbswarm to work with Traefik, and all requests ended up at the same task.

Then, to make things more robust, I added a health check to the Traefik service configuration:

        - traefik.http.services.gsb.loadbalancer.healthcheck.path=/api/service/health
        - traefik.http.services.gsb.loadbalancer.healthcheck.interval=200ms
        - traefik.http.services.gsb.loadbalancer.healthcheck.timeout=75ms
        - traefik.http.services.gsb.loadbalancer.healthcheck.scheme=http
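One detail worth checking: in Swarm mode Traefik reads the labels of the service, so they belong under deploy.labels rather than the container-level labels key. Because Swarm gives Traefik no port information, the server port also has to be set explicitly. A sketch assuming port 8080 and a hypothetical Host rule, reusing the health-check values above. Traefik takes a server out of rotation while its health check fails, so an app that starts returning an error from this endpoint on SIGTERM (an assumed behaviour, not shown here) would stop receiving new requests once the next check fails.

```yaml
version: "3.8"
services:
  gsb:
    image: example/gsb:latest   # hypothetical image name
    deploy:
      # In Swarm mode Traefik reads service labels, i.e. deploy.labels.
      labels:
        - traefik.enable=true   # only needed if exposedByDefault is false
        # Hypothetical router rule; replace with the real host.
        - traefik.http.routers.gsb.rule=Host(`gsb.example.com`)
        # Swarm exposes no port to Traefik, so declare it (8080 is assumed).
        - traefik.http.services.gsb.loadbalancer.server.port=8080
        - traefik.http.services.gsb.loadbalancer.healthcheck.path=/api/service/health
        - traefik.http.services.gsb.loadbalancer.healthcheck.interval=200ms
        - traefik.http.services.gsb.loadbalancer.healthcheck.timeout=75ms
        - traefik.http.services.gsb.loadbalancer.healthcheck.scheme=http
```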

It is acceptable for now; I will work on true zero-downtime deploys later.
Before I forget: Swarmpit is helping me a lot. Thanks to everyone who contributed to this wonderful application :)
