
dependency failed to start: container for service "web" is unhealthy #67

Open
clone3448 opened this issue Nov 7, 2023 · 10 comments


clone3448 commented Nov 7, 2023

Good day, I tried to deploy the production Docker Compose setup in Container Manager on my Synology DS923+, but got the error: `dependency failed to start: container for service "web" is unhealthy`.
I have altered the compose file provided on GitHub slightly (mainly the volumes):
```yaml
services:
  web:
    image: wger/server:latest
    container_name: wger_server
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - static:/home/wger/static
      - media:/home/wger/media
    expose:
      - 8000
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8000
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  nginx:
    image: nginx:stable
    container_name: wger_nginx
    depends_on:
      - web
    volumes:
      - /volume1/docker/wger/config/nginx.conf:/etc/nginx/conf.d/default.conf
      - static:/wger/static:ro
      - media:/wger/media:ro
    ports:
      - "8001:80"
    healthcheck:
      test: service nginx status
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    container_name: wger_db
    environment:
      - POSTGRES_USER=wger
      - POSTGRES_PASSWORD=wger
      - POSTGRES_DB=wger
    volumes:
      - postgres-data:/var/lib/postgresql/data/
    expose:
      - 5432
    healthcheck:
      test: pg_isready -U wger
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  cache:
    image: redis
    container_name: wger_cache
    expose:
      - 6379
    volumes:
      - redis-data:/data
    healthcheck:
      test: redis-cli ping
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  celery_worker:
    image: wger/server:latest
    container_name: wger_celery_worker
    command: /start-worker
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - media:/home/wger/media
    depends_on:
      web:
        condition: service_healthy
    healthcheck:
      test: celery -A wger inspect ping
      interval: 10s
      timeout: 5s
      retries: 5

  celery_beat:
    image: wger/server:latest
    container_name: wger_celery_beat
    command: /start-beat
    volumes:
      - celery-beat:/home/wger/beat/
    env_file:
      - /volume1/docker/wger/config/prod.env
    depends_on:
      celery_worker:
        condition: service_healthy

volumes:
  postgres-data:
  celery-beat:
  static:
  media:
  redis-data:

networks:
  default:
    name: wger_network
```
The nginx.conf is unaltered, and in prod.env I only changed SECRET_KEY and SIGNING_KEY.

I can access the website, which looks like this:
[screenshot]

The wger_server container does not seem to work correctly; its log shows the following:
[screenshot]
After thousands of items are deleted, I get this:
[screenshot]

The other containers don't show many issues in their logs, except wger_celery_worker:
[screenshot]

The console output for the entire stack looks like this:
[screenshot]
[screenshot]

What is going wrong in my configuration, and how can I fix it? This is the first time I am using databases in a Docker Compose file.

clone3448 changed the title from "dependency failed to start: container for service "web" in unhealthy" to "dependency failed to start: container for service "web" is unhealthy" on Nov 7, 2023

bbkz commented Nov 7, 2023

I don't know the docker-compose setup, but starting up the wger container takes a long time, especially on lower-end hardware. As I'm running it on Raspberry Pis and similar devices, I had to make some tweaks.

To keep gunicorn from running into a timeout, you may need to add the following environment variable:

`GUNICORN_CMD_ARGS="--timeout 240 --workers=2"`
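If you use Docker Compose, a minimal sketch of setting this variable directly on the `web` service (instead of, or in addition to, prod.env) could look like this. Gunicorn itself reads `GUNICORN_CMD_ARGS` at startup; that the wger image's entrypoint launches gunicorn in a way that picks it up is an assumption here. Values under `environment:` take precedence over the same variable from `env_file:`.

```yaml
services:
  web:
    env_file:
      - /volume1/docker/wger/config/prod.env
    environment:
      # Extra gunicorn options: longer worker timeout, fewer workers.
      # Assumes the image's entrypoint starts gunicorn, which reads this variable.
      GUNICORN_CMD_ARGS: "--timeout 240 --workers=2"
```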

Another idea would be to disable the healthchecks as well. I don't know how Docker Compose handles this, but Kubernetes will otherwise kill the container and restart it, in a loop.
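In Compose terms, a sketch of this idea under the setup above: the `web` healthcheck is switched off, and because a `service_healthy` condition cannot be satisfied by a service without a healthcheck, services that depend on `web` would have to wait only for it to start. This is a debugging aid rather than a fix, since nothing then checks that wger actually answers.

```yaml
services:
  web:
    healthcheck:
      # Disable the healthcheck defined for this service
      disable: true

  celery_worker:
    depends_on:
      web:
        # service_healthy needs a healthcheck; wait only for the container to start
        condition: service_started
```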

rolandgeider (Member) commented:

Hi! Do you get any errors in the logs of the web service when opening the application? I just started a new instance with the default compose and conf files and everything booted up nicely:

```
NAME                 IMAGE                COMMAND                  SERVICE         CREATED              STATUS                        PORTS
wger_cache           redis                "docker-entrypoint.s…"   cache           About a minute ago   Up About a minute (healthy)   0.0.0.0:6379->6379/tcp
wger_celery_beat     wger/server:latest   "/start-beat"            celery_beat     About a minute ago   Up About a minute             8000/tcp
wger_celery_flower   wger/server:latest   "/start-flower"          celery_flower   About a minute ago   Up About a minute (healthy)   0.0.0.0:5555->5555/tcp, 8000/tcp
wger_celery_worker   wger/server:latest   "/start-worker"          celery_worker   About a minute ago   Up About a minute (healthy)   8000/tcp
wger_db              postgres:15-alpine   "docker-entrypoint.s…"   db              About a minute ago   Up About a minute (healthy)   0.0.0.0:5432->5432/tcp
wger_nginx           nginx:stable         "/docker-entrypoint.…"   nginx           About a minute ago   Up About a minute (healthy)   0.0.0.0:80->80/tcp, 0.0.0.0:8080->80/tcp
wger_server          wger/server:latest   "/home/wger/entrypoi…"   web             About a minute ago   Up About a minute (healthy)   8000/tcp
```

Somebody else had the problem that the application tried to set up the database before it was ready, so some things were missing. What helped them was to drop the db volume, start the db service manually first, and then start all the rest (this is only needed for the very first run; later it doesn't matter).

clone3448 (Author) commented Nov 8, 2023

First of all, thank you for responding.
@rolandgeider When I open the application, no new log lines appear after the following ones, which I got after rebuilding the stack (no change):
[screenshot]

When I delete the volume entry from the db service in the compose file and start the db service manually, I have the same issue.
Do you think I should disable the healthchecks for wger_server, as proposed by bbkz? When I did, I still had the same issue. But then I thought about celery_worker and celery_beat: they do not start because of this healthcheck dependency.

@bbkz I don't think the problem is caused by low-end hardware. Still, I tried adding the env entry `GUNICORN_CMD_ARGS="--timeout 240 --workers=2"` to the prod.env file, but it made no difference.
[screenshot]

clone3448 (Author) commented Nov 8, 2023

When I removed the healthcheck dependency from celery_worker, that container produced a log; maybe this helps with troubleshooting:
[screenshot]

Then I restored the first compose file and enabled debug mode in prod.env with `DJANGO_DEBUG=True`.
The webpage now shows the following:
[screenshot]
```
Environment:

Request Method: GET
Request URL: http://workout.XXXXXXX.com/en/software/terms-of-service

Django Version: 4.1.9
Python Version: 3.10.6
Installed Applications:
('django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.messages',
 'django.contrib.sessions',
 'django.contrib.sites',
 'django.contrib.staticfiles',
 'django_extensions',
 'storages',
 'wger.config',
 'wger.core',
 'wger.mailer',
 'wger.exercises',
 'wger.gym',
 'wger.manager',
 'wger.nutrition',
 'wger.software',
 'wger.utils',
 'wger.weight',
 'wger.gallery',
 'wger.measurements',
 'captcha',
 'django.contrib.sitemaps',
 'easy_thumbnails',
 'compressor',
 'crispy_forms',
 'crispy_bootstrap5',
 'rest_framework',
 'rest_framework.authtoken',
 'django_filters',
 'rest_framework_simplejwt',
 'drf_spectacular',
 'drf_spectacular_sidecar',
 'django_bootstrap_breadcrumbs',
 'corsheaders',
 'axes',
 'simple_history',
 'django_email_verification',
 'actstream',
 'fontawesomefree')
Installed Middleware:
('corsheaders.middleware.CorsMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'wger.utils.middleware.JavascriptAJAXRedirectionMiddleware',
 'wger.utils.middleware.WgerAuthenticationMiddleware',
 'wger.utils.middleware.RobotsExclusionMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'django.middleware.locale.LocaleMiddleware',
 'simple_history.middleware.HistoryRequestMiddleware',
 'axes.middleware.AxesMiddleware')

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/django/core/handlers/exception.py", line 56, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.10/dist-packages/django/core/handlers/base.py", line 220, in _get_response
    response = response.render()
  File "/usr/local/lib/python3.10/dist-packages/django/template/response.py", line 114, in render
    self.content = self.rendered_content
  File "/usr/local/lib/python3.10/dist-packages/django/template/response.py", line 92, in rendered_content
    return template.render(context, self._request)
  File "/usr/local/lib/python3.10/dist-packages/django/template/backends/django.py", line 61, in render
    return self.template.render(context)
  File "/usr/local/lib/python3.10/dist-packages/django/template/base.py", line 173, in render
    with context.bind_template(self):
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/django/template/context.py", line 254, in bind_template
    updates.update(processor(self.request))
  File "/home/wger/src/wger/utils/context_processor.py", line 85, in processor
    get_custom_header(request),
  File "/home/wger/src/wger/utils/context_processor.py", line 126, in get_custom_header
    global_gymconfig = GymConfig.objects.get(pk=1)
  File "/usr/local/lib/python3.10/dist-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/django/db/models/query.py", line 650, in get
    raise self.model.DoesNotExist(

Exception Type: DoesNotExist at /en/software/terms-of-service
Exception Value: GymConfig matching query does not exist.
```
The logs of the wger_server container report an internal server error:
[screenshot]

rolandgeider (Member) commented:

Yes, the GymConfig error definitely means that the database wasn't initialised properly.

Sorry, I didn't mean that you should remove the volume from the compose file, just that you should delete the volume itself and start the db service manually first, like this:

```shell
docker compose down
docker volume rm docker_postgres-data
docker compose up db -d  # wait a few seconds before the next step
docker compose up
```

(Also, you should get a medal for all the logs you provided!)

clone3448 (Author) commented:

Thank you! I figured that providing as much as possible would help with troubleshooting :)
When you said not to remove the volume from the compose file but to delete the volume itself, I looked for where the files were actually stored and could not find them anywhere. So I pointed the volume path at a folder that I created for it, because the folder did not exist at first. I changed the compose file to the following:
```yaml
services:
  web:
    image: wger/server:latest
    container_name: wger_server
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - static:/home/wger/static
      - media:/home/wger/media
    expose:
      - 8000
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8000
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  nginx:
    image: nginx:stable
    container_name: wger_nginx
    depends_on:
      - web
    volumes:
      - /volume1/docker/wger/config/nginx.conf:/etc/nginx/conf.d/default.conf
      - static:/wger/static:ro
      - media:/wger/media:ro
    ports:
      - "8001:80"
    healthcheck:
      test: service nginx status
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  db:
    image: postgres:15-alpine
    container_name: wger_db
    environment:
      - POSTGRES_USER=wger
      - POSTGRES_PASSWORD=wger
      - POSTGRES_DB=wger
    volumes:
      - /volume1/docker/wger/postgres-data:/var/lib/postgresql/data/
    expose:
      - 5432
    healthcheck:
      test: pg_isready -U wger
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  cache:
    image: redis
    container_name: wger_cache
    expose:
      - 6379
    volumes:
      - redis-data:/data
    healthcheck:
      test: redis-cli ping
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  celery_worker:
    image: wger/server:latest
    container_name: wger_celery_worker
    command: /start-worker
    env_file:
      - /volume1/docker/wger/config/prod.env
    volumes:
      - media:/home/wger/media
    depends_on:
      web:
        condition: service_healthy
    healthcheck:
      test: celery -A wger inspect ping
      interval: 10s
      timeout: 5s
      retries: 5

  celery_beat:
    image: wger/server:latest
    container_name: wger_celery_beat
    command: /start-beat
    volumes:
      - celery-beat:/home/wger/beat/
    env_file:
      - /volume1/docker/wger/config/prod.env
    depends_on:
      celery_worker:
        condition: service_healthy

volumes:
  postgres-data:
  celery-beat:
  static:
  media:
  redis-data:

networks:
  default:
    name: wger_network
```

Now I can find the db volume, and it holds files and folders! So that is some progress. Now the website looks like this, and I think this is more familiar to you:
[screenshot]

I will check whether all features work later, maybe tonight, and update you. In the meantime, where should I look to be sure everything works as intended?

rolandgeider (Member) commented:

The volumes are handled by Docker and are stored... somewhere, but they solve a lot of problems with things like permissions. You can inspect a volume with `docker volume inspect <name>` if you want to know where the actual files are stored. Mapping folders manually should work as well, though.
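For comparison, a sketch of the two ways the `db` data directory can be configured, using the paths from this thread. Compose prefixes named volumes with the project name, so the volume below would appear as something like `<project>_postgres-data` (for example `docker_postgres-data` if the project directory is called `docker`); that prefix is the name to pass to `docker volume inspect`.

```yaml
services:
  db:
    image: postgres:15-alpine
    volumes:
      # Option A: named volume managed by Docker; locate it with
      #   docker volume inspect <project>_postgres-data
      - postgres-data:/var/lib/postgresql/data/
      # Option B: bind mount to a host folder instead (use one or the other):
      # - /volume1/docker/wger/postgres-data:/var/lib/postgresql/data/

volumes:
  postgres-data:
```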

You can download the exercise images with `docker compose exec web python3 manage.py download-exercise-images` and see if they appear (I'm not sure whether we fixed the issue with the cache, so they might need some time to show up). If you can see those and the rest seems to work, you should be good to go.

goodnewz (Contributor) commented:

@clone3448 I had a similar problem. It turns out that the first time wger starts, it does some extra setup that takes more time. If it does not finish within the healthcheck window of 5 retries × 10 s interval, the healthcheck fails and the container is marked unhealthy. Docker provides an option for exactly this situation called `start_period`. All you do is add `start_period: 300s` to the `healthcheck:` section of the web container, and Bob's your uncle.
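Applied to the `web` service from the compose file in this issue, the change could look like the sketch below (only `start_period` is new). During the start period, failed probes are not counted towards `retries`; once a probe succeeds, the container counts as started and normal counting applies. The 300 s value is the one suggested here, not a measured requirement.

```yaml
services:
  web:
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8000
      interval: 10s
      timeout: 5s
      retries: 5
      # Grace period for the first-run setup: failures in the first 300 s
      # do not count towards the 5 retries
      start_period: 300s
```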

goodnewz added a commit to goodnewz/docker that referenced this issue Mar 31, 2024
The first time Wger starts, it does some extra setup things that require a bit more time to finish before the health check calls it quits. This commit adds a reasonable warmup period before it starts to enforce the health checks. 

It addresses wger-project#67.
rolandgeider (Member) commented:

FYI, the PR adding the start period has been merged; hopefully this fixes it.

greenbagels commented:

> @clone3448 I had a similar problem. It turns out the first time wger starts, it does some extra setup things that require a bit more time. If it does not finish within the healthcheck interval of 5*10s, it fails the healthcheck with state unhealthy. Docker provides an option for such a situation called start_period. All you do is add start_period: 300s to the healthcheck: section of the web container, and Bob is your uncle.

Hi, just curious: you mentioned you need to add that start_period option to the web container, but your PR doesn't (it adds it to the nginx container). Is this intentional?
