Permission issues when using SaltStack lead to workers failing to restart #25

Open

r3boot opened this issue Mar 11, 2024 · 18 comments · May be fixed by #28

Labels: documentation (Improvements or additions to documentation)

@r3boot

r3boot commented Mar 11, 2024

We are currently evaluating the misp-{docker,modules} v2.4.186 containers to replace our existing MISP installation (based on the coolacid containers), and we are running into an issue with the workers.

Observed behaviour

The container boots and fires up the workers; all is good:

2024-03-06 14:03:26 Info: [WORKER PID: 1036][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1043][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1049][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1038][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1039][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1045][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1041][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1037][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1069][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1055][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1098][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1066][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1059][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1063][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1075][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1091][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1100][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1095][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1111][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1082][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1134][update] - starting to process background jobs...

24 hours later, the workers are gracefully killed:

2024-03-07 14:03:50 Info: [WORKER PID: 1036][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1043][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1049][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1038][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1039][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1037][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1041][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1045][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1069][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1063][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1098][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1091][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1055][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1059][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1066][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1100][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1111][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1134][update] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1075][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1082][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1095][prio] - worker max execution time reached, exiting gracefully worker...

However, the workers are not started again, and we need to restart the container to get the workers back.

Expected behaviour:

The misp-core container manages its workers without any intervention OR a procedure is provided on how to restart the workers.

Configuration

We use a modified setup based on the docker-compose setup, but using systemd unit files instead. The containers are rebuilt using the following Dockerfile:

FROM ghcr.io/misp/misp-docker/misp-core:v2.4.186

RUN \
    mkdir -p /var/www/MISP/app/files.dist/misp-objects/objects/phish-healthcare && \
    apt update && \
    apt -qqy install less && \
    apt clean

COPY files/phish-healthcare.json /var/www/MISP/app/files.dist/misp-objects/objects/phish-healthcare/definition.json

The unit file we use to start this container is as follows:

[Unit]
Description=MISP core running inside of docker
After=network.target
[Service]
Type=simple
ExecStartPre=/bin/bash -c "cat /etc/image-pull-secret.txt | docker login -u xxyyzz --password-stdin aa.bb.cc:5050"
ExecStart=/usr/bin/docker run \
  --rm \
  --name misp-core \
  --network host \
  --env-file /etc/misp/core.env \
  -v /etc/misp/config:/var/www/MISP/app/Config \
  -v /etc/misp/customization:/var/www/MISP/app/webroot/img/custom \
  -v /var/log/misp:/var/www/MISP/app/tmp/logs \
  -v /var/lib/misp:/var/www/MISP/app/files \
  -v /etc/misp/gnupg:/var/www/MISP/.gnupg \
  -v /etc/easytls/certs/misp.cert:/etc/nginx/certs/cert.pem \
  -v /etc/easytls/certs/misp.key:/etc/nginx/certs/key.pem \
  aa.bb.cc:5050/docker/misp-core:10
ExecStop=/usr/bin/docker kill misp-core
TimeoutStartSec=0
Restart=always
RestartSec=3s
[Install]
WantedBy=default.target

The corresponding env file contains the following:

ADMIN_EMAIL=admin@admin.test
ADMIN_ORG=AAABBBCCC
ADMIN_KEY=
ADMIN_PASSWORD=secret
GPG_PASSPHRASE=passphrase
CRON_USER_ID=1
BASE_URL=https://misp01.aa.bb.cc
SMARTHOST_ADDRESS=mail01.aa.bb.cc
MISP_EMAIL=admin@admin.test
MISP_CONTACT=admin@admin.test
AUTOCONF_GPG=False
AUTOGEN_ADMIN_KEY=True
DISABLE_IPV6=True
MYSQL_HOST=mysql.aa.bb.cc
MYSQL_PORT=3306
MYSQL_USER=misp
MYSQL_PASSWORD=secret
MYSQL_DATABASE=misp
REDIS_FQDN=localhost
@ostefano
Collaborator

Interesting issue; I wonder whether it is related to MISP itself and its new way of managing background workers. Next time the workers get stuck, can you open a shell inside the container and see if supervisorctl restart misp-workers:* restarts them? Note: the workers should restart automatically because there is 'autorestart=true' inside 50-workers.conf.

@iglocska is it expected that workers are killed after a while? See 'worker max execution time reached' in the logs. If that is expected, is it possible to reduce the maximum execution time to reproduce the issue more easily?
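
For reference, a way to run that check from the host might look like this (a sketch assuming the container is named misp-core, as in the unit file above):

docker exec -it misp-core bash
supervisorctl status misp-workers:*                        # current state of each worker
grep autorestart /etc/supervisor/conf.d/50-workers.conf    # confirm autorestart is enabled
supervisorctl restart misp-workers:*                       # try restarting the workers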

@r3boot
Author

r3boot commented Mar 11, 2024

Sure thing! I have a dev instance running for exactly this purpose:

[dev][root@misp01:/root]↥ # docker ps -a
CONTAINER ID   IMAGE                                            COMMAND                  CREATED      STATUS      PORTS     NAMES
7dd975a6f8d7   git.aa.bb.cc:5050/docker/misp-core:10     "/entrypoint.sh"         4 days ago   Up 4 days             misp-core
1089178ba607   git.aa.bb.cc:5050/docker/misp-modules:6   "/usr/local/bin/misp…"   5 days ago   Up 5 days             gallant_dubinsky
866b066cdf4c   git.aa.bb.cc:5050/docker/redis:1          "docker-entrypoint.s…"   9 days ago   Up 9 days             amazing_bartik
[dev][root@misp01:/root]↥ # docker exec -it 7dd975a6f8d7 bash
root@misp01:/var/www/MISP# supervisorctl restart misp-workers:*
misp-workers:default_00: ERROR (spawn error)
misp-workers:default_01: ERROR (spawn error)
misp-workers:default_02: ERROR (spawn error)
misp-workers:default_03: ERROR (spawn error)
misp-workers:default_04: ERROR (spawn error)
misp-workers:email_00: ERROR (spawn error)
misp-workers:email_01: ERROR (spawn error)
misp-workers:email_02: ERROR (spawn error)
misp-workers:email_03: ERROR (spawn error)
misp-workers:email_04: ERROR (spawn error)
misp-workers:cache_00: ERROR (spawn error)
misp-workers:cache_01: ERROR (spawn error)
misp-workers:cache_02: ERROR (spawn error)
misp-workers:cache_03: ERROR (spawn error)
misp-workers:cache_04: ERROR (spawn error)
misp-workers:prio_00: ERROR (spawn error)
misp-workers:prio_01: ERROR (spawn error)
misp-workers:prio_02: ERROR (spawn error)
misp-workers:prio_03: ERROR (spawn error)
misp-workers:prio_04: ERROR (spawn error)
misp-workers:update_00: ERROR (spawn error)

No logging was generated for this event, so I cannot see why this spawn error is happening.

I also double-checked the configuration inside the container, and we have autorestart=true for all workers:

root@misp01:/etc/supervisor/conf.d# grep restart 50-workers.conf
autorestart=true
autorestart=true
autorestart=true
autorestart=true
autorestart=true

EDIT: Note that if I start one of the workers by hand, it does work:

root@misp01:/etc/supervisor/conf.d# sudo -u www-data /var/www/MISP/app/Console/cake start_worker cache
2024-03-11 13:22:55 Info: [WORKER PID: 12094][cache] - starting to process background jobs...

@ostefano
Collaborator

ostefano commented Mar 11, 2024

@r3boot I did some tests by forcing the workers to auto-kill themselves after 10 seconds.

Everything looks good here:

misp-core-1     | 2024-03-11 13:37:12,078 INFO exited: cache_03 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,088 INFO spawned: 'cache_03' with pid 1908
misp-core-1     | 2024-03-11 13:37:12,089 INFO exited: cache_00 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,089 INFO exited: cache_01 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,089 INFO exited: cache_02 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,090 INFO exited: cache_04 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,094 INFO spawned: 'cache_00' with pid 1909
misp-core-1     | 2024-03-11 13:37:12,097 INFO spawned: 'cache_01' with pid 1910
misp-core-1     | 2024-03-11 13:37:12,099 INFO spawned: 'cache_02' with pid 1911
misp-core-1     | 2024-03-11 13:37:12,102 INFO spawned: 'cache_04' with pid 1916
misp-core-1     | 2024-03-11 13:37:13,165 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:42,203 INFO exited: cache_00 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,215 INFO spawned: 'cache_00' with pid 1948
misp-core-1     | 2024-03-11 13:37:42,216 INFO exited: cache_03 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,217 INFO exited: cache_01 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,217 INFO exited: cache_02 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,217 INFO exited: cache_04 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,222 INFO spawned: 'cache_01' with pid 1950
misp-core-1     | 2024-03-11 13:37:42,224 INFO spawned: 'cache_02' with pid 1953
misp-core-1     | 2024-03-11 13:37:42,226 INFO spawned: 'cache_03' with pid 1958
misp-core-1     | 2024-03-11 13:37:42,230 INFO spawned: 'cache_04' with pid 1965
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

I have also tried supervisorctl restart misp-workers:* and the command succeeds here with no errors.
I wonder whether we might be hitting some system limits.

In conclusion, I have no further suggestions other than to investigate why supervisorctl restart misp-workers:* does not work on your side. That might give you some hints regarding the underlying cause.
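
As an aside (not something verified in this thread), a couple of commands that could help rule out resource limits from inside the container:

ulimit -a             # limits of the current shell
cat /proc/1/limits    # limits applied to the container's PID 1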

@r3boot
Author

r3boot commented Mar 11, 2024

Diving deeper into this, I found the following errors in /var/log/supervisor/supervisord.log:

2024-03-11 13:45:58,097 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:45:58,098 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:45:58,098 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:45:58,099 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:45:58,099 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:45:58,100 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:45:58,100 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:45:58,102 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:45:58,103 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:45:58,103 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:45:58,104 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:45:58,104 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:45:58,105 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:45:58,107 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:45:58,108 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:45:58,108 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:45:58,109 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:45:58,109 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:45:58,110 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:45:58,110 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:45:58,111 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:45:59,114 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:45:59,115 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:45:59,115 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:45:59,115 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:45:59,116 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:45:59,116 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:45:59,116 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:45:59,120 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:46:01,123 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:46:01,123 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:46:01,123 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:46:04,130 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:46:04,131 INFO gave up: email_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:46:04,132 INFO gave up: cache_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:46:04,133 INFO gave up: prio_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:46:04,133 INFO gave up: prio_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:46:04,134 INFO gave up: prio_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:46:04,134 INFO gave up: prio_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:46:04,134 INFO gave up: prio_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:46:04,134 INFO gave up: update_00 entered FATAL state, too many start retries too quickly

Based on this, I found the following open issue: moby/moby#31243. I tried the remediation steps mentioned in moby/moby#31243 (comment).

First, I added the --tty flag to the unit file:

[Unit]
Description=MISP core running inside of docker
After=network.target
[Service]
Type=simple
ExecStartPre=/bin/bash -c "cat /etc/image-pull-secret.txt | docker login -u xxyyzz --password-stdin git.aa.bb.cc:5050"
ExecStart=/usr/bin/docker run \
  --rm --tty \
  --name misp-core \
  --network host \
  --env-file /etc/misp/core.env \
  -v /etc/misp/config:/var/www/MISP/app/Config \
  -v /etc/misp/customization:/var/www/MISP/app/webroot/img/custom \
  -v /var/log/misp:/var/www/MISP/app/tmp/logs \
  -v /var/lib/misp:/var/www/MISP/app/files \
  -v /etc/misp/gnupg:/var/www/MISP/.gnupg \
  -v /etc/easytls/certs/misp.cert:/etc/nginx/certs/cert.pem \
  -v /etc/easytls/certs/misp.key:/etc/nginx/certs/key.pem \
  git.aa.bb.cc:5050/docker/misp-core:10
ExecStop=/usr/bin/docker kill misp-core
TimeoutStartSec=0
Restart=always
RestartSec=3s
[Install]
WantedBy=default.target

I combined this with the following command (inside the container):

usermod -G tty www-data

And lo and behold, restarting the workers does indeed work:

root@misp01:/var/www/MISP# id www-data
uid=33(www-data) gid=33(www-data) groups=33(www-data),5(tty)
root@misp01:/var/www/MISP# supervisorctl restart misp-workers:*
misp-workers:default_02: stopped
misp-workers:email_00: stopped
misp-workers:email_01: stopped
misp-workers:email_03: stopped
misp-workers:email_04: stopped
misp-workers:cache_00: stopped
misp-workers:cache_01: stopped
misp-workers:cache_02: stopped
misp-workers:prio_00: stopped
misp-workers:prio_03: stopped
misp-workers:prio_04: stopped
misp-workers:update_00: stopped
misp-workers:default_00: stopped
misp-workers:default_01: stopped
misp-workers:default_03: stopped
misp-workers:default_04: stopped
misp-workers:email_02: stopped
misp-workers:cache_03: stopped
misp-workers:cache_04: stopped
misp-workers:prio_01: stopped
misp-workers:prio_02: stopped
misp-workers:default_00: started
misp-workers:default_01: started
misp-workers:default_02: started
misp-workers:default_03: started
misp-workers:default_04: started
misp-workers:email_00: started
misp-workers:email_01: started
misp-workers:email_02: started
misp-workers:email_03: started
misp-workers:email_04: started
misp-workers:cache_00: started
misp-workers:cache_01: started
misp-workers:cache_02: started
misp-workers:cache_03: started
misp-workers:cache_04: started
misp-workers:prio_00: started
misp-workers:prio_01: started
misp-workers:prio_02: started
misp-workers:prio_03: started
misp-workers:prio_04: started
misp-workers:update_00: started
root@misp01:/var/www/MISP#

So yay! I will leave this running for another 24h to see if the workaround also works unattended.

The question now is: how do you want to proceed? I can add this change to our local codebase and be done with it, or do you want this workaround in the misp container, in which case I should wait for that?

@ostefano
Collaborator

Is there anything we can do at supervisor-job's definition time to fix this (since it seems to be related to stdout and permissions)?

If you want to test the workaround without waiting 24 hours you can always add the --maxExecutionTime <seconds> option to the supervisor command (inside the 50-workers.conf file).

@ostefano
Collaborator

@r3boot have you tried updating docker/docker-compose using the official repositories, by the way?

@r3boot
Author

r3boot commented Mar 11, 2024

Is there anything we can do at supervisor-job's definition time to fix this (since it seems to be related to stdout and permissions)?

IMO, a workaround would be twofold (see the sketch after this list):

  1. Update MISP/misp-docker/core/Dockerfile and add && usermod -G tty www-data at or around line 183
  2. Update the documentation with a remark to start the container with a TTY if someone is affected by "/dev/stdout is inaccessible by an unprivileged user in a container started as root" (moby/moby#31243).
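
For illustration, item 1 would amount to something along these lines (a hypothetical sketch; in practice the command would likely be appended to an existing RUN chain rather than added as a separate instruction):

RUN usermod -G tty www-data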

But I can also see how this would not be accepted, since we don't run misp-docker in a supported way, I think?

If you want to test the workaround without waiting 24 hours you can always add the --maxExecutionTime <seconds> option to the supervisor command (inside 50-workers.conf file).

That doesn't seem to work. I did the following:

Modify 50-workers.conf:

root@misp01:/etc/supervisor/conf.d# grep ^command 50-workers.conf
command=/var/www/MISP/app/Console/cake start_worker default --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker prio --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker email --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker update --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker cache --maxExecutionTime=60

Reread the configuration using supervisorctl reread, followed by supervisorctl restart misp-workers:*.

Workers get restarted just fine, but don't stop after a minute:

root@misp01:/etc/supervisor/conf.d# date
Mon Mar 11 15:27:35 UTC 2024
root@misp01:/etc/supervisor/conf.d# tail -21 /var/log/supervisor/supervisord.log
2024-03-11 15:20:18,675 INFO success: default_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,676 INFO success: default_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,691 INFO success: default_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,691 INFO success: default_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,691 INFO success: default_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,713 INFO success: email_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,713 INFO success: email_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,713 INFO success: email_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,730 INFO success: email_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,730 INFO success: email_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,730 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,743 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,771 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,813 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,819 INFO success: update_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
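
One possible explanation (an aside based on general supervisord behaviour, not something confirmed here): supervisorctl reread only re-parses the configuration files, and changed program definitions normally only take effect after supervisorctl update, e.g.:

supervisorctl reread    # re-parse the configuration files
supervisorctl update    # apply changed program definitions (restarts affected groups)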

@r3boot have you tried updating docker/docker-compose using the official repositories btw?

We run the latest docker from the official docker repo:

[dev][root@misp01:/root]↥ 17m27s 127 # docker -v
Docker version 25.0.4, build 1a576c5

@ostefano
Collaborator

ostefano commented Mar 11, 2024

@r3boot try

command=/var/www/MISP/app/Console/cake start_worker cache --maxExecutionTime 60

instead

@ostefano
Collaborator

We should definitely document the issue, and maybe suggest running the usermod command via a custom entrypoint (which is a documented way to run additional commands), since the command should be idempotent after all.

In other words: no image modification, but a documented and supported workaround.
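
For illustration, such a custom entrypoint could look roughly like this (a minimal sketch; the exact hook mechanism and paths documented by misp-docker may differ). It could then be bind-mounted into the container and selected with docker run --entrypoint, for example:

#!/bin/bash
# Hypothetical wrapper entrypoint: apply the idempotent usermod workaround,
# then hand control back to the image's stock entrypoint.
usermod -G tty www-data
exec /entrypoint.sh "$@"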

@r3boot
Author

r3boot commented Mar 11, 2024

--maxExecutionTime 60

Sharp! Tried it, doesn't work. I will leave the current dev instance running for 24h and look again. I'm pretty sure it will work, but the proof is in the pudding, of course :)

We should definitely document the issue, and maybe suggest to run the usermod command by using a custom entrypoint (which is a documented way to run additional commands) since the command should be idempotent after all.

In other words, no image modification but documented and supported workaround.

Ack. I will perform the required changes in our deployment code. Thanks a lot for your help!

One final question though... We are going to move our misp instances to k8s later on this year (which is finally possible because of misp-docker, yay!), and I would love to have some insight into how these containers are developed, and maybe even do some PRs of my own (if I find time). Are there any channels I can follow to get more into the loop? (IRC, mailing list, etc.)

@ostefano
Collaborator

@r3boot feel free to join https://gitter.im/MISP/Docker

@r3boot
Author

r3boot commented Mar 13, 2024

Checked the logs after applying the workaround:

2024-03-12 15:38:35,114 INFO exited: default_00 (exit status 0; expected)
2024-03-12 15:38:36,122 INFO spawned: 'default_00' with pid 4677
2024-03-12 15:38:37,239 INFO success: default_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:38,533 INFO exited: default_04 (exit status 0; expected)
2024-03-12 15:38:38,739 INFO spawned: 'default_04' with pid 4685
2024-03-12 15:38:38,764 INFO exited: default_02 (exit status 0; expected)
2024-03-12 15:38:38,765 INFO exited: default_03 (exit status 0; expected)
2024-03-12 15:38:38,781 INFO spawned: 'default_02' with pid 4686
2024-03-12 15:38:38,809 INFO spawned: 'default_03' with pid 4687
2024-03-12 15:38:38,823 INFO exited: email_02 (exit status 0; expected)
2024-03-12 15:38:38,831 INFO exited: default_01 (exit status 0; expected)
2024-03-12 15:38:38,838 INFO exited: email_04 (exit status 0; expected)
2024-03-12 15:38:38,839 INFO exited: cache_02 (exit status 0; expected)
2024-03-12 15:38:38,839 INFO exited: cache_03 (exit status 0; expected)
2024-03-12 15:38:38,840 INFO exited: email_03 (exit status 0; expected)
2024-03-12 15:38:38,841 INFO exited: cache_04 (exit status 0; expected)
2024-03-12 15:38:38,841 INFO exited: prio_02 (exit status 0; expected)
2024-03-12 15:38:38,842 INFO exited: prio_03 (exit status 0; expected)
2024-03-12 15:38:38,842 INFO exited: prio_04 (exit status 0; expected)
2024-03-12 15:38:38,844 INFO exited: update_00 (exit status 0; expected)
2024-03-12 15:38:38,845 INFO exited: cache_00 (exit status 0; expected)
2024-03-12 15:38:38,845 INFO exited: cache_01 (exit status 0; expected)
2024-03-12 15:38:38,849 INFO spawned: 'default_01' with pid 4688
2024-03-12 15:38:38,852 INFO spawned: 'email_02' with pid 4690
2024-03-12 15:38:38,860 INFO spawned: 'email_03' with pid 4694
2024-03-12 15:38:38,865 INFO spawned: 'email_04' with pid 4698
2024-03-12 15:38:38,867 INFO spawned: 'cache_00' with pid 4699
2024-03-12 15:38:38,870 INFO spawned: 'cache_01' with pid 4700
2024-03-12 15:38:38,886 INFO spawned: 'cache_02' with pid 4706
2024-03-12 15:38:38,888 INFO spawned: 'cache_03' with pid 4719
2024-03-12 15:38:38,893 INFO spawned: 'cache_04' with pid 4729
2024-03-12 15:38:38,903 INFO spawned: 'prio_02' with pid 4733
2024-03-12 15:38:38,909 INFO spawned: 'prio_03' with pid 4740
2024-03-12 15:38:38,914 INFO spawned: 'prio_04' with pid 4745
2024-03-12 15:38:38,923 INFO spawned: 'update_00' with pid 4750
2024-03-12 15:38:38,935 INFO exited: email_00 (exit status 0; expected)
2024-03-12 15:38:38,936 INFO exited: email_01 (exit status 0; expected)
2024-03-12 15:38:38,936 INFO exited: prio_00 (exit status 0; expected)
2024-03-12 15:38:38,937 INFO exited: prio_01 (exit status 0; expected)
2024-03-12 15:38:39,218 INFO spawned: 'email_00' with pid 4813
2024-03-12 15:38:39,223 INFO spawned: 'email_01' with pid 4814
2024-03-12 15:38:39,233 INFO spawned: 'prio_00' with pid 4815
2024-03-12 15:38:39,276 INFO spawned: 'prio_01' with pid 4816
2024-03-12 15:38:39,751 INFO success: default_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,775 INFO success: default_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,812 INFO success: default_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,859 INFO success: default_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,860 INFO success: email_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,861 INFO success: email_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,877 INFO success: email_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,878 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,878 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,879 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,883 INFO success: email_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: email_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: update_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Closing issue :)

@r3boot feel free to join https://gitter.im/MISP/Docker

Tnx!

@r3boot closed this as completed Mar 13, 2024
@ostefano
Collaborator

@r3boot can you document the issue and the fix inside the README.md and open a PR?

@r3boot reopened this Mar 13, 2024
@r3boot
Author

r3boot commented Mar 13, 2024

Sure thing. See: #28

@r3boot
Author

r3boot commented Mar 14, 2024

I spoke too soon.

2024-03-13 12:05:12,520 WARN For [program:nginx], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
2024-03-13 12:05:12,520 WARN For [program:php-fpm], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
2024-03-13 12:05:12,520 WARN For [program:cron], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
2024-03-13 12:05:12,520 INFO Included extra file "/etc/supervisor/conf.d/10-supervisor.conf" during parsing
2024-03-13 12:05:12,520 INFO Included extra file "/etc/supervisor/conf.d/50-workers.conf" during parsing
2024-03-13 12:05:12,520 INFO Set uid to user 0 succeeded
2024-03-13 12:05:12,527 INFO RPC interface 'supervisor' initialized
2024-03-13 12:05:12,528 INFO RPC interface 'supervisor' initialized
2024-03-13 12:05:12,528 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-03-13 12:05:12,529 INFO supervisord started with pid 6
2024-03-13 12:05:13,531 INFO spawned: 'cron' with pid 7
2024-03-13 12:05:13,545 INFO spawned: 'nginx' with pid 8
2024-03-13 12:05:13,547 INFO spawned: 'php-fpm' with pid 9
2024-03-13 12:05:14,530 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:14,534 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:14,548 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:46,329 INFO spawned: 'default_00' with pid 1039
2024-03-13 12:05:46,331 INFO spawned: 'default_01' with pid 1040
2024-03-13 12:05:46,333 INFO spawned: 'default_02' with pid 1041
2024-03-13 12:05:46,335 INFO spawned: 'default_03' with pid 1042
2024-03-13 12:05:46,337 INFO spawned: 'default_04' with pid 1043
2024-03-13 12:05:46,344 INFO spawned: 'email_00' with pid 1045
2024-03-13 12:05:46,352 INFO spawned: 'email_01' with pid 1048
2024-03-13 12:05:46,360 INFO spawned: 'email_02' with pid 1055
2024-03-13 12:05:46,365 INFO spawned: 'email_03' with pid 1060
2024-03-13 12:05:46,371 INFO spawned: 'email_04' with pid 1067
2024-03-13 12:05:46,389 INFO spawned: 'cache_00' with pid 1087
2024-03-13 12:05:46,397 INFO spawned: 'cache_01' with pid 1091
2024-03-13 12:05:46,410 INFO spawned: 'cache_02' with pid 1093
2024-03-13 12:05:46,421 INFO spawned: 'cache_03' with pid 1107
2024-03-13 12:05:46,433 INFO spawned: 'cache_04' with pid 1112
2024-03-13 12:05:46,464 INFO spawned: 'prio_00' with pid 1122
2024-03-13 12:05:46,479 INFO spawned: 'prio_01' with pid 1127
2024-03-13 12:05:46,510 INFO spawned: 'prio_02' with pid 1132
2024-03-13 12:05:46,523 INFO spawned: 'prio_03' with pid 1139
2024-03-13 12:05:46,537 INFO spawned: 'prio_04' with pid 1145
2024-03-13 12:05:46,591 INFO spawned: 'update_00' with pid 1158
2024-03-13 12:05:47,335 INFO success: default_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,335 INFO success: default_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,335 INFO success: default_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,335 INFO success: default_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,350 INFO success: default_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,351 INFO success: email_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,351 INFO success: email_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,363 INFO success: email_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,379 INFO success: email_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,380 INFO success: email_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,391 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,415 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,416 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,423 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,435 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: update_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-14 12:05:54,384 INFO exited: default_00 (exit status 0; expected)
2024-03-14 12:05:55,395 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:05:56,397 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:05:58,400 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:05:58,690 INFO exited: email_00 (exit status 0; expected)
2024-03-14 12:05:58,691 INFO exited: prio_02 (exit status 0; expected)
2024-03-14 12:05:58,693 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:05:58,694 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:05:59,183 INFO exited: update_00 (exit status 0; expected)
2024-03-14 12:05:59,184 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:05:59,399 INFO exited: prio_03 (exit status 0; expected)
2024-03-14 12:05:59,400 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:05:59,700 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:05:59,700 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:05:59,700 INFO exited: cache_00 (exit status 0; expected)
2024-03-14 12:05:59,701 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:00,705 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:00,705 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:06:00,705 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:06:00,705 INFO exited: cache_04 (exit status 0; expected)
2024-03-14 12:06:01,003 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:01,127 INFO exited: cache_02 (exit status 0; expected)
2024-03-14 12:06:01,129 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:01,300 INFO exited: default_03 (exit status 0; expected)
2024-03-14 12:06:01,301 INFO exited: email_02 (exit status 0; expected)
2024-03-14 12:06:01,302 INFO exited: email_03 (exit status 0; expected)
2024-03-14 12:06:01,302 INFO exited: email_04 (exit status 0; expected)
2024-03-14 12:06:01,303 INFO exited: cache_03 (exit status 0; expected)
2024-03-14 12:06:01,304 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:01,305 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:01,307 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:01,311 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:01,315 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:02,320 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:06:02,321 INFO gave up: default_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:02,322 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:02,322 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:02,324 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:02,324 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:02,324 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:06:02,324 INFO exited: default_01 (exit status 0; expected)
2024-03-14 12:06:02,324 INFO exited: default_02 (exit status 0; expected)
2024-03-14 12:06:02,324 INFO exited: default_04 (exit status 0; expected)
2024-03-14 12:06:02,324 INFO exited: email_01 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: cache_01 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: prio_00 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: prio_01 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: prio_04 (exit status 0; expected)
2024-03-14 12:06:03,326 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:05,334 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:06:05,334 INFO gave up: email_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:05,334 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:06:05,334 INFO gave up: prio_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:06,336 INFO gave up: cache_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:06:06,336 INFO gave up: prio_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:06:06,337 INFO gave up: update_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,338 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:07,338 INFO gave up: default_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,338 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:07,338 INFO gave up: email_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:07,339 INFO gave up: email_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:07,339 INFO gave up: email_04 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:07,339 INFO gave up: cache_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:07,339 INFO gave up: cache_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:07,339 INFO gave up: cache_04 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,342 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:09,342 INFO gave up: default_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,342 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:09,342 INFO gave up: default_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:09,343 INFO gave up: default_04 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:09,343 INFO gave up: email_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:09,343 INFO gave up: cache_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:09,343 INFO gave up: prio_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,344 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:09,344 INFO gave up: prio_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,344 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:09,344 INFO gave up: prio_04 entered FATAL state, too many start retries too quickly

I will do some more investigation and get back to this issue.

@r3boot
Author

r3boot commented Mar 14, 2024

Captured the EACCES using strace:

373645 openat(AT_FDCWD, "/var/www/MISP/app/tmp/logs/misp-workers.log", O_WRONLY|O_CREAT|O_APPEND|O_CLOEXEC, 0666) = -1 EACCES (Permission denied)
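
For reference, a trace along these lines could be captured from inside the container (a sketch, not necessarily the exact invocation used here; it assumes strace is installed and that the container is allowed to ptrace, e.g. via --cap-add=SYS_PTRACE):

# supervisorctl pid (without arguments) prints the supervisord PID;
# following its forks catches the failing openat when a worker is (re)spawned
strace -f -e trace=openat -p "$(supervisorctl pid)" 2>&1 | grep EACCES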

@ostefano
Collaborator

Ha, interesting. It looks like the host changed the permissions on that file (it's created by MISP itself).

@r3boot
Author

r3boot commented Mar 14, 2024

Found it!

OK, so: we create our MISP machines using SaltStack. I configured the /var/log/misp directory with the sticky bit (1770, root:www-data). This directory gets volume-mounted into misp-core at /var/www/MISP/app/tmp/logs. Next, systemd started misp-core, and during startup the permissions of /var/www/MISP/app/tmp/ get changed:

Mar 14 18:37:52 misp01 docker[422297]: ... chown -R www-data:www-data /var/www/MISP/app/tmp
Mar 14 18:37:52 misp01 docker[422297]: ... chmod -R 0550 files /var/www/MISP/app/tmp
Mar 14 18:37:52 misp01 docker[422297]: ... chmod -R 0770 directories /var/www/MISP/app/tmp

Next, all file descriptors are opened, and all is good. One hour later, SaltStack runs again, resets the permissions on /var/log/misp, and re-applies the sticky bit. This works until supervisord restarts the workers: they need to re-open their log files, and since www-data is no longer the owner of the log directory, they get EACCES.

Yesterday, when I had this working, I had disabled the salt agent, so the permissions did not get reset. This led me to assume that all was good, so I applied the fix, re-enabled salt, and went on with my life. Until I looked at the logs today.

So yeah, I guess you could file this one under PEBKAC... 🥇 I do have an opinion about MISP setting the permissions, but that's outside the scope of this issue. I will rework the documentation PR with this in mind. Tnx for the patience :)
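
For anyone hitting the same combination, a sketch of host-side permissions that match what the container's startup enforces (an assumption based on the chown/chmod lines above; inside the container www-data is UID/GID 33, and the SaltStack state managing /var/log/misp would need to use the same values so it stops fighting the container):

chown -R 33:33 /var/log/misp    # 33 = www-data inside the misp-core container
chmod 0770 /var/log/misp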

@ostefano added the "documentation" (Improvements or additions to documentation) label Mar 24, 2024
@ostefano changed the title from "Workers not being restarted once max execution time is reached" to "Permission issues when using SaltStack lead to workers failing to restart" May 1, 2024