Permission issues when using SaltStack lead to workers failing to restart #25

Open

r3boot opened this issue Mar 11, 2024 · 18 comments · May be fixed by #28

Labels: documentation (Improvements or additions to documentation)

@r3boot

r3boot commented Mar 11, 2024

We are currently evaluating the misp-{docker,modules} v2.4.186 containers to replace our existing MISP installation (based on the coolacid containers), and we are running into an issue with the workers.

Observed behaviour

The container boots and fires up the workers; all is good:

2024-03-06 14:03:26 Info: [WORKER PID: 1036][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1043][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1049][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1038][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1039][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1045][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1041][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1037][default] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1069][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1055][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1098][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1066][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1059][email] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1063][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1075][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1091][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1100][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1095][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1111][prio] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1082][cache] - starting to process background jobs...
2024-03-06 14:03:26 Info: [WORKER PID: 1134][update] - starting to process background jobs...

24 hours later, the workers are gracefully killed:

2024-03-07 14:03:50 Info: [WORKER PID: 1036][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1043][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1049][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1038][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1039][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1037][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1041][default] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1045][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1069][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1063][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1098][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1091][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1055][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1059][email] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1066][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1100][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1111][prio] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1134][update] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1075][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1082][cache] - worker max execution time reached, exiting gracefully worker...
2024-03-07 14:03:51 Info: [WORKER PID: 1095][prio] - worker max execution time reached, exiting gracefully worker...

However, the workers are not started again, and we need to restart the container to get the workers back.

Expected behaviour:

The misp-core container manages its workers without any intervention OR a procedure is provided on how to restart the workers.

Configuration

We use a modified setup based on the docker-compose setup, but using systemd unit files instead. The containers are rebuilt using the following Dockerfile:

FROM ghcr.io/misp/misp-docker/misp-core:v2.4.186

RUN \
    mkdir -p /var/www/MISP/app/files.dist/misp-objects/objects/phish-healthcare && \
    apt update && \
    apt -qqy install less && \
    apt clean

COPY files/phish-healthcare.json /var/www/MISP/app/files.dist/misp-objects/objects/phish-healthcare/definition.json

The unit file we use to start this container is as follows:

[Unit]
Description=MISP core running inside of docker
After=network.target
[Service]
Type=simple
ExecStartPre=/bin/bash -c "cat /etc/image-pull-secret.txt | docker login -u xxyyzz --password-stdin aa.bb.cc:5050"
ExecStart=/usr/bin/docker run \
  --rm \
  --name misp-core \
  --network host \
  --env-file /etc/misp/core.env \
  -v /etc/misp/config:/var/www/MISP/app/Config \
  -v /etc/misp/customization:/var/www/MISP/app/webroot/img/custom \
  -v /var/log/misp:/var/www/MISP/app/tmp/logs \
  -v /var/lib/misp:/var/www/MISP/app/files \
  -v /etc/misp/gnupg:/var/www/MISP/.gnupg \
  -v /etc/easytls/certs/misp.cert:/etc/nginx/certs/cert.pem \
  -v /etc/easytls/certs/misp.key:/etc/nginx/certs/key.pem \
  aa.bb.cc:5050/docker/misp-core:10
ExecStop=/usr/bin/docker kill misp-core
TimeoutStartSec=0
Restart=always
RestartSec=3s
[Install]
WantedBy=default.target

The corresponding env file contains the following:

ADMIN_EMAIL=admin@admin.test
ADMIN_ORG=AAABBBCCC
ADMIN_KEY=
ADMIN_PASSWORD=secret
GPG_PASSPHRASE=passphrase
CRON_USER_ID=1
BASE_URL=https://misp01.aa.bb.cc
SMARTHOST_ADDRESS=mail01.aa.bb.cc
MISP_EMAIL=admin@admin.test
MISP_CONTACT=admin@admin.test
AUTOCONF_GPG=False
AUTOGEN_ADMIN_KEY=True
DISABLE_IPV6=True
MYSQL_HOST=mysql.aa.bb.cc
MYSQL_PORT=3306
MYSQL_USER=misp
MYSQL_PASSWORD=secret
MYSQL_DATABASE=misp
REDIS_FQDN=localhost
@ostefano
Collaborator

Interesting issue; I wonder whether it is related to MISP itself and its new way of managing background workers. Next time the workers get stuck, can you open a shell inside the container and see if supervisorctl restart misp-workers:* restarts them? Note: the workers should restart automatically because there is 'autorestart=true' inside 50-workers.conf.

@iglocska is it expected that workers are killed after a while? See 'worker max execution time reached' in the logs. If that is expected, is it possible to reduce the maximum execution time to reproduce the issue more easily?
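
For reference, a way to run that check from the host might look like this (a sketch assuming the container is named misp-core, as in the unit file above):

docker exec -it misp-core bash
supervisorctl status misp-workers:*                        # current state of each worker
grep autorestart /etc/supervisor/conf.d/50-workers.conf    # confirm autorestart is enabled
supervisorctl restart misp-workers:*                       # try restarting the workers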

@r3boot
Author

r3boot commented Mar 11, 2024

Sure thing! I have a dev instance running for exactly this purpose:

[dev][root@misp01:/root]↥ # docker ps -a
CONTAINER ID   IMAGE                                            COMMAND                  CREATED      STATUS      PORTS     NAMES
7dd975a6f8d7   git.aa.bb.cc:5050/docker/misp-core:10     "/entrypoint.sh"         4 days ago   Up 4 days             misp-core
1089178ba607   git.aa.bb.cc:5050/docker/misp-modules:6   "/usr/local/bin/misp…"   5 days ago   Up 5 days             gallant_dubinsky
866b066cdf4c   git.aa.bb.cc:5050/docker/redis:1          "docker-entrypoint.s…"   9 days ago   Up 9 days             amazing_bartik
[dev][root@misp01:/root]↥ # docker exec -it 7dd975a6f8d7 bash
root@misp01:/var/www/MISP# supervisorctl restart misp-workers:*
misp-workers:default_00: ERROR (spawn error)
misp-workers:default_01: ERROR (spawn error)
misp-workers:default_02: ERROR (spawn error)
misp-workers:default_03: ERROR (spawn error)
misp-workers:default_04: ERROR (spawn error)
misp-workers:email_00: ERROR (spawn error)
misp-workers:email_01: ERROR (spawn error)
misp-workers:email_02: ERROR (spawn error)
misp-workers:email_03: ERROR (spawn error)
misp-workers:email_04: ERROR (spawn error)
misp-workers:cache_00: ERROR (spawn error)
misp-workers:cache_01: ERROR (spawn error)
misp-workers:cache_02: ERROR (spawn error)
misp-workers:cache_03: ERROR (spawn error)
misp-workers:cache_04: ERROR (spawn error)
misp-workers:prio_00: ERROR (spawn error)
misp-workers:prio_01: ERROR (spawn error)
misp-workers:prio_02: ERROR (spawn error)
misp-workers:prio_03: ERROR (spawn error)
misp-workers:prio_04: ERROR (spawn error)
misp-workers:update_00: ERROR (spawn error)

No logging was generated for this event, so I cannot see why this spawn error is happening.

I also double-checked the configuration inside the container, and we have autorestart=true for all workers:

root@misp01:/etc/supervisor/conf.d# grep restart 50-workers.conf
autorestart=true
autorestart=true
autorestart=true
autorestart=true
autorestart=true

EDIT: Note that if I start one of the workers by hand, it does work:

root@misp01:/etc/supervisor/conf.d# sudo -u www-data /var/www/MISP/app/Console/cake start_worker cache
2024-03-11 13:22:55 Info: [WORKER PID: 12094][cache] - starting to process background jobs...

@ostefano
Collaborator

ostefano commented Mar 11, 2024

@r3boot I did some tests by forcing the workers to auto-kill themselves after 10 seconds.

Everything looks good here:

misp-core-1     | 2024-03-11 13:37:12,078 INFO exited: cache_03 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,088 INFO spawned: 'cache_03' with pid 1908
misp-core-1     | 2024-03-11 13:37:12,089 INFO exited: cache_00 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,089 INFO exited: cache_01 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,089 INFO exited: cache_02 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,090 INFO exited: cache_04 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:12,094 INFO spawned: 'cache_00' with pid 1909
misp-core-1     | 2024-03-11 13:37:12,097 INFO spawned: 'cache_01' with pid 1910
misp-core-1     | 2024-03-11 13:37:12,099 INFO spawned: 'cache_02' with pid 1911
misp-core-1     | 2024-03-11 13:37:12,102 INFO spawned: 'cache_04' with pid 1916
misp-core-1     | 2024-03-11 13:37:13,165 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:13,166 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:42,203 INFO exited: cache_00 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,215 INFO spawned: 'cache_00' with pid 1948
misp-core-1     | 2024-03-11 13:37:42,216 INFO exited: cache_03 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,217 INFO exited: cache_01 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,217 INFO exited: cache_02 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,217 INFO exited: cache_04 (exit status 0; expected)
misp-core-1     | 2024-03-11 13:37:42,222 INFO spawned: 'cache_01' with pid 1950
misp-core-1     | 2024-03-11 13:37:42,224 INFO spawned: 'cache_02' with pid 1953
misp-core-1     | 2024-03-11 13:37:42,226 INFO spawned: 'cache_03' with pid 1958
misp-core-1     | 2024-03-11 13:37:42,230 INFO spawned: 'cache_04' with pid 1965
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
misp-core-1     | 2024-03-11 13:37:43,283 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

I have also tried supervisorctl restart misp-workers:* and the command succeeds here with no errors.
I wonder whether we might be hitting some system limits.

In conclusion, I have no further suggestions other than to investigate why supervisorctl restart misp-workers:* does not work on your side. That might give you some hints regarding the underlying cause.
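
As an aside (not something verified in this thread), a couple of commands that could help rule out resource limits from inside the container:

ulimit -a             # limits of the current shell
cat /proc/1/limits    # limits applied to the container's PID 1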

@r3boot
Author

r3boot commented Mar 11, 2024

Diving deeper into this, I found the following errors in /var/log/supervisor/supervisord.log:

2024-03-11 13:45:58,097 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:45:58,098 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:45:58,098 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:45:58,099 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:45:58,099 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:45:58,100 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:45:58,100 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:45:58,102 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:45:58,103 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:45:58,103 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:45:58,104 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:45:58,104 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:45:58,105 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:45:58,107 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:45:58,108 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:45:58,108 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:45:58,109 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:45:58,109 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:45:58,110 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:45:58,110 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:45:58,111 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:45:59,114 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:45:59,115 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:45:59,115 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:45:59,115 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:45:59,116 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:45:59,116 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:45:59,116 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:45:59,117 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:45:59,118 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:45:59,119 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:45:59,120 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:46:01,123 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:46:01,123 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:46:01,123 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:46:01,124 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:46:01,125 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:46:01,126 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:46:04,130 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-11 13:46:04,131 INFO gave up: default_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,131 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-11 13:46:04,131 INFO gave up: email_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-11 13:46:04,132 INFO gave up: email_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,132 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-11 13:46:04,132 INFO gave up: cache_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-11 13:46:04,133 INFO gave up: cache_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-11 13:46:04,133 INFO gave up: prio_00 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,133 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-11 13:46:04,133 INFO gave up: prio_01 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-11 13:46:04,134 INFO gave up: prio_02 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-11 13:46:04,134 INFO gave up: prio_03 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-11 13:46:04,134 INFO gave up: prio_04 entered FATAL state, too many start retries too quickly
2024-03-11 13:46:04,134 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-11 13:46:04,134 INFO gave up: update_00 entered FATAL state, too many start retries too quickly

Based on this, I found the following open issue: moby/moby#31243. I tried the remediation steps mentioned in moby/moby#31243 (comment).

First, I added the --tty flag to the unit file:

[Unit]
Description=MISP core running inside of docker
After=network.target
[Service]
Type=simple
ExecStartPre=/bin/bash -c "cat /etc/image-pull-secret.txt | docker login -u xxyyzz --password-stdin git.aa.bb.cc:5050"
ExecStart=/usr/bin/docker run \
  --rm --tty \
  --name misp-core \
  --network host \
  --env-file /etc/misp/core.env \
  -v /etc/misp/config:/var/www/MISP/app/Config \
  -v /etc/misp/customization:/var/www/MISP/app/webroot/img/custom \
  -v /var/log/misp:/var/www/MISP/app/tmp/logs \
  -v /var/lib/misp:/var/www/MISP/app/files \
  -v /etc/misp/gnupg:/var/www/MISP/.gnupg \
  -v /etc/easytls/certs/misp.cert:/etc/nginx/certs/cert.pem \
  -v /etc/easytls/certs/misp.key:/etc/nginx/certs/key.pem \
  git.aa.bb.cc:5050/docker/misp-core:10
ExecStop=/usr/bin/docker kill misp-core
TimeoutStartSec=0
Restart=always
RestartSec=3s
[Install]
WantedBy=default.target

I combined this with the following command (inside the container):

usermod -G tty www-data

And lo and behold, restarting the workers does indeed work:

root@misp01:/var/www/MISP# id www-data
uid=33(www-data) gid=33(www-data) groups=33(www-data),5(tty)
root@misp01:/var/www/MISP# supervisorctl restart misp-workers:*
misp-workers:default_02: stopped
misp-workers:email_00: stopped
misp-workers:email_01: stopped
misp-workers:email_03: stopped
misp-workers:email_04: stopped
misp-workers:cache_00: stopped
misp-workers:cache_01: stopped
misp-workers:cache_02: stopped
misp-workers:prio_00: stopped
misp-workers:prio_03: stopped
misp-workers:prio_04: stopped
misp-workers:update_00: stopped
misp-workers:default_00: stopped
misp-workers:default_01: stopped
misp-workers:default_03: stopped
misp-workers:default_04: stopped
misp-workers:email_02: stopped
misp-workers:cache_03: stopped
misp-workers:cache_04: stopped
misp-workers:prio_01: stopped
misp-workers:prio_02: stopped
misp-workers:default_00: started
misp-workers:default_01: started
misp-workers:default_02: started
misp-workers:default_03: started
misp-workers:default_04: started
misp-workers:email_00: started
misp-workers:email_01: started
misp-workers:email_02: started
misp-workers:email_03: started
misp-workers:email_04: started
misp-workers:cache_00: started
misp-workers:cache_01: started
misp-workers:cache_02: started
misp-workers:cache_03: started
misp-workers:cache_04: started
misp-workers:prio_00: started
misp-workers:prio_01: started
misp-workers:prio_02: started
misp-workers:prio_03: started
misp-workers:prio_04: started
misp-workers:update_00: started
root@misp01:/var/www/MISP#

So yay! I will leave this running for another 24h to see if the workaround also works unattended.

The question now is: how do you want to proceed? I can add this change to our local codebase and be done with it, or do you want this workaround in the misp container, in which case I should wait for that?

@ostefano
Collaborator

Is there anything we can do at supervisor-job's definition time to fix this (since it seems to be related to stdout and permissions)?

If you want to test the workaround without waiting 24 hours you can always add the --maxExecutionTime <seconds> option to the supervisor command (inside the 50-workers.conf file).

@ostefano
Collaborator

@r3boot have you tried updating docker/docker-compose using the official repositories, by the way?

@r3boot
Author

r3boot commented Mar 11, 2024

Is there anything we can do at supervisor-job's definition time to fix this (since it seems to be related to stdout and permissions)?

IMO, a workaround would be twofold (see the sketch after this list):

  1. Update MISP/misp-docker/core/Dockerfile and add && usermod -G tty www-data at or around line 183
  2. Update the documentation with a remark to start the container with a TTY if someone is affected by "/dev/stdout is inaccessible by an unprivileged user in a container started as root" (moby/moby#31243).
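
For illustration, item 1 would amount to something along these lines (a hypothetical sketch; in practice the command would likely be appended to an existing RUN chain rather than added as a separate instruction):

RUN usermod -G tty www-data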

But I can also see how this would not be accepted, since we don't run misp-docker in a supported way, I think?

If you want to test the workaround without waiting 24 hours you can always add the --maxExecutionTime <seconds> option to the supervisor command (inside 50-workers.conf file).

That doesn't seem to work. I did the following:

Modify 50-workers.conf:

root@misp01:/etc/supervisor/conf.d# grep ^command 50-workers.conf
command=/var/www/MISP/app/Console/cake start_worker default --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker prio --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker email --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker update --maxExecutionTime=60
command=/var/www/MISP/app/Console/cake start_worker cache --maxExecutionTime=60

Reread the configuration using supervisorctl reread, followed by supervisorctl restart misp-workers:*.

Workers get restarted just fine, but don't stop after a minute:

root@misp01:/etc/supervisor/conf.d# date
Mon Mar 11 15:27:35 UTC 2024
root@misp01:/etc/supervisor/conf.d# tail -21 /var/log/supervisor/supervisord.log
2024-03-11 15:20:18,675 INFO success: default_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,676 INFO success: default_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,691 INFO success: default_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,691 INFO success: default_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,691 INFO success: default_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,713 INFO success: email_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,713 INFO success: email_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,713 INFO success: email_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,730 INFO success: email_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,730 INFO success: email_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,730 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,743 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,771 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:18,813 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,818 INFO success: prio_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 15:20:19,819 INFO success: update_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
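
One possible explanation (an aside based on general supervisord behaviour, not something confirmed here): supervisorctl reread only re-parses the configuration files, and changed program definitions normally only take effect after supervisorctl update, e.g.:

supervisorctl reread    # re-parse the configuration files
supervisorctl update    # apply changed program definitions (restarts affected groups)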

@r3boot have you tried updating docker/docker-compose using the official repositories btw?

We run the latest docker from the official docker repo:

[dev][root@misp01:/root]↥ 17m27s 127 # docker -v
Docker version 25.0.4, build 1a576c5

@ostefano
Collaborator

ostefano commented Mar 11, 2024

@r3boot try

command=/var/www/MISP/app/Console/cake start_worker cache --maxExecutionTime 60

instead

@ostefano
Collaborator

We should definitely document the issue, and maybe suggest running the usermod command via a custom entrypoint (which is a documented way to run additional commands), since the command should be idempotent after all.

In other words: no image modification, but a documented and supported workaround.
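
For illustration, such a custom entrypoint could look roughly like this (a minimal sketch; the exact hook mechanism and paths documented by misp-docker may differ). It could then be bind-mounted into the container and selected with docker run --entrypoint, for example:

#!/bin/bash
# Hypothetical wrapper entrypoint: apply the idempotent usermod workaround,
# then hand control back to the image's stock entrypoint.
usermod -G tty www-data
exec /entrypoint.sh "$@"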

@r3boot
Author

r3boot commented Mar 11, 2024

--maxExecutionTime 60

Sharp! Tried it, doesn't work. I will leave the current dev instance running for 24h and look again. I'm pretty sure it will work, but the proof is in the pudding, of course :)

We should definitely document the issue, and maybe suggest to run the usermod command by using a custom entrypoint (which is a documented way to run additional commands) since the command should be idempotent after all.

In other words, no image modification but documented and supported workaround.

Ack. I will perform the required changes in our deployment code. Thanks a lot for your help!

One final question though... We are going to move our misp instances to k8s later on this year (which is finally possible because of misp-docker, yay!), and I would love to have some insight into how these containers are developed, and maybe even do some PRs of my own (if I find time). Are there any channels I can follow to get more into the loop? (IRC, mailing list, etc.)

@ostefano
Collaborator

@r3boot feel free to join https://gitter.im/MISP/Docker

@r3boot
Author

r3boot commented Mar 13, 2024

Checked the logs after applying the workaround:

2024-03-12 15:38:35,114 INFO exited: default_00 (exit status 0; expected)
2024-03-12 15:38:36,122 INFO spawned: 'default_00' with pid 4677
2024-03-12 15:38:37,239 INFO success: default_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:38,533 INFO exited: default_04 (exit status 0; expected)
2024-03-12 15:38:38,739 INFO spawned: 'default_04' with pid 4685
2024-03-12 15:38:38,764 INFO exited: default_02 (exit status 0; expected)
2024-03-12 15:38:38,765 INFO exited: default_03 (exit status 0; expected)
2024-03-12 15:38:38,781 INFO spawned: 'default_02' with pid 4686
2024-03-12 15:38:38,809 INFO spawned: 'default_03' with pid 4687
2024-03-12 15:38:38,823 INFO exited: email_02 (exit status 0; expected)
2024-03-12 15:38:38,831 INFO exited: default_01 (exit status 0; expected)
2024-03-12 15:38:38,838 INFO exited: email_04 (exit status 0; expected)
2024-03-12 15:38:38,839 INFO exited: cache_02 (exit status 0; expected)
2024-03-12 15:38:38,839 INFO exited: cache_03 (exit status 0; expected)
2024-03-12 15:38:38,840 INFO exited: email_03 (exit status 0; expected)
2024-03-12 15:38:38,841 INFO exited: cache_04 (exit status 0; expected)
2024-03-12 15:38:38,841 INFO exited: prio_02 (exit status 0; expected)
2024-03-12 15:38:38,842 INFO exited: prio_03 (exit status 0; expected)
2024-03-12 15:38:38,842 INFO exited: prio_04 (exit status 0; expected)
2024-03-12 15:38:38,844 INFO exited: update_00 (exit status 0; expected)
2024-03-12 15:38:38,845 INFO exited: cache_00 (exit status 0; expected)
2024-03-12 15:38:38,845 INFO exited: cache_01 (exit status 0; expected)
2024-03-12 15:38:38,849 INFO spawned: 'default_01' with pid 4688
2024-03-12 15:38:38,852 INFO spawned: 'email_02' with pid 4690
2024-03-12 15:38:38,860 INFO spawned: 'email_03' with pid 4694
2024-03-12 15:38:38,865 INFO spawned: 'email_04' with pid 4698
2024-03-12 15:38:38,867 INFO spawned: 'cache_00' with pid 4699
2024-03-12 15:38:38,870 INFO spawned: 'cache_01' with pid 4700
2024-03-12 15:38:38,886 INFO spawned: 'cache_02' with pid 4706
2024-03-12 15:38:38,888 INFO spawned: 'cache_03' with pid 4719
2024-03-12 15:38:38,893 INFO spawned: 'cache_04' with pid 4729
2024-03-12 15:38:38,903 INFO spawned: 'prio_02' with pid 4733
2024-03-12 15:38:38,909 INFO spawned: 'prio_03' with pid 4740
2024-03-12 15:38:38,914 INFO spawned: 'prio_04' with pid 4745
2024-03-12 15:38:38,923 INFO spawned: 'update_00' with pid 4750
2024-03-12 15:38:38,935 INFO exited: email_00 (exit status 0; expected)
2024-03-12 15:38:38,936 INFO exited: email_01 (exit status 0; expected)
2024-03-12 15:38:38,936 INFO exited: prio_00 (exit status 0; expected)
2024-03-12 15:38:38,937 INFO exited: prio_01 (exit status 0; expected)
2024-03-12 15:38:39,218 INFO spawned: 'email_00' with pid 4813
2024-03-12 15:38:39,223 INFO spawned: 'email_01' with pid 4814
2024-03-12 15:38:39,233 INFO spawned: 'prio_00' with pid 4815
2024-03-12 15:38:39,276 INFO spawned: 'prio_01' with pid 4816
2024-03-12 15:38:39,751 INFO success: default_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,775 INFO success: default_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,812 INFO success: default_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,859 INFO success: default_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,860 INFO success: email_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,861 INFO success: email_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,877 INFO success: email_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,878 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,878 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:39,879 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,883 INFO success: email_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: email_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: prio_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-12 15:38:40,884 INFO success: update_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Closing issue :)

@r3boot feel free to join https://gitter.im/MISP/Docker

Tnx!

@r3boot closed this as completed Mar 13, 2024
@ostefano
Collaborator

@r3boot can you document the issue and the fix inside the README.md and open a PR?

@r3boot reopened this Mar 13, 2024
@r3boot
Author

r3boot commented Mar 13, 2024

Sure thing. See: #28

@r3boot
Author

r3boot commented Mar 14, 2024

I spoke too soon.

2024-03-13 12:05:12,520 WARN For [program:nginx], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
2024-03-13 12:05:12,520 WARN For [program:php-fpm], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
2024-03-13 12:05:12,520 WARN For [program:cron], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
2024-03-13 12:05:12,520 INFO Included extra file "/etc/supervisor/conf.d/10-supervisor.conf" during parsing
2024-03-13 12:05:12,520 INFO Included extra file "/etc/supervisor/conf.d/50-workers.conf" during parsing
2024-03-13 12:05:12,520 INFO Set uid to user 0 succeeded
2024-03-13 12:05:12,527 INFO RPC interface 'supervisor' initialized
2024-03-13 12:05:12,528 INFO RPC interface 'supervisor' initialized
2024-03-13 12:05:12,528 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-03-13 12:05:12,529 INFO supervisord started with pid 6
2024-03-13 12:05:13,531 INFO spawned: 'cron' with pid 7
2024-03-13 12:05:13,545 INFO spawned: 'nginx' with pid 8
2024-03-13 12:05:13,547 INFO spawned: 'php-fpm' with pid 9
2024-03-13 12:05:14,530 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:14,534 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:14,548 INFO success: php-fpm entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:46,329 INFO spawned: 'default_00' with pid 1039
2024-03-13 12:05:46,331 INFO spawned: 'default_01' with pid 1040
2024-03-13 12:05:46,333 INFO spawned: 'default_02' with pid 1041
2024-03-13 12:05:46,335 INFO spawned: 'default_03' with pid 1042
2024-03-13 12:05:46,337 INFO spawned: 'default_04' with pid 1043
2024-03-13 12:05:46,344 INFO spawned: 'email_00' with pid 1045
2024-03-13 12:05:46,352 INFO spawned: 'email_01' with pid 1048
2024-03-13 12:05:46,360 INFO spawned: 'email_02' with pid 1055
2024-03-13 12:05:46,365 INFO spawned: 'email_03' with pid 1060
2024-03-13 12:05:46,371 INFO spawned: 'email_04' with pid 1067
2024-03-13 12:05:46,389 INFO spawned: 'cache_00' with pid 1087
2024-03-13 12:05:46,397 INFO spawned: 'cache_01' with pid 1091
2024-03-13 12:05:46,410 INFO spawned: 'cache_02' with pid 1093
2024-03-13 12:05:46,421 INFO spawned: 'cache_03' with pid 1107
2024-03-13 12:05:46,433 INFO spawned: 'cache_04' with pid 1112
2024-03-13 12:05:46,464 INFO spawned: 'prio_00' with pid 1122
2024-03-13 12:05:46,479 INFO spawned: 'prio_01' with pid 1127
2024-03-13 12:05:46,510 INFO spawned: 'prio_02' with pid 1132
2024-03-13 12:05:46,523 INFO spawned: 'prio_03' with pid 1139
2024-03-13 12:05:46,537 INFO spawned: 'prio_04' with pid 1145
2024-03-13 12:05:46,591 INFO spawned: 'update_00' with pid 1158
2024-03-13 12:05:47,335 INFO success: default_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,335 INFO success: default_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,335 INFO success: default_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,335 INFO success: default_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,350 INFO success: default_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,351 INFO success: email_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,351 INFO success: email_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,363 INFO success: email_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,379 INFO success: email_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,380 INFO success: email_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,391 INFO success: cache_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,415 INFO success: cache_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,416 INFO success: cache_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,423 INFO success: cache_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:47,435 INFO success: cache_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_01 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_02 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_03 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: prio_04 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-13 12:05:48,436 INFO success: update_00 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-14 12:05:54,384 INFO exited: default_00 (exit status 0; expected)
2024-03-14 12:05:55,395 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:05:56,397 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:05:58,400 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:05:58,690 INFO exited: email_00 (exit status 0; expected)
2024-03-14 12:05:58,691 INFO exited: prio_02 (exit status 0; expected)
2024-03-14 12:05:58,693 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:05:58,694 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:05:59,183 INFO exited: update_00 (exit status 0; expected)
2024-03-14 12:05:59,184 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:05:59,399 INFO exited: prio_03 (exit status 0; expected)
2024-03-14 12:05:59,400 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:05:59,700 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:05:59,700 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:05:59,700 INFO exited: cache_00 (exit status 0; expected)
2024-03-14 12:05:59,701 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:00,705 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:00,705 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:06:00,705 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:06:00,705 INFO exited: cache_04 (exit status 0; expected)
2024-03-14 12:06:01,003 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:01,127 INFO exited: cache_02 (exit status 0; expected)
2024-03-14 12:06:01,129 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:01,300 INFO exited: default_03 (exit status 0; expected)
2024-03-14 12:06:01,301 INFO exited: email_02 (exit status 0; expected)
2024-03-14 12:06:01,302 INFO exited: email_03 (exit status 0; expected)
2024-03-14 12:06:01,302 INFO exited: email_04 (exit status 0; expected)
2024-03-14 12:06:01,303 INFO exited: cache_03 (exit status 0; expected)
2024-03-14 12:06:01,304 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:01,305 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:01,307 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:01,311 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:01,315 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:02,320 INFO spawnerr: unknown error making dispatchers for 'default_00': EACCES
2024-03-14 12:06:02,321 INFO gave up: default_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:02,322 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:02,322 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:02,323 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:02,324 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:02,324 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:02,324 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:06:02,324 INFO exited: default_01 (exit status 0; expected)
2024-03-14 12:06:02,324 INFO exited: default_02 (exit status 0; expected)
2024-03-14 12:06:02,324 INFO exited: default_04 (exit status 0; expected)
2024-03-14 12:06:02,324 INFO exited: email_01 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: cache_01 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: prio_00 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: prio_01 (exit status 0; expected)
2024-03-14 12:06:02,325 INFO exited: prio_04 (exit status 0; expected)
2024-03-14 12:06:03,326 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:03,327 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:03,328 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:04,330 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:04,331 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:04,332 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:05,334 INFO spawnerr: unknown error making dispatchers for 'email_00': EACCES
2024-03-14 12:06:05,334 INFO gave up: email_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:05,334 INFO spawnerr: unknown error making dispatchers for 'prio_02': EACCES
2024-03-14 12:06:05,334 INFO gave up: prio_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:06,335 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'cache_00': EACCES
2024-03-14 12:06:06,336 INFO gave up: cache_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_03': EACCES
2024-03-14 12:06:06,336 INFO gave up: prio_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:06,336 INFO spawnerr: unknown error making dispatchers for 'update_00': EACCES
2024-03-14 12:06:06,337 INFO gave up: update_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,338 INFO spawnerr: unknown error making dispatchers for 'default_03': EACCES
2024-03-14 12:06:07,338 INFO gave up: default_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,338 INFO spawnerr: unknown error making dispatchers for 'email_02': EACCES
2024-03-14 12:06:07,338 INFO gave up: email_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'email_03': EACCES
2024-03-14 12:06:07,339 INFO gave up: email_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'email_04': EACCES
2024-03-14 12:06:07,339 INFO gave up: email_04 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'cache_02': EACCES
2024-03-14 12:06:07,339 INFO gave up: cache_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'cache_03': EACCES
2024-03-14 12:06:07,339 INFO gave up: cache_03 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:07,339 INFO spawnerr: unknown error making dispatchers for 'cache_04': EACCES
2024-03-14 12:06:07,339 INFO gave up: cache_04 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,342 INFO spawnerr: unknown error making dispatchers for 'default_01': EACCES
2024-03-14 12:06:09,342 INFO gave up: default_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,342 INFO spawnerr: unknown error making dispatchers for 'default_02': EACCES
2024-03-14 12:06:09,342 INFO gave up: default_02 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'default_04': EACCES
2024-03-14 12:06:09,343 INFO gave up: default_04 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'email_01': EACCES
2024-03-14 12:06:09,343 INFO gave up: email_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'cache_01': EACCES
2024-03-14 12:06:09,343 INFO gave up: cache_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,343 INFO spawnerr: unknown error making dispatchers for 'prio_00': EACCES
2024-03-14 12:06:09,343 INFO gave up: prio_00 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,344 INFO spawnerr: unknown error making dispatchers for 'prio_01': EACCES
2024-03-14 12:06:09,344 INFO gave up: prio_01 entered FATAL state, too many start retries too quickly
2024-03-14 12:06:09,344 INFO spawnerr: unknown error making dispatchers for 'prio_04': EACCES
2024-03-14 12:06:09,344 INFO gave up: prio_04 entered FATAL state, too many start retries too quickly

I will do some more investigation and get back to this issue.

@r3boot
Author

r3boot commented Mar 14, 2024

Captured the EACCES using strace:

373645 openat(AT_FDCWD, "/var/www/MISP/app/tmp/logs/misp-workers.log", O_WRONLY|O_CREAT|O_APPEND|O_CLOEXEC, 0666) = -1 EACCES (Permission denied)
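
For reference, a trace along these lines could be captured from inside the container (a sketch, not necessarily the exact invocation used here; it assumes strace is installed and that the container is allowed to ptrace, e.g. via --cap-add=SYS_PTRACE):

# supervisorctl pid (without arguments) prints the supervisord PID;
# following its forks catches the failing openat when a worker is (re)spawned
strace -f -e trace=openat -p "$(supervisorctl pid)" 2>&1 | grep EACCES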

@ostefano
Collaborator

Ha, interesting. It looks like the host changed the permissions on that file (it's created by MISP itself).

@r3boot
Author

r3boot commented Mar 14, 2024

Found it!

OK, so: we create our MISP machines using SaltStack. I configured the /var/log/misp directory with the sticky bit (1770, root:www-data). This directory gets volume-mounted into misp-core at /var/www/MISP/app/tmp/logs. Next, systemd started misp-core, and during startup the permissions of /var/www/MISP/app/tmp/ get changed:

Mar 14 18:37:52 misp01 docker[422297]: ... chown -R www-data:www-data /var/www/MISP/app/tmp
Mar 14 18:37:52 misp01 docker[422297]: ... chmod -R 0550 files /var/www/MISP/app/tmp
Mar 14 18:37:52 misp01 docker[422297]: ... chmod -R 0770 directories /var/www/MISP/app/tmp

Next, all file descriptors are opened, and all is good. One hour later, SaltStack runs again, resets the permissions on /var/log/misp, and re-applies the sticky bit. This works until supervisord restarts the workers: they need to re-open their log files, and since www-data is no longer the owner of the log directory, they get EACCES.

Yesterday, when I had this working, I had disabled the salt agent, so the permissions did not get reset. This led me to assume that all was good, so I applied the fix, re-enabled salt, and went on with my life. Until I looked at the logs today.

So yeah, I guess you could file this one under PEBKAC... 🥇 I do have an opinion about MISP setting the permissions, but that's outside the scope of this issue. I will rework the documentation PR with this in mind. Tnx for the patience :)
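
For anyone hitting the same combination, a sketch of host-side permissions that match what the container's startup enforces (an assumption based on the chown/chmod lines above; inside the container www-data is UID/GID 33, and the SaltStack state managing /var/log/misp would need to use the same values so it stops fighting the container):

chown -R 33:33 /var/log/misp    # 33 = www-data inside the misp-core container
chmod 0770 /var/log/misp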

@ostefano added the "documentation" (Improvements or additions to documentation) label Mar 24, 2024
@ostefano changed the title from "Workers not being restarted once max execution time is reached" to "Permission issues when using SaltStack lead to workers failing to restart" May 1, 2024