Permission issues when using SaltStack lead to workers failing to restart #25
Interesting issue, and I wonder whether it is related to MISP itself and its new way of managing background workers. Next time they get stuck, can you open a shell inside the container and take a look?

@iglocska, is it expected that workers are killed after a while? See 'worker max execution time reached' in the logs. If that is expected, is it possible to reduce the maximum execution time to reproduce the issue more easily?
Sure thing! I have a dev instance running for exactly this purpose:
No logging has been generated for this event, so I cannot see why this spawn error is happening. I also double-checked the configuration inside the container, and we have
EDIT: Note, if I start one of the workers by hand, it does work:
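For anyone reproducing this, starting a worker by hand inside the container can be sketched roughly as below. The cake console path and the worker name are assumptions on my side (not taken from the container), and the command is guarded so it degrades gracefully outside a MISP container:

```shell
#!/bin/sh
# Hypothetical manual worker start -- the cake path and worker name
# are assumptions, not confirmed from the misp-core image.
CAKE=/var/www/MISP/app/Console/cake
WORKER=default
if [ -x "$CAKE" ]; then
    # Inside the container: run the worker as the web-server user.
    su -s /bin/sh -c "$CAKE start_worker $WORKER" www-data
else
    echo "cake console not found at $CAKE"
fi
```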
@r3boot did some tests by forcing the workers to auto-kill themselves after 10 seconds. Everything looks good here:
I have also tried a few other things, without success. In conclusion, I have no further suggestion but to investigate further why the spawn fails.
Diving deeper into this, I found the following error in
Based on this, I found the following open issue: moby/moby#31243. I tried the remediation steps as mentioned in moby/moby#31243 (comment). First, I added the
I combined this with the following command (inside of the container):
And lo and behold, restarting the workers does indeed work:
So yay! I will leave this running for another 24h to see if the workaround also works unattended. The question now is: how do you want to proceed? I can add this change to our local codebase and be done with it, or do you want this workaround in the misp container, in which case I'll wait for that?
Is there anything we can do at

If you want to test the workaround without waiting 24 hours, you can always add the
@r3boot have you tried updating docker/docker-compose using the official repositories btw?
IMO, a workaround would be two-fold:
But I can also see how this would not be accepted, since we don't run misp-docker in a supported way, I think?
That doesn't seem to work. I did the following. First, modify 50-workers.conf:
Then, re-read the configuration. The workers get restarted just fine, but they don't stop after a minute:
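For reference, a supervisord program section of the kind 50-workers.conf carries might look like the sketch below. The program name, command, and paths are assumptions on my side, not the actual contents of the file in the image:

```ini
; Hypothetical sketch of a worker entry as 50-workers.conf might define it.
; Command, user, and directory are illustrative assumptions.
[program:default]
command=/var/www/MISP/app/Console/cake start_worker default
directory=/var/www/MISP
user=www-data
autostart=true
autorestart=true        ; respawn the worker when it exits
stopsignal=TERM
```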
We run the latest docker from the official docker repo:
@r3boot try
instead
We should definitely document the issue, and maybe suggest running the workaround.

In other words: no image modification, but a documented and supported workaround.
Sharp! Tried it, doesn't work. I will leave the current dev instance running for 24h and look again. I'm pretty sure it will work, but the proof is in the pudding, of course :)
Ack. I will perform the required changes in our deployment code. Thanks a lot for your help! One final question, though: we are going to move our MISP instances to k8s later this year (which is finally possible because of misp-docker, yay!), and I would love to have some insight into how these containers are developed, and maybe even submit some PRs of my own (if I find time). Are there any channels I can follow to get more into the loop (IRC, mailing list, etc.)?
@r3boot feel free to join https://gitter.im/MISP/Docker
Checked the logs after applying the workaround:
Closing issue :)
Thanks!
@r3boot can you document the issue and the fix inside the README.md and open a PR?
Sure thing. See: #28
I spoke too soon.
I will do some more investigation and get back to this issue. |
Captured the EACCES using strace:
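A way to capture an EACCES like this (my sketch, assuming supervisord is the process spawning the workers) is to attach strace to the supervisord process and trace file-related syscalls. The invocation below is guarded and time-limited so it degrades gracefully on a machine where supervisord or strace is absent:

```shell
#!/bin/sh
# Hypothetical capture of the EACCES at spawn time. Assumes supervisord
# spawns the workers; guarded so it is a no-op where it is not running.
PID=$(pgrep -o supervisord 2>/dev/null || true)
if [ -n "$PID" ] && command -v strace >/dev/null 2>&1; then
    # -f follows the spawned worker; %file traces open/access/exec etc.
    timeout 10 strace -f -e trace=%file -p "$PID" 2>&1 | grep EACCES || true
else
    echo "supervisord or strace not available; skipping"
fi
```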
Ha, interesting. It looks like the host changed the permissions on that file (it's created by MISP itself).
Found it! OK, so: we create our MISP machines using SaltStack. I configured the
Next, all file descriptors are opened, and all is good. One hour later, SaltStack runs again, and this resets the permissions on

Yesterday, when I had this working, I had disabled the Salt agent, so the permissions did not get reset. This led me to assume that all was good, so I applied the fix, re-enabled Salt, and went on with my life. Until I looked at the logs today, that is. So yeah, I guess you could file this one under PEBKAC... 🥇

I do have an opinion about MISP setting the permissions itself, but that's outside the scope of this issue. I will rework the documentation PR with this in mind. Thanks for the patience :)
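A small guard of the kind we could wire into our own deployment would re-assert the expected mode after each Salt run. This is entirely my sketch: the file path and mode below are placeholders, not the actual file Salt touches here:

```shell
#!/bin/sh
# Hypothetical sketch: re-assert the mode MISP expects on a file that an
# external config-management run (Salt, in our case) may have reset.
# FILE and EXPECTED_MODE are placeholders, not the real file/mode.
FILE="${1:-$(mktemp)}"
EXPECTED_MODE="${2:-660}"

chmod 644 "$FILE"                      # simulate Salt resetting the mode
current=$(stat -c %a "$FILE")
if [ "$current" != "$EXPECTED_MODE" ]; then
    echo "mode drift detected: $current -> $EXPECTED_MODE"
    chmod "$EXPECTED_MODE" "$FILE"
fi
stat -c %a "$FILE"                     # prints 660
```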
We are currently evaluating the misp-{docker,modules} v2.4.186 containers to replace our existing MISP installation (based on the coolacid containers), and we are running into an issue with the workers.
Observed behaviour
The container boots, and fires up the workers, all is good:
24 hours later, the workers are gracefully killed:
However, the workers are not started again, and we need to restart the container to get back the workers.
Expected behaviour
Either the misp-core container manages its workers without any intervention, or a procedure is provided for restarting the workers.
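A manual recovery procedure could be as simple as the sketch below, under the assumption that the workers run under supervisord inside misp-core (guarded so it is a no-op elsewhere):

```shell
#!/bin/sh
# Hypothetical manual recovery: restart the workers inside the container
# rather than restarting the whole container. Assumes supervisord manages
# the workers; the guard makes this harmless on other machines.
if command -v supervisorctl >/dev/null 2>&1; then
    supervisorctl restart all || echo "supervisord not reachable"
else
    echo "supervisorctl not installed; run this inside the misp-core container"
fi
```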
Configuration
We use a modified setup based on the docker-compose setup, but using systemd unit files instead. The containers are rebuilt using the following Dockerfile:
The unit file we use to start this container is as follows:
The corresponding env file contains the following:
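As a rough illustration only (the unit name, image tag, and paths below are all hypothetical, not the actual files from our deployment), a systemd unit driving such a container could look like:

```ini
# Hypothetical sketch -- names, image tag, and env-file path are
# illustrative, not the real deployment files.
[Unit]
Description=MISP core container
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/misp/misp-core.env
ExecStartPre=-/usr/bin/docker rm -f misp-core
ExecStart=/usr/bin/docker run --name misp-core --env-file /etc/misp/misp-core.env misp/misp-core:v2.4.186
ExecStop=/usr/bin/docker stop misp-core
Restart=on-failure

[Install]
WantedBy=multi-user.target
```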