Documentation: Make clear instructions for getting a core file, when container crashes #11740

Open
dreamcat4 opened this issue Mar 25, 2015 · 52 comments

Comments

@dreamcat4

Hello,
I've been struggling with this on ubuntu 14.10, docker 1.5.0. More than should be the case.

My problems came from the following:

  • Assuming that the ulimit is set the same inside a container as on the host (it isn't). The output of the ulimit command was also a bit misleading, because:
$ ulimit
unlimited

is not the same thing as

$ ulimit -c
0

In the container, base image was ubuntu-debootstrap:14.04.

  • On ubuntu, the default setting is that core files get piped to the apport program. This is no good in containers. It must be overridden to something else if apport is not installed inside the container.

It would also help if the linux kernel could have a different value of /proc/sys/kernel/core_pattern inside the container than outside it, since the 2 environments may need to point to different locations. I'm not sure that is something Docker can do anything about, but it might be worth thinking about.

  • It is not clear, as core dumps are a kernel managed thing, whether resulting core files are written by kernel to the host filesystem or the container filesystem. This should also be clearly documented. They are written inside the container.
  • Open question: how best to save the core files, given that the container's filesystem is not kept when the container exits? I think docker could give a clear instruction here too, e.g. what to mount where, so the core files have somewhere to be written.
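
On the first point: bare ulimit with no option reports the file-size limit (the same as ulimit -f), which is why it printed unlimited while ulimit -c printed 0. A minimal sketch for checking the values that actually matter for core dumps from inside a container; report_core_limits is only an illustrative name:

```shell
#!/bin/sh
# Print the limits that are easy to confuse when chasing missing core files.
report_core_limits() {
    # Bare `ulimit` is the file-size limit (-f), not the core-file limit.
    echo "file size (ulimit -f):    $(ulimit -f)"
    # The soft limit is what the kernel enforces; the hard limit caps how far it can be raised.
    echo "core soft (ulimit -S -c): $(ulimit -S -c)"
    echo "core hard (ulimit -H -c): $(ulimit -H -c)"
}

report_core_limits
```

If the hard core limit is 0, an unprivileged process inside the container cannot raise the soft limit, whatever the start script does.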

Many thanks for any consideration. Else it will have to be answered on stack overflow.

@iaincgray

+kind/writing
+exp/proficient

@dreamcat4
Author

...and another super-important detail that everybody should be made aware of regarding core dumps:

  • If you are using an ubuntu base image. And most likely any other popular one, such as busybox or debian too.
  • Then if you make your launch script's shebang line #!/bin/sh - that is the dash shell.

Well hey! Guess what? The ulimit -c <new value> command will not change the value in your container. It can read the value from the dash shell. But not write to it. That only works from being inside of bash.

Like this:

#!/bin/sh

ulimit -c unlimited

So again, no core dumps for you my friend! No core file will be generated. And when it fails to set the value, the dash shell still returns a successful exit code of 0. That is missing functionality in dash, even though it is POSIX compliant, so we might discover that ulimit -c unlimited is not part of the POSIX.2 spec.

^^ That is far too easy to get wrong and IMHO needs to be documented also.
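
Since the exit code cannot be relied on here, one way to catch the problem is to read the limit back after setting it. A minimal sketch in plain POSIX sh; enable_core_dumps is a hypothetical helper name, not something provided by Docker or by any shell:

```shell
#!/bin/sh
# Try to raise the soft core limit, then verify by reading it back
# instead of trusting the exit code of `ulimit`.
enable_core_dumps() {
    ulimit -S -c unlimited 2>/dev/null || true
    actual=$(ulimit -S -c)
    if [ "$actual" = "unlimited" ]; then
        echo "core limit: unlimited"
        return 0
    fi
    # Most likely cause: the hard limit inherited from the docker engine is lower.
    echo "core limit is still '$actual' (hard limit: $(ulimit -H -c))" >&2
    return 1
}
```

A start script could call enable_core_dumps before launching the real program and refuse to continue (or at least log loudly) when it returns non-zero.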

@dreamcat4
Author

More problems:

When I set the core_pattern on host

echo '/config/core.%h.%e.%t' > /proc/sys/kernel/core_pattern
echo '/recordings/core.%h.%e.%t' > /proc/sys/kernel/core_pattern

... no core file is created. Yet if I set the path to be /tmp then it magically works ?!!?!??

echo '/tmp/core.%h.%e.%t' > /proc/sys/kernel/core_pattern

I don't understand.

Other problem:

Setting shebang to #!/bin/bash and it still didn't set the ulimit

[EDIT] I've since been told that setting the ulimit is only applied for the context (scope) and duration of the script. So the setting is not seen from other launched processes inside the container (e.g. docker exec -it).

The core file is created. Into /tmp only. Nothing else so far has worked for me. It would be better if we could configure the ulimit -c unlimited setting at build time for the whole container. Rather than at run time and only for certain processes.
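
The scoping mentioned in the edit above can be seen without Docker at all: a limit changed in one shell is inherited by that shell's children, but never leaks back to the parent or to unrelated processes (which is what a docker exec session is). A small sketch, where the subshell stands in for the start script:

```shell
#!/bin/sh
# Show that a limit changed in a child shell does not leak back to the parent.
demo_ulimit_scope() {
    before=$(ulimit -S -c)
    # Lowering a soft limit is always permitted, so this part cannot fail on permissions.
    ( ulimit -S -c 0; echo "child shell:  $(ulimit -S -c)" )
    echo "parent shell: $(ulimit -S -c) (was $before)"
}

demo_ulimit_scope
```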

Please someone else check these things out. Confirm / deny.

@dreamcat4
Author

It would be better if we could configure the ulimit -c unlimited setting at build time for the whole container. Rather than at run time and only for certain processes.

Just tried adding this to my dockerfile:

FROM ubuntu-debootstrap:14.04
RUN echo '* soft core 99999999' >> /etc/security/limits.conf
RUN echo '* hard core 99999999' >> /etc/security/limits.conf

However it did not appear to change the default value of ulimit. Still 0 when logging in with docker exec -it <container> bash. Even though the file /etc/security/limits.conf had the new value appended. Damn.

Anybody more experienced (knowledgeable) know if there is a chance to set the ulimit -c and enable core dumps at build time? @jpetazzo @cpuguy83, others? And what's up with it only working for the /tmp directory?

@cpuguy83
Member

@dreamcat4 ulimit settings are inherited from docker engine, currently.
In Docker 1.6 you will be able to set default ulimits for containers at the daemon level as well as at docker run.

@dreamcat4
Author

@cpuguy83 Great. That sounds a lot better :)

FYI, I am still searching for an answer as to why it only writes the core files to /tmp and not other places. I suspect it may be a permissions issue of some kind, even though the user the crashed process runs as (its:video) also owns the other target folders, /config and /recordings. The file ownership of the generated core file itself is root:video.

More reference:

http://man7.org/linux/man-pages/man5/core.5.html

dnschneid/crouton#187

@moxiegirl
Contributor

@dreamcat4 Thank you for this contribution. It is very helpful when users take the time to add these kinds of issues. We will fix this. I'll ping you back so you can review.

@dreamcat4
Author

FYI:

Have been working on an image with debugging start script that does most of the necessary stuff. Still figuring out the folder permissions thing. (Which I suppose is the last piece of it).

Image not ready yet. But until then here is a gist of the start script (the entrypoint when the container starts). It needs to be run as --privileged=true so it can set the core_pattern to a valid location inside the container (and temporarily override the host's apport / corekeeper core_pattern setting).

https://gist.github.com/dreamcat4/c2bea0e889de8860b035

AFAICT you can't really get away without a special entrypoint script too, such as the one in the gist above ^^. Because certain other things, in addition to the container's ulimit -c setting, must also be set up and configured correctly to ensure that core dumps will get saved.

In other words: "ulimit -c unlimited is not enough"

... which I am guessing may be worth mentioning in the new documentation of ulimit feature (issue #11754) ?? To help people out a bit with understanding what else they must also be doing to get their core dumps.

Here is a summary of those steps:

  • Set --privileged=true, or whatever --cap-add= lets us write to /proc/sys/kernel/core_pattern.
  • Make sure the core files folder is /tmp, or else make another folder (e.g. /crash) with the correct permissions.
  • Temporarily override core_pattern in the start script.
  • When the program exits, check for any new core file.
  • Put the core_pattern back to how it was previously (for the host machine / apport / whatever else).
  • Exit.
  • Also trap other POSIX signals so that the core_pattern is put back.

Or else I think maybe my debug.sh script (for that specific tvheadend program) can be made much more generic. And converted into a wrapper script for pretty much any program. It would not be too hard. Then other users could use it (unmodified) in their own projects. Without having to mess around themselves with all those bits and pieces.

It cannot put back (restore) the core_pattern if you kill it though.
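
The steps summarised above can be sketched as a generic wrapper. This is not the debug.sh from the gist, only a minimal illustration of the save / override / restore idea; run_with_core_pattern and the CORE_PATTERN_FILE variable are hypothetical names, and the variable exists purely so the logic can be tried against a scratch file instead of the real /proc entry:

```shell
#!/bin/sh
# Hypothetical wrapper: run_with_core_pattern <pattern> <command> [args...]
CORE_PATTERN_FILE="${CORE_PATTERN_FILE:-/proc/sys/kernel/core_pattern}"

run_with_core_pattern() (
    new_pattern=$1
    shift
    # Remember the current setting (for example the host's apport pipe).
    old_pattern=$(cat "$CORE_PATTERN_FILE") || exit 1
    restore() { printf '%s\n' "$old_pattern" > "$CORE_PATTERN_FILE"; }
    # Restore on normal exit and on catchable signals. SIGKILL cannot be trapped,
    # so a kill -9 still leaves the temporary pattern in place.
    trap restore EXIT
    trap 'exit 129' HUP
    trap 'exit 130' INT
    trap 'exit 143' TERM
    printf '%s\n' "$new_pattern" > "$CORE_PATTERN_FILE" || exit 1
    ulimit -S -c unlimited 2>/dev/null || echo "warning: could not raise core limit" >&2
    "$@"
    status=$?
    # A status above 128 usually means the program died from a signal.
    [ "$status" -gt 128 ] && echo "exited on signal $((status - 128)); look for a core file" >&2
    exit "$status"
)
```

Usage would look something like run_with_core_pattern '/crash/core.%h.%e.%t' followed by the program and its arguments. Writing to the real core_pattern still needs the privileges discussed above, and the setting changed is the host's.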

@dreamcat4
Author

OK. Gist updated with final fixes / improvements.

https://gist.github.com/dreamcat4/c2bea0e889de8860b035

Didn't ever get to the bottom of the suspected permissions issue. It just 'went away all by itself' after moving my core dumps folder to be /crash.

Tested several times with permissions 772, 775, 755, and as either same user or same group owning the core dumps folder (when g+w). Got a core dump each time. So whatever was wrong before I'm inclined to just ignore it and move on.

@moxiegirl This gist script ought to show most people well enough how to take a core dump inside their docker containers. It was written for a specific target program, but can be adapted for the general case. If you wish to write further documentation about core dumps, please do so as you see fit, and continue on the other ticket where the new flag is being discussed (I didn't end up needing the new flag BTW). You may close my issue here and take anything of value over to the other ticket.

Ticket: #11754
Many thanks.


BTW: Another FYI (this time, for setuid programs):

http://unix.stackexchange.com/questions/15531/how-come-no-core-dump-is-create-when-an-application-has-suid-set

Linux disables core dumps for setxid programs. To enable them, you need to do at least the following (I haven't checked that this is sufficient):
Enable setuid core dumps in general by setting the fs.suid_dumpable sysctl to 2,
e.g. with echo 2 >/proc/sys/fs/suid_dumpable.
(Note: 2, not 1; 1 means “I'm debugging the system as a whole and want to remove all security”.)

@dreamcat4
Author

Oh wait!

There is a remaining (undocumented) problem regarding core dumps and docker, and it explains why the problem 'inexplicably went away' during testing: it went away when I moved over to using a host-mounted volume, as shown in the YAML config file in the gist. When the volume /crash is mounted onto a folder of the host system, the core dumps always seem to work and get written to disk, no matter what the chmod file permissions are.

HOWEVER, if the same folder is a regular docker volume (e.g. the declaration VOLUME /crash in the Dockerfile), then with the same shell script (which chowns / chmods the /crash folder in exactly the same way) the core dump emphatically DOES NOT WORK. No core file will be generated or found afterwards.

Since I am not at all sure about the underlying technical difference(s) between the 2 types of docker volumes, perhaps someone else with more relevant expertise in that area can please comment, so that we can try to get to the bottom of this problem. It was the original source of frustration described in the initial comments at the top of the page.

There is of course also a 3rd kind of target folder for core dumps, which is not any volume (and therefore is not persistent after the container stops). I have not actually tested that situation, because I wanted to keep the core dumps after the container exits. That situation is slightly less useful for users, unless they copy the core file or send it (over the network etc.) once the core dump is completed and just before the container exits. But it may still be a valid one, since there is then no need to declare an extra volume, and the destination for core files can be configured dynamically, log-rotated, etc.

So yet another shout out to Ticket #11754 - it ought to be worth finding out about this remaining problem, if only to properly document it and give a clear instruction to users.

@vincentwoo
Contributor

@dreamcat4, this is fantastic debugging. Docker team, please take note! Getting core dumps in Docker is nigh impossible without being a wizard!

@moxiegirl
Contributor

Wow, thank you @dreamcat4! We are just coming off the 1.6 release, so I should be able to pick this up in the coming weeks. (Other contributors are, of course, welcome to take it on sooner.) I'll be sure to ping you back for the PR review.

@matschaffer

FWIW, I'm having decent luck using /proc/sys/kernel/core_pattern and --default-ulimit core=-1 to get core dumps from docker containers.

So far the caveats have been:

This is without --privileged=true since the core pattern gets set at the host level. It seems that each container shares the core pattern (shared kernel, so makes sense, covered in @dreamcat4's analysis as well). But the actual core dump is written to the container's file system. So I mount that back to the host via -v to save it after the container exits.

@dreamcat4
Author

Great! Thanks for commenting here.

@bcantrill

Great to see that I'm not the only one who cares about postmortem analysis of failed Docker containers! For whatever it's worth, I'm going to mention this issue during my talk this afternoon at DockerCon 2015; hopefully that will help get some more attention on this issue, even if it only means clearer documentation!

@dreamcat4
Author

dreamcat4 commented Jun 23, 2015

@bcantrill ok Brian. A working example where you can see it in action / take things from is my tvh.debug docker image:

https://registry.hub.docker.com/u/dreamcat4/tvh.debug/dockerfile/

Script: debug.sh

https://github.com/dreamcat4/docker-images/blob/master/tvh/debug/stable/debug.sh

Usage:

https://github.com/dreamcat4/docker-images/blob/master/tvh/README.md#debugging-tvheadend

Caveat:

  • /crash must be bind-mounted as a host volume. It doesn't work otherwise.
  • Must be run in privileged mode.

@matschaffer

Just thought I'd share one more discovery on this front.

In order to test process crashes I tried using kill -ABRT but apparently you can't kill -ABRT 1.

Killing the main PID of the container with SIGABRT seems to be ignored, both from the host and from inside a docker exec call.

My work around is to run the process under sh -c. Then ABRT on the sub-process works as expected.

@tjmehta
Contributor

tjmehta commented Sep 1, 2015

Great writeup @dreamcat4. Hitting the same problems today, glad to see it is well documented.

@tjmehta
Contributor

tjmehta commented Sep 2, 2015

More info on /proc/sys/kernel/core_pattern

For my system:

$ uname -a
Linux *** 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

My core_pattern piped to a python script:

$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c

Here is the apport python script source:

/usr/share/apport/apport

It writes its logs to /var/log/apport.log.
I noticed the pid which apport received was the pid of the process within the container,
so it would break the script. Here are the error logs:

/var/log/apport.log

This explains why no core file was created.

I also noticed what dreamcat4 states above: if /proc/sys/kernel/core_pattern contains a path which does not exist in the container, the core file is not created.

TLDR;

  1. Prepend your container command w/ ulimit -c unlimited
  2. set /proc/sys/kernel/core_pattern to a path that exists in the container
  3. (optional) mount a host directory as a volume to the core_pattern path ( for ease of access)
    This successfully allowed my node.js application to coredump on exit.
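
Step 2 is the one that fails silently, so it may be worth checking from inside the container before starting the application. A minimal sketch; check_core_target is a hypothetical helper, and the optional argument exists only so the function can be pointed at a test file instead of the real /proc entry:

```shell
#!/bin/sh
# check_core_target [pattern-file]
# Returns 1 when the core pattern names a directory that does not exist here.
check_core_target() {
    pattern=$(cat "${1:-/proc/sys/kernel/core_pattern}") || return 2
    case $pattern in
        '|'*)
            # Piped patterns (such as apport) go to a helper program; not checked here.
            echo "core_pattern pipes to a helper: $pattern"
            return 0 ;;
        */*)
            dir=${pattern%/*}
            [ -n "$dir" ] || dir=/ ;;
        *)
            # No directory part: the core lands in the crashing process's working directory.
            dir=. ;;
    esac
    if [ -d "$dir" ]; then
        echo "ok: $dir exists"
    else
        echo "missing: $dir does not exist, so no core file would be written" >&2
        return 1
    fi
}
```

Combined with step 1, a container command could then look like check_core_target && ulimit -c unlimited && <CMD>.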

@vitalyisaev2

@tjmehta could you please clarify what do you mean by "Prepend your container command w/ ulimit -c unlimited"? Do you suppose creating special entrypoint in the container image?

@tjmehta
Contributor

tjmehta commented Oct 13, 2015

I meant just using &&: ulimit -c unlimited && <CMD>. But I guess an entrypoint could work too.

@sdivekar

What is the end result of this discussion? Have the questions asked by @dreamcat4 been answered yet? If yes, can someone please point me to that?

Thanks!

@sdivekar

I would like to point out that when I run the command docker run -i -t --privileged=true test:base /bin/bash, I get the container in interactive mode. When I set ulimit and core_pattern there and then run the crashing process, it does generate the dump at the expected location set in core_pattern. However, when the container is run in the usual (non-interactive) mode, docker run --privileged=true test:base, and an ENTRYPOINT ["set_core_data.sh"] shell script (in the Dockerfile) sets the ulimit and core_pattern, the core_pattern cannot be set. It complains of a read-only file system.

@matschaffer

@sdivekar I'd recommend setting the core_pattern at the docker host level. The pattern is a kernel-level parameter so I suspect if multiple containers try to set it to different things you'll end up with a last-write-wins case where some containers won't be able to dump a core file. Not sure if you saw my comment here but setting the pattern at the host level also avoids the need for running in privileged mode.

@sdivekar

I was able to get the core on mounting volume /core and then copying the core to another folder which was mapped to outside windows folder before the container exits.

@zhangjianfnst
Contributor

Hi, I opened an issue #19289 about core dump in docker. Do you have any demands or suggestions about this?

@sdivekar

@dreamcat4 @matschaffer
So now after getting a core dump in my simple test app which was basically a null pointer exception causing segmentation fault, I inserted the same code in a real world app to see if I get a core. But no luck there.

I have tried setting /proc/SYS/DS/suid_dumpable to 1 as well.
Any clues? If I run the real app modified with buggy code by itself in vagrant ( outside container) I only see a message "Killed" as opposed to " Segmentation fault (core dumped)"

@sdivekar

Meant /proc/sys/fs/suid_dumpable and not /proc/SYS/DS/suid_dumpable.

@sdivekar

It appears that the application was using db, and the db client (oracle client)'s signal handling was preventing the generation of core dumps. Disabling that by adding DIAG_SIGHANDLER_ENABLED=FALSE to sqlnet.ora helped in generating the core.

@blaggacao

I would contribute my findings, which certainly overlap with the above said, but might contain complementary bits:

  • /proc/sys/kernel/core_pattern is clearly persisted on the host. Taking note of its content before starting any endeavour is therefore highly encouraged.
  • Docker's --privileged is necessary for a gdb session to catch the stack. Without privileges, gdb just complains about No stack. Google still is hardly knowledgeable about this phenomenon...
  • Setting ulimit on docker run works perfectly. For future googlers (the syntax is hard to find), a docker-compose example:
ulimits:
      core: -1

@blaggacao

@moxiegirl the above unassignment is definitely a bug, how would I have ever rights to do that?

@dreamcat4
Author

And I suppose if not wanting to use the full-blown --privileged=true, then the following run flags may help:

      security-opt:
        - apparmor:unconfined
        - seccomp:unconfined

plus certain CAP_ADD_xyz, (which I don't know what they are).

@justincormack
Contributor

@dreamcat4 you probably need --cap-add SYS_PTRACE for gdb I think. In 1.12 that will also relax the seccomp filter so you should not need seccomp:unconfined.

@egnoriega
Contributor

I'd suggest that all of this works around the problem. Reducing the confinement of the application, sometimes significantly in order to manage the core isn't the solution that I see people wanting. Since Docker is orchestrating the container, the preference would be for Docker to configure the location and methods for the kernel to deliver the core file. Perhaps this could be inside of the container, or in a related location (outside of the container root, but still managed space such as a volume), or in the system wide location.

@tedyu

tedyu commented Aug 12, 2017

Here is my docker VM

Linux securecluster 4.9.8-moby #1 SMP Wed Feb 8 09:56:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I got:

# echo '/tmp/core.%h.%e.%t' > /proc/sys/kernel/core_pattern
bash: /proc/sys/kernel/core_pattern: Read-only file system

Looks like --privileged=true allows me to bypass the error.

@hoffa

hoffa commented Oct 26, 2017

Just dropping my two cents. Running a swarm on Ubuntu 14.04 (default container permissions), with each container using supervisord (not ideal, but cleanest for our use case). One of my containers kept failing without dumping cores.

Had to adjust the following on the host to get them:

  1. Set core limits to unlimited in /etc/security/limits.conf
  2. Add "default-ulimits": {"core": {"Name": "core", "Hard": -1, "Soft": -1}} to /etc/docker/daemon.json
    (-1 translates to unlimited; any other values gave weird results when checking with ulimit -c)
  3. Do echo '/tmp/core.%e.%p.%t' > /proc/sys/kernel/core_pattern
    (at first I had it set to '/tmp/cores/core.%e.%p.%t', which silently didn't work since the cores directory didn't exist)

After that I was able to debug the issue from within the container using gdb.

gsauthof added a commit to gsauthof/utility that referenced this issue Jan 7, 2018
Docker is a miserable environment for generating core files (e.g. with
`gcore`), for accessing the memory of another process (`/proc/$pid/mem`)
or for other stuff involving ptrace.

Thus, the pargs cases that depend on these features are skipped, when
running inside docker

Some Docker versions might work when some docker privileges are
elevated, but Travis' Docker doesn't seem to offer much in that regard.

See also:

- moby/moby#11740
- moby/moby#7276
- travis-ci/travis-ci#5558
@sidazhou

--privileged=true can't be set on docker build, it can be only set on docker run correct?
Hence I can't add RUN echo '/tmp/core.%e.%p.%t' > /proc/sys/kernel/core_pattern to my Dockerfile, as it complains of a read-only file system, correct?

@jpetazzo
Contributor

@sidazhou even if you could run a build in privileged mode, RUN echo '/tmp/core.%e.%p.%t' > /proc/sys/kernel/core_pattern wouldn't work, because sysctls are not captured and recorded in the container image. It's like doing RUN export MYVARIABLE=value. Furthermore, the core_pattern is global to the system; so if you change it in one container, it will change on the whole system (and vice versa: if you change it on the host, it will take effect for all containers). I hope this helps.

@nurupo

nurupo commented Nov 9, 2018

By using --privileged=true and modifying /proc/sys/kernel/core_pattern inside the container, you are actually modifying the host system's /proc/sys/kernel/core_pattern, so you might as well just modify it on the host to begin with.

@jalaziz

jalaziz commented Apr 23, 2020

  • It is not clear, as core dumps are a kernel managed thing, whether resulting core files are written by kernel to the host filesystem or the container filesystem. This should also be clearly documented. They are written inside the container.

I haven't seen this mentioned in this thread, but the truth is a bit more complicated.

If /proc/sys/kernel/core_pattern refers to a fixed path, it is indeed resolved to a path in the container's namespace.

However, if /proc/sys/kernel/core_pattern is set to pipe into a script/binary, it is resolved to a path on the host's filesystem. This also means that the core dump will likely end up being stored on the host itself.

The best documentation I've found for this oddity so far is this email thread: https://lkml.org/lkml/2015/10/24/134

@praveenmak

praveenmak commented Oct 22, 2021

Hello Guys.

I am kind of stuck on this for some time now. I have followed all the suggestions mentioned over here.

Here is my situation.

  • We have many containers running C++ app.
  • Each container has its own PV bind-mounted to cloud storage.
  • I have a script for each container (the script knows the PV path). I have piped core_pattern into it, |/home/app/handle_core.sh, and it writes the core to its PV path.
  • Host node has no clue about container's PV path. So Kernel cannot copy the core.

Any ideas how to solve this problem?

@jpetazzo
Contributor

Hi,

The handle_core.sh script can find the path of the container by using the %P variable (PID of the crashed process, seen by the host) + /proc/<PID>/ns. This can be simplified by using nsenter to then enter the mnt namespace.

For instance, the following will create the core file in /core in the container, and it will create a little log file (for troubleshooting issues) in /tmp/core.<PID> on the host as well:

nsenter -t $1 -m tee /core >/dev/null 2>>/tmp/core.$$

However, this only works if the namespaces still exist at dump time. If the crashed process causes the whole container to exit, then the namespaces won't exist anymore, unfortunately. From what I understand, this is because Linux correctly waits for the core handler to run, but if the namespaces get destroyed for another reason (e.g. PID 1 in the container exits), then you're out of luck.
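
Wrapped up as a complete handler script, the one-liner above might look like the sketch below. The install path /usr/local/bin/handle_core.sh and the CORE_DEST / CORE_LOG_DIR variables are made up for illustration; it also assumes nsenter is available on the host and tee exists inside the container, and it has the same limitation about namespaces disappearing:

```shell
#!/bin/sh
# Hypothetical host-side handler, registered (as root, on the host) with something like:
#   echo '|/usr/local/bin/handle_core.sh %P' > /proc/sys/kernel/core_pattern
# CORE_DEST and CORE_LOG_DIR are illustrative knobs, not kernel or Docker settings.

handle_core() {
    host_pid=${1:-}
    dest=${CORE_DEST:-/core}
    log=${CORE_LOG_DIR:-/tmp}/core.$$
    # %P must expand to a plain number; refuse anything else.
    case $host_pid in
        ''|*[!0-9]*)
            echo "usage: handle_core <host-pid>" >&2
            return 2 ;;
    esac
    # The kernel feeds the core image on stdin. Enter the crashed process's
    # mount namespace and let tee write it to a path inside the container.
    nsenter -t "$host_pid" -m tee "$dest" >/dev/null 2>>"$log"
}

# Only act when invoked with arguments, as the kernel would do.
if [ "$#" -gt 0 ]; then
    handle_core "$@"
fi
```

Anything nsenter or tee complains about ends up in the log file on the host, which is the first place to look when the core does not show up in the container.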

I hope this helps!

@praveenmak

praveenmak commented Oct 25, 2021

@jpetazzo Thanks for your excellent tip.

Just one quick question: what's %P vs %p? The documentation in man core is not really helping.

       %p  PID of dumped process, as seen in the PID namespace in
           which the process resides.
       %P  PID of dumped process, as seen in the initial PID
           namespace (since Linux 3.12).

@jpetazzo
Contributor

@praveenmak When running processes in containers (or, generally speaking, with PID namespaces), processes will have one PID per namespace that they belong to; and namespaces are nested. If that feels confusing, you can try the following commands, on a Linux machine:

CONTAINERID=$(docker run -d nginx)
docker exec $CONTAINERID ps faux
ps faux | grep nginx

You will see that these NGINX processes have a PID inside the container, and a PID outside the container. So that corresponds to %p and %P respectively. I hope that helps!
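
On kernels that expose it (Linux 4.1 and later, as far as I know), the NSpid line of /proc/<PID>/status shows both numbers at once, outermost namespace first. A small sketch for reading it; pid_views is a hypothetical helper name and the optional second argument is only there so it can be tried on a saved status file:

```shell
#!/bin/sh
# pid_views <pid> [status-file]
# Print the outermost and innermost PID recorded on the NSpid line.
pid_views() {
    status_file=${2:-/proc/$1/status}
    # NSpid lists one PID per nested PID namespace, outermost first.
    awk '/^NSpid:/ { printf "outer=%s inner=%s\n", $2, $NF }' "$status_file"
}
```

Run on the host against the host PID of one of those NGINX processes, outer should match what %P would give and inner what %p would give. For a process that is not in a nested PID namespace, the two values are the same.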

@praveenmak

The core file is generated, but with "0" bytes. Any idea what could be wrong on my side?
