Make SIGUSR2 trigger a heap dump instead of triggering a garbage collection (GC) #7615
kwilczynski
started this conversation in
Polls
Replies: 1 comment
-
Related to: |
TL;DR
We would like to use SIGUSR2 to trigger a heap memory dump at runtime, which is far more useful for troubleshooting than its current role of triggering garbage collection on demand. The on-demand GC trigger would either be retired completely or, if there is a desire to keep this functionality around, moved to a different signal such as SIGWINCH (which might be an odd choice; however, there aren't many viable signals left to choose from).
Currently, to obtain a heap memory dump at runtime, which is very useful during troubleshooting, one has to enable the Go pprof endpoint, either as a standalone HTTP listener (which listens on localhost, TCP port 6060) using the `--profile` command-line argument, or through the existing CRI-O control socket (a Unix socket) using the `--enable-profile-unix-socket` command-line argument, or via a relevant environment variable. The two mechanisms can be enabled at the same time or independently.
Similarly, to collect the current stack trace (of executing goroutines; some people call it a "thread dump"), one can either use the Go pprof endpoint or send a SIGUSR1 signal to the running CRI-O process, which causes the stack trace to be written as a text file to the `/tmp` directory.
But what if the Go pprof endpoint hasn't been enabled, which it isn't by default, and there is a need to collect troubleshooting data as part of a debugging process? Simply put, there isn't a way to do it, short of collecting a complete memory dump (a core dump, if you wish) using a tool such as `gcore` and later attempting to extract heap data from it, which is a complex and involved process.
Thus, we would like to propose a change in the default behaviour: the SIGUSR2 signal (currently used to trigger a garbage collection (GC) at runtime) would be repurposed to dump heap data at runtime, similar to what the SIGUSR1 signal offers for goroutine stack traces.
This means that the GC trigger would either be retired entirely or be moved to a different, currently unused signal such as SIGWINCH, so that if there is a need to trigger a garbage collection at runtime, there will still be a way to do so and the functionality will be preserved.
Why do we suggest retiring the GC trigger? Our testing showed that manually triggering garbage collection does not necessarily help with CRI-O memory usage, and the results are dubious at best. Often, there is minimal gain from starting a GC run on demand, especially as both the Go runtime and CRI-O keep improving over time and getting better at memory utilisation. It is also worth mentioning that triggering the GC manually comes at a cost: to perform garbage collection, process execution is temporarily paused, so latency increases if this feature is invoked often on a busy CRI-O process.
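For anyone who wants to check the gain and the cost of a forced collection in their own Go program, a rough measurement can be sketched as below. This is not the testing methodology referred to above; the `gcReport` type and `measureGC` function are hypothetical names, and the allocation loop in `main` exists only to create some garbage to collect.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcReport summarises one forced collection. Hypothetical type for this sketch.
type gcReport struct {
	HeapInuseBefore uint64        // bytes in in-use heap spans before the GC
	HeapInuseAfter  uint64        // bytes in in-use heap spans after the GC
	Elapsed         time.Duration // wall-clock time spent inside runtime.GC()
	Pause           time.Duration // stop-the-world pause time added meanwhile
	Cycles          uint32        // GC cycles completed between the two reads
}

// measureGC forces a collection and reports heap usage and timing around it.
// Note that runtime.ReadMemStats itself briefly stops the world.
func measureGC() gcReport {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	start := time.Now()
	runtime.GC()
	elapsed := time.Since(start)

	runtime.ReadMemStats(&after)
	return gcReport{
		HeapInuseBefore: before.HeapInuse,
		HeapInuseAfter:  after.HeapInuse,
		Elapsed:         elapsed,
		Pause:           time.Duration(after.PauseTotalNs - before.PauseTotalNs),
		Cycles:          after.NumGC - before.NumGC,
	}
}

var sink [][]byte

func main() {
	// Allocate some memory and drop the references so there is garbage to collect.
	for i := 0; i < 1000; i++ {
		sink = append(sink, make([]byte, 64<<10))
	}
	sink = nil

	r := measureGC()
	fmt.Printf("heap in use: %d -> %d bytes\n", r.HeapInuseBefore, r.HeapInuseAfter)
	fmt.Printf("runtime.GC() took %v, pause time %v, cycles %d\n", r.Elapsed, r.Pause, r.Cycles)
}
```

A drop in in-use heap here does not by itself mean the process's resident memory shrinks, since the Go runtime returns freed memory to the operating system on its own schedule.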
Therefore, we would like to ask CRI-O users whether this proposed course of action would be acceptable and whether changing the SIGUSR2 signal to perform heap memory dumps would be more desirable than triggering an on-demand GC run. Once a decision is made, we will follow the process to deprecate the old functionality, should there be a change to how SIGUSR2 will be used going forward.
2 votes ·