-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heartbeat messages could warn if there is a deadlock #226
Comments
Yeah, the initial concept was not to interfere with the forked jvm -- not in any way, including not forking any daemon threads that would ping back the forked process state. The current "heartbeat" is merely a status message that is displayed on the controller side if the forked JVM doesn't provide any progress logs (new commenced tests, etc.), but these logs happen only on the boundary of tests and suites, not within tests. I do think it's important to know whether a forked process stalled and it would be helpful to know the reason why it did so, but in reality I think you'll have to {{jps}} that process to get its stack trace (possibly multiple times) anyway. The problem with calling anything from background threads is that it entails more problems -- if the process is running low on memory it'll OOM on getting those stack traces, etc. It has happened in the past, very hard to guard against in Java. |
yeah, in my head i was initially thinking that the main ant process (that logs the heartbeat) could just check the stacktraces/deadlock state exactly like jstack does, and i assumed that was via ThreadMXBean, but then i realized jstack uses entirely undocumented com.sun APIs, and that's when i stoped trying to write a patch and just filed this issue as is hoping someone else had a better idea. |
I think ultimately a "true" heart beat (including stack traces from the forked JVM) would be sufficient to diagnose most problems. These tend to cluster into three categories:
I can't recall any other situation that wouldn't fall into any of these classes, so proper heartbeat seems crucial. Since you brought that up I'll try to handle this soon(ish), it's been on the plate for too long. |
Related to issue #132 (but seems diff enough because it should probably always be on, instead of just being an option).
If a deadlock can be detected (via ThreadMXBean.html.findDeadlockedThreads()) when the HEARTBEAT messages are being reported, it would be good to included at least the bare details in the HEARTBEAT log message (ie: "...DEADLOCKED!")
Since a deadlock may be temporary (Lock.tryLock with a really long delay for example) it shouldn't automatically kill the test -- but maybe that should be an option?
I started to try and come up with a patch for this, but then realized i had no idea how to get the JMX info from the slave process so that the master thread could get the ThreadMXBean (but this seems like a general problem blocking #132 as well -- how to get the thread/stack info remotely from the slave JVM)
The text was updated successfully, but these errors were encountered: