Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heartbeat messages could warn if there is a deadlock #226

Open
hossman opened this issue Mar 31, 2016 · 4 comments
Open

Heartbeat messages could warn if there is a deadlock #226

hossman opened this issue Mar 31, 2016 · 4 comments

Comments

@hossman
Copy link

hossman commented Mar 31, 2016

Related to issue #132 (but seems diff enough because it should probably always be on, instead of just being an option).

If a deadlock can be detected (via ThreadMXBean.html.findDeadlockedThreads()) when the HEARTBEAT messages are being reported, it would be good to included at least the bare details in the HEARTBEAT log message (ie: "...DEADLOCKED!")

Since a deadlock may be temporary (Lock.tryLock with a really long delay for example) it shouldn't automatically kill the test -- but maybe that should be an option?

I started to try and come up with a patch for this, but then realized i had no idea how to get the JMX info from the slave process so that the master thread could get the ThreadMXBean (but this seems like a general problem blocking #132 as well -- how to get the thread/stack info remotely from the slave JVM)

@dweiss
Copy link
Contributor

dweiss commented Mar 31, 2016

Yeah, the initial concept was not to interfere with the forked jvm -- not in any way, including not forking any daemon threads that would ping back the forked process state. The current "heartbeat" is merely a status message that is displayed on the controller side if the forked JVM doesn't provide any progress logs (new commenced tests, etc.), but these logs happen only on the boundary of tests and suites, not within tests.

I do think it's important to know whether a forked process stalled and it would be helpful to know the reason why it did so, but in reality I think you'll have to {{jps}} that process to get its stack trace (possibly multiple times) anyway. The problem with calling anything from background threads is that it entails more problems -- if the process is running low on memory it'll OOM on getting those stack traces, etc. It has happened in the past, very hard to guard against in Java.

@hossman
Copy link
Author

hossman commented Apr 1, 2016

yeah, in my head i was initially thinking that the main ant process (that logs the heartbeat) could just check the stacktraces/deadlock state exactly like jstack does, and i assumed that was via ThreadMXBean, but then i realized jstack uses entirely undocumented com.sun APIs, and that's when i stoped trying to write a patch and just filed this issue as is hoping someone else had a better idea.

@dweiss
Copy link
Contributor

dweiss commented Apr 1, 2016

I think ultimately a "true" heart beat (including stack traces from the forked JVM) would be sufficient to diagnose most problems. These tend to cluster into three categories:

  1. the jvm hungs entirely (for whatever reason); there would be no heartbeat signals at all, everything is stalled, the controller process sees a live process that doesn't respond,
  2. the forked jvm clashes into a deadlock (again, whatever type of deadlock this is -- monitor based, resource based, class-loading based, whatever); this should be verifiable via stack traces returned from the process (no changes for several threads in each heartbeat).
  3. the forked jvm falls into a never-ending (or really long) loop somewhere; this again should be verifiable with stack traces if they're changing from heartbeat to heartbeat but the process doesn't proceed to completion.

I can't recall any other situation that wouldn't fall into any of these classes, so proper heartbeat seems crucial. Since you brought that up I'll try to handle this soon(ish), it's been on the plate for too long.

@dweiss dweiss changed the title Heartbeat messages could warni if there is a deadlock Heartbeat messages could warn if there is a deadlock Apr 4, 2016
@dweiss
Copy link
Contributor

dweiss commented Oct 12, 2016

Related to GH-233, GH-132, GH-193.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants