Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Enhance ThreadLeakControl to print additional metadata #306

Open
madrob opened this issue Aug 2, 2022 · 11 comments
Open

Question: Enhance ThreadLeakControl to print additional metadata #306

madrob opened this issue Aug 2, 2022 · 11 comments

Comments

@madrob
Copy link
Contributor

madrob commented Aug 2, 2022

In Solr we have MDCAwareThreadPool that tracks a submitterStackTrace for the threads that it manages. This information would be really useful to print on leaked threads when it is available, since it would help pinpoint where this thread came from that was leaked (useful when it looks like a generic http client thread, for example).

I don't think it makes sense for the runner to know about Solr's convention, but maybe we can make some kind of extensible API cut point that would allow us to customize how we gather stack trace information?

@dweiss
Copy link
Contributor

dweiss commented Aug 24, 2022

Sorry, I missed this somehow, @madrob . I'm not really sure how such an extension point could work from technical perspective. I guess we could add another annotation to test suite class but these only allow to point you at static classes. I also wouldn't want to complicate the code a lot more since it's already pretty hairy around this area.

A lateral note that, although unpopular these days, aspectj's dynamic instrumentation is an excellent way to intercept such things. It'll be tricky to add a join point inside Thread constructors but you could easily append code before any new Thread call. I realize that Solr code is a heavy beast, just saying it's fun and I've done such debugging in the past (i/o handles, threads) with success.

@dweiss
Copy link
Contributor

dweiss commented Aug 25, 2022

This is currently handled via mx bean:

      b.append("\n==== jstack at approximately timeout time ====\n");
      for (ThreadInfo ti : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
        Threads.append(b, ti);
      }

I'm not sure how to make this extensible.

@dweiss
Copy link
Contributor

dweiss commented Aug 25, 2022

I think I could add a method on RandomizedContext that would allow overriding the default stack trace collector. This way you could set/ replace it more dynamically (in a beforeclass hook or even in a test itself). Can you point me at the solr code which would be used to retrieve the origin/ creation point of a thread?

@madrob
Copy link
Contributor Author

madrob commented Aug 29, 2022

We store the submitted stack trace in ExecutorUtil, and set it up when executing a new task in MDCAwareThreadPoolExecutor

An example of how this gets used to retrieve the information later is in ObjectReleaseTracker when we'll set the submitter stack trace as the cause for the current stack trace for unclosed objects.

@dweiss
Copy link
Contributor

dweiss commented Aug 29, 2022

I'll take a look later, Mike.

@dweiss
Copy link
Contributor

dweiss commented Aug 31, 2022

Hmm... I peeked at solr code here:
https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L226
a new instance of an exception doesn't have any stack trace - it needs to be thrown or fillInStackTrace() needs to be called. Perhaps I missed something somewhere, I assume it works. :)

@madrob
Copy link
Contributor Author

madrob commented Aug 31, 2022

The Throwable constructors call fillInStackTrace() for us :)

@dweiss
Copy link
Contributor

dweiss commented Aug 31, 2022

Funny. I always thought it has to be an explicit throw/call... eh. So I looked at the code and it's kind of hard to augment the stack trace information only - you could, but injecting or replacing just StackTraceElement[] will be misleading, I think. Instead, would a hook method called when a timeout (or a thread leak) occurs be sufficient? This could print whatever you want in addition to (or replacing?) the standard messages printed by the framework.

@madrob
Copy link
Contributor Author

madrob commented Aug 31, 2022

Yea, a hook method would be fine. Not sure if it makes sense to have the hook be for handling the whole leak, or just printing the status block. Actually... let me do a little bit of research, I'm realizing that if we are executing in one thread, it may be difficult to get at the ThreadLocal info from another thread that we only have a reference to. Or maybe it's easy, I don't know. But if we can't do it then there's no point in you making changes to randomized runner

@dweiss
Copy link
Contributor

dweiss commented Aug 31, 2022

Yep, sure. Thanks.

@dweiss
Copy link
Contributor

dweiss commented Sep 1, 2022

One more thing that sprang to my mind - there is already a mechanism to intercept test events, junit-style. This is exactly the way we print reproduce messages - by attaching a custom listener on the base class:

https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/util/RunListenerPrintReproduceInfo.java

you could do the same - intercept failures and print any threads that have not returned properly/ have their origin stacks registered. This requires some duplication of infrastructure on Solr side but it'd give you full control over what's going on and where.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants