Question: Enhance ThreadLeakControl to print additional metadata #306

madrob · 2022-08-02T16:33:12Z

In Solr we have MDCAwareThreadPool that tracks a submitterStackTrace for the threads that it manages. This information would be really useful to print on leaked threads when it is available, since it would help pinpoint where this thread came from that was leaked (useful when it looks like a generic http client thread, for example).

I don't think it makes sense for the runner to know about Solr's convention, but maybe we can make some kind of extensible API cut point that would allow us to customize how we gather stack trace information?

dweiss · 2022-08-24T19:02:10Z

Sorry, I missed this somehow, @madrob . I'm not really sure how such an extension point could work from technical perspective. I guess we could add another annotation to test suite class but these only allow to point you at static classes. I also wouldn't want to complicate the code a lot more since it's already pretty hairy around this area.

A lateral note that, although unpopular these days, aspectj's dynamic instrumentation is an excellent way to intercept such things. It'll be tricky to add a join point inside Thread constructors but you could easily append code before any new Thread call. I realize that Solr code is a heavy beast, just saying it's fun and I've done such debugging in the past (i/o handles, threads) with success.

dweiss · 2022-08-25T12:51:51Z

This is currently handled via mx bean:

      b.append("\n==== jstack at approximately timeout time ====\n");
      for (ThreadInfo ti : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
        Threads.append(b, ti);
      }

I'm not sure how to make this extensible.

dweiss · 2022-08-25T12:56:14Z

I think I could add a method on RandomizedContext that would allow overriding the default stack trace collector. This way you could set/ replace it more dynamically (in a beforeclass hook or even in a test itself). Can you point me at the solr code which would be used to retrieve the origin/ creation point of a thread?

madrob · 2022-08-29T13:25:53Z

We store the submitted stack trace in ExecutorUtil, and set it up when executing a new task in MDCAwareThreadPoolExecutor

An example of how this gets used to retrieve the information later is in ObjectReleaseTracker when we'll set the submitter stack trace as the cause for the current stack trace for unclosed objects.

dweiss · 2022-08-29T14:45:11Z

I'll take a look later, Mike.

dweiss · 2022-08-31T18:24:32Z

Hmm... I peeked at solr code here:
https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L226
a new instance of an exception doesn't have any stack trace - it needs to be thrown or fillInStackTrace() needs to be called. Perhaps I missed something somewhere, I assume it works. :)

madrob · 2022-08-31T18:36:10Z

The Throwable constructors call fillInStackTrace() for us :)

dweiss · 2022-08-31T18:53:36Z

Funny. I always thought it has to be an explicit throw/call... eh. So I looked at the code and it's kind of hard to augment the stack trace information only - you could, but injecting or replacing just StackTraceElement[] will be misleading, I think. Instead, would a hook method called when a timeout (or a thread leak) occurs be sufficient? This could print whatever you want in addition to (or replacing?) the standard messages printed by the framework.

madrob · 2022-08-31T18:58:39Z

Yea, a hook method would be fine. Not sure if it makes sense to have the hook be for handling the whole leak, or just printing the status block. Actually... let me do a little bit of research, I'm realizing that if we are executing in one thread, it may be difficult to get at the ThreadLocal info from another thread that we only have a reference to. Or maybe it's easy, I don't know. But if we can't do it then there's no point in you making changes to randomized runner

dweiss · 2022-08-31T19:01:23Z

Yep, sure. Thanks.

dweiss · 2022-09-01T07:46:11Z

One more thing that sprang to my mind - there is already a mechanism to intercept test events, junit-style. This is exactly the way we print reproduce messages - by attaching a custom listener on the base class:

https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/util/RunListenerPrintReproduceInfo.java

you could do the same - intercept failures and print any threads that have not returned properly/ have their origin stacks registered. This requires some duplication of infrastructure on Solr side but it'd give you full control over what's going on and where.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Enhance ThreadLeakControl to print additional metadata #306

Question: Enhance ThreadLeakControl to print additional metadata #306

madrob commented Aug 2, 2022

dweiss commented Aug 24, 2022

dweiss commented Aug 25, 2022

dweiss commented Aug 25, 2022

madrob commented Aug 29, 2022

dweiss commented Aug 29, 2022

dweiss commented Aug 31, 2022

madrob commented Aug 31, 2022

dweiss commented Aug 31, 2022

madrob commented Aug 31, 2022

dweiss commented Aug 31, 2022

dweiss commented Sep 1, 2022

Question: Enhance ThreadLeakControl to print additional metadata #306

Question: Enhance ThreadLeakControl to print additional metadata #306

Comments

madrob commented Aug 2, 2022

dweiss commented Aug 24, 2022

dweiss commented Aug 25, 2022

dweiss commented Aug 25, 2022

madrob commented Aug 29, 2022

dweiss commented Aug 29, 2022

dweiss commented Aug 31, 2022

madrob commented Aug 31, 2022

dweiss commented Aug 31, 2022

madrob commented Aug 31, 2022

dweiss commented Aug 31, 2022

dweiss commented Sep 1, 2022