
Wall clock profiler optimizations #237

Open
guser21 opened this issue Aug 26, 2019 · 4 comments

guser21 commented Aug 26, 2019

Hi,

I have been using your wall clock profiler recently and it has proved itself really well in my humble tests. The only problem is that when I try to use it on a process with more than 2000 threads, the per-thread sampling frequency drops dramatically, and to get reasonable results one has to lower the sampling interval so much that the profiler introduces huge overhead. Unfortunately, we have many processes in production with on the order of 10k threads.

To solve this problem, I suggest two approaches.

  1. One idea would be to add a regex that lets users filter which threads should be profiled. Usually, after doing a broad profiling run, we needed to focus on a single thread pool that looked particularly interesting.

The way I imagine this could be implemented is to collect the pids of the threads whose names match a specific regex and to send the signals only to them. The major problem I see with this approach is that thread pools are usually dynamic, so new threads may be created or destroyed during profiling. One might therefore need to rescan all threads periodically during sampling to make sure we have the pids of all the interesting threads.

What do you think about this approach? How can we avoid the additional overhead of periodically matching the threads against the regex (or do we really need to do that)?

  2. The second idea is a little more complicated. Most of the threads do nothing most of the time, so it is rather useless to walk their stacks on every sample. The idea is to count the number of context switches per thread and, before sending the next signal, check whether it has changed dramatically (dramatically, because I believe the signal handling itself causes a context switch). This way we can avoid useless stack walking for idle threads and increase the sampling frequency for working threads.

There are a couple of problems with this approach.
Firstly, it is not crystal clear to me how this could be implemented. I don't really know whether the context-switch approach would work, and I am not sure how we would get the stacks of idle threads efficiently when we do need them (concurrency issues).
Secondly, the profiler's overhead becomes dynamic: the fewer idle threads there are, the more data is generated, because the time previously wasted on useless stack walking is now spent walking the stacks of working threads.
Thirdly, the distribution gets skewed, since working threads are sampled more often than they otherwise would have been.
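
To make the idea more concrete, here is a rough sketch of what reading the per-thread context-switch counters from /proc could look like on Linux. The helper names and the threshold are just placeholders I made up, not anything from async-profiler:

```cpp
// Sketch only: read the context-switch counters of one thread from
// /proc/self/task/<tid>/status (Linux). readCtxtSwitches and looksActive
// are made-up names for illustration.
#include <cstdio>
#include <cstring>
#include <cstdlib>

struct CtxtSwitches {
    unsigned long voluntary = 0;
    unsigned long nonvoluntary = 0;
};

static bool readCtxtSwitches(int tid, CtxtSwitches* out) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/task/%d/status", tid);
    FILE* f = fopen(path, "r");
    if (f == NULL) return false;  // the thread may have exited already

    char line[256];
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "voluntary_ctxt_switches:", 24) == 0) {
            out->voluntary = strtoul(line + 24, NULL, 10);
        } else if (strncmp(line, "nonvoluntary_ctxt_switches:", 27) == 0) {
            out->nonvoluntary = strtoul(line + 27, NULL, 10);
        }
    }
    fclose(f);
    return true;
}

// Decide whether a thread is worth signalling: its counters must have moved
// by more than the context switches caused by the previous signal delivery.
static bool looksActive(const CtxtSwitches& prev, const CtxtSwitches& cur,
                        unsigned long threshold = 2) {
    return (cur.voluntary + cur.nonvoluntary) -
           (prev.voluntary + prev.nonvoluntary) > threshold;
}
```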

What are your ideas about this approach? Would this work, and maybe you could give some clues on how it could be implemented efficiently?

So, do either of these ideas make sense, and would they be useful?
I would love to hear your feedback on them before I actually start implementing.

P.S.
I am a student and still a newbie in the JVM world, so if something is complete nonsense correct me ))).

apangin commented Aug 28, 2019

Thank you for the detailed feedback and for the constructive ideas!
I completely agree that some kind of thread filtering would be helpful - this is not the first time I have heard such a request. However, I was thinking about more general filters, e.g. not only by thread name, but also by method or class name.

Fortunately, we don't even need to scan threads, because thread start/stop events are already handled, and intercepting Thread.setName is not a big deal (or better, its native counterpart Thread.setNativeName, because it's easy to replace a native method entry with JNI RegisterNatives).

So, I'd definitely go for the first option, since it is 1) easier to implement; 2) understandable and predictable from the user's perspective.
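
For illustration, re-registering the native method could look roughly like this (just a sketch; the way the original entry point is obtained, e.g. via the JVMTI NativeMethodBind event, and the profiler-side bookkeeping call are placeholders):

```cpp
// Sketch of replacing the native entry of java.lang.Thread.setNativeName
// with JNI RegisterNatives. Not actual async-profiler code.
#include <jni.h>

typedef void (JNICALL *SetNativeNameFn)(JNIEnv*, jobject, jstring);
static SetNativeNameFn orig_setNativeName = NULL;  // captured elsewhere, e.g. via NativeMethodBind

static void JNICALL setNativeNameHook(JNIEnv* env, jobject self, jstring name) {
    const char* chars = env->GetStringUTFChars(name, NULL);
    if (chars != NULL) {
        // updateThreadFilter(self, chars);        // hypothetical profiler bookkeeping
        env->ReleaseStringUTFChars(name, chars);
    }
    if (orig_setNativeName != NULL) {
        orig_setNativeName(env, self, name);       // preserve the JDK behavior
    }
}

static void installSetNativeNameHook(JNIEnv* env) {
    jclass threadClass = env->FindClass("java/lang/Thread");
    if (threadClass == NULL) return;

    JNINativeMethod m = {
        (char*)"setNativeName",
        (char*)"(Ljava/lang/String;)V",
        (void*)setNativeNameHook
    };
    env->RegisterNatives(threadClass, &m, 1);      // replaces the existing native entry
}
```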

guser21 commented Aug 29, 2019

Thanks for the reply!
I will start working on the feature and will soon submit a draft PR to further discuss the implementation details.

guser21 commented Sep 2, 2019

Thanks for the advice @apangin. I have looked into it and found two potential problems.

  1. I don't see an easy way of getting the pid of the thread for which setName was called. Firstly, the thread might not have been created yet at the moment setName is called, so no pid is assigned. Secondly, the call happens in the context of the calling thread, and I don't know any neat way of getting the native id of the target thread.

  2. Looking at the implementation of Thread.setName, I have noticed that there are two main paths:
    • if the thread has already started before Thread.setName is called, it in fact calls Thread.setNativeName;
    • otherwise the naming is done inside the JDK after the start0 call, by os::set_native_thread_name.
    So intercepting setNativeName covers only one of the two paths.

The easy way to deal with this would be to loop through all threads and get their names and pids every once in a while. Then we could filter them and update the list.

The hacky way would be to let every thread update the pid list the first time it receives a signal. We would also still need to intercept Thread.setNativeName, because someone might change the thread name after it has started.

Honestly, I prefer the first one, as it is simpler, and I don't think there will be a huge overhead in going through all the process's threads once in a while. For example, on Linux it is a simple read from the proc file system.
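
To illustrate, this is roughly what I have in mind for the periodic scan on Linux (just a sketch; the function name is made up):

```cpp
// Sketch of the periodic scan: walk /proc/self/task, read each thread's comm
// name and keep the tids whose names match the user-supplied regex.
#include <dirent.h>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <regex>
#include <vector>

static std::vector<int> matchingThreads(const std::regex& nameFilter) {
    std::vector<int> tids;
    DIR* dir = opendir("/proc/self/task");
    if (dir == NULL) return tids;

    struct dirent* entry;
    while ((entry = readdir(dir)) != NULL) {
        int tid = atoi(entry->d_name);
        if (tid <= 0) continue;                    // skips "." and ".."

        char path[64];
        snprintf(path, sizeof(path), "/proc/self/task/%d/comm", tid);
        FILE* f = fopen(path, "r");
        if (f == NULL) continue;                   // the thread is already gone

        char name[64] = {0};
        if (fgets(name, sizeof(name), f) != NULL) {
            name[strcspn(name, "\n")] = '\0';
            if (std::regex_search(name, nameFilter)) {
                tids.push_back(tid);               // signals would go only to these tids
            }
        }
        fclose(f);
    }
    closedir(dir);
    return tids;
}
```

One caveat is that /proc/<tid>/comm is truncated to 15 characters by the kernel, so long Java thread names might only be matchable by prefix.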

Do you have any thoughts on this?

apangin commented Sep 2, 2019

There should be no problem in intercepting Thread.setNativeName:

  • async-profiler can already extract the native thread ID from a jthread object, see Profiler::updateThreadName.
  • There is no need to handle setName called before thread start, because the profiler already receives the ThreadStart event and gets the initial name of the thread.
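
Roughly, pairing the initial name with the native TID at thread start could look like this (a sketch; event registration via SetEventCallbacks is omitted and the bookkeeping call is hypothetical):

```cpp
// Sketch of a JVMTI ThreadStart callback. It runs in the context of the
// newly started thread, so gettid() returns exactly the TID the filter needs.
#include <jvmti.h>
#include <sys/syscall.h>
#include <unistd.h>

static void JNICALL onThreadStart(jvmtiEnv* jvmti, JNIEnv* jni, jthread thread) {
    jvmtiThreadInfo info = {0};
    if (jvmti->GetThreadInfo(thread, &info) != JVMTI_ERROR_NONE) return;

    int tid = (int)syscall(SYS_gettid);            // native TID of the new thread
    // registerThreadName(tid, info.name);         // hypothetical filter bookkeeping

    jvmti->Deallocate((unsigned char*)info.name);  // GetThreadInfo allocates the name
}
```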
