
Wall clock profiler optimizations #237

Open
guser21 opened this issue Aug 26, 2019 · 4 comments

guser21 commented Aug 26, 2019

Hi,

I have been using your wall clock profiler recently and it has proved itself really well in my humble tests. The only problem is that when I try to use it on a process with more than 2000 threads, the per-thread sampling frequency drops dramatically, and to get reasonable results one has to lower the sampling interval so much that the profiler introduces huge overhead. Unfortunately, we have many processes in production with on the order of 10k threads.

To solve this problem, I suggest two approaches.

  1. One idea would be to add a regex that lets users filter which threads should be profiled. Usually, after doing a broad profiling run, we needed to focus on a single thread pool that looked particularly interesting.

The way I imagine this could be implemented is to collect the pids of the threads whose names match a specific regex and to send the signals only to them. The major problem I see with this approach is that thread pools are usually dynamic, so new threads may be created or destroyed during profiling. One might therefore need to rescan all threads periodically during sampling to make sure we have the pids of all the interesting threads.

What do you think about this approach? How can we avoid the additional overhead of periodically matching the threads against the regex (or do we really need to do that)?

  2. The second idea is a little more complicated. Most of the threads do nothing most of the time, so it is rather useless to walk their stacks on every sample. The idea is to count the number of context switches per thread and, before sending the next signal, check whether it has changed dramatically (dramatically, because I believe the signal handling itself causes a context switch). This way we can avoid useless stack walking for idle threads and increase the sampling frequency for working threads.

There are a couple of problems with this approach.
Firstly, it is not crystal clear to me how this could be implemented. I don't really know whether the context-switch approach would work, and I am not sure how we would get the stacks of idle threads efficiently when we do need them (concurrency issues).
Secondly, the profiler's overhead becomes dynamic: the fewer idle threads there are, the more data is generated, because the time previously wasted on useless stack walking is now spent walking the stacks of working threads.
Thirdly, the distribution gets skewed, since working threads are sampled more often than they otherwise would have been.
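
To make the idea more concrete, here is a rough sketch of what reading the per-thread context-switch counters from /proc could look like on Linux. The helper names and the threshold are just placeholders I made up, not anything from async-profiler:

```cpp
// Sketch only: read the context-switch counters of one thread from
// /proc/self/task/<tid>/status (Linux). readCtxtSwitches and looksActive
// are made-up names for illustration.
#include <cstdio>
#include <cstring>
#include <cstdlib>

struct CtxtSwitches {
    unsigned long voluntary = 0;
    unsigned long nonvoluntary = 0;
};

static bool readCtxtSwitches(int tid, CtxtSwitches* out) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/task/%d/status", tid);
    FILE* f = fopen(path, "r");
    if (f == NULL) return false;  // the thread may have exited already

    char line[256];
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "voluntary_ctxt_switches:", 24) == 0) {
            out->voluntary = strtoul(line + 24, NULL, 10);
        } else if (strncmp(line, "nonvoluntary_ctxt_switches:", 27) == 0) {
            out->nonvoluntary = strtoul(line + 27, NULL, 10);
        }
    }
    fclose(f);
    return true;
}

// Decide whether a thread is worth signalling: its counters must have moved
// by more than the context switches caused by the previous signal delivery.
static bool looksActive(const CtxtSwitches& prev, const CtxtSwitches& cur,
                        unsigned long threshold = 2) {
    return (cur.voluntary + cur.nonvoluntary) -
           (prev.voluntary + prev.nonvoluntary) > threshold;
}
```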

What are your ideas about this approach? Would this work, and maybe you could give some clues on how it could be implemented efficiently?

So, do either of these ideas make sense, and would they be useful?
I would love to hear your feedback on them before I actually start implementing.

P.S.
I am a student and still a newbie in the JVM world, so if something is complete nonsense correct me ))).

apangin commented Aug 28, 2019

Thank you for the detailed feedback and for the constructive ideas!
I completely agree that some kind of thread filtering would be helpful - this is not the first time I have heard such a request. However, I was thinking about more general filters, e.g. not only by thread name, but also by method or class name.

Fortunately, we don't even need to scan threads, because thread start/stop events are already handled, and intercepting Thread.setName is not a big deal (or better, its native counterpart Thread.setNativeName, because it's easy to replace a native method entry with JNI RegisterNatives).

So, I'd definitely go for the first option, since it is 1) easier to implement; 2) understandable and predictable from the user's perspective.
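
For illustration, re-registering the native method could look roughly like this (just a sketch; the way the original entry point is obtained, e.g. via the JVMTI NativeMethodBind event, and the profiler-side bookkeeping call are placeholders):

```cpp
// Sketch of replacing the native entry of java.lang.Thread.setNativeName
// with JNI RegisterNatives. Not actual async-profiler code.
#include <jni.h>

typedef void (JNICALL *SetNativeNameFn)(JNIEnv*, jobject, jstring);
static SetNativeNameFn orig_setNativeName = NULL;  // captured elsewhere, e.g. via NativeMethodBind

static void JNICALL setNativeNameHook(JNIEnv* env, jobject self, jstring name) {
    const char* chars = env->GetStringUTFChars(name, NULL);
    if (chars != NULL) {
        // updateThreadFilter(self, chars);        // hypothetical profiler bookkeeping
        env->ReleaseStringUTFChars(name, chars);
    }
    if (orig_setNativeName != NULL) {
        orig_setNativeName(env, self, name);       // preserve the JDK behavior
    }
}

static void installSetNativeNameHook(JNIEnv* env) {
    jclass threadClass = env->FindClass("java/lang/Thread");
    if (threadClass == NULL) return;

    JNINativeMethod m = {
        (char*)"setNativeName",
        (char*)"(Ljava/lang/String;)V",
        (void*)setNativeNameHook
    };
    env->RegisterNatives(threadClass, &m, 1);      // replaces the existing native entry
}
```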

guser21 commented Aug 29, 2019

Thanks for the reply!
I will start working on the feature and will soon submit a draft PR to further discuss the implementation details.

guser21 commented Sep 2, 2019

Thanks for the advice @apangin. I have looked into it and found two potential problems.

  1. I don't see an easy way of getting the pid of the thread for which setName was called. Firstly, the thread might not have been created yet at the moment setName is called, so no pid is assigned. Secondly, the call happens in the context of the calling thread, and I don't know any neat way of getting the native id of the target thread.

  2. Looking at the implementation of Thread.setName, I have noticed that there are two main paths:
    • if the thread has already started before Thread.setName is called, it in fact calls Thread.setNativeName;
    • otherwise the naming is done inside the JDK after the start0 call, by os::set_native_thread_name.
    So intercepting setNativeName covers only one of the two paths.

The easy way to deal with this would be to loop through all threads and get their names and pids every once in a while. Then we could filter them and update the list.

The hacky way would be to let every thread update the pid list the first time it receives a signal. We would also still need to intercept Thread.setNativeName, because someone might change the thread name after it has started.

Honestly, I prefer the first one, as it is simpler, and I don't think there will be a huge overhead in going through all the process's threads once in a while. For example, on Linux it is a simple read from the proc file system.
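
To illustrate, this is roughly what I have in mind for the periodic scan on Linux (just a sketch; the function name is made up):

```cpp
// Sketch of the periodic scan: walk /proc/self/task, read each thread's comm
// name and keep the tids whose names match the user-supplied regex.
#include <dirent.h>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <regex>
#include <vector>

static std::vector<int> matchingThreads(const std::regex& nameFilter) {
    std::vector<int> tids;
    DIR* dir = opendir("/proc/self/task");
    if (dir == NULL) return tids;

    struct dirent* entry;
    while ((entry = readdir(dir)) != NULL) {
        int tid = atoi(entry->d_name);
        if (tid <= 0) continue;                    // skips "." and ".."

        char path[64];
        snprintf(path, sizeof(path), "/proc/self/task/%d/comm", tid);
        FILE* f = fopen(path, "r");
        if (f == NULL) continue;                   // the thread is already gone

        char name[64] = {0};
        if (fgets(name, sizeof(name), f) != NULL) {
            name[strcspn(name, "\n")] = '\0';
            if (std::regex_search(name, nameFilter)) {
                tids.push_back(tid);               // signals would go only to these tids
            }
        }
        fclose(f);
    }
    closedir(dir);
    return tids;
}
```

One caveat is that /proc/<tid>/comm is truncated to 15 characters by the kernel, so long Java thread names might only be matchable by prefix.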

Do you have any thoughts on this?

apangin commented Sep 2, 2019

There should be no problem in intercepting Thread.setNativeName:

  • async-profiler can already extract the native thread ID from a jthread object, see Profiler::updateThreadName.
  • There is no need to handle setName called before thread start, because the profiler already receives the ThreadStart event and gets the initial name of the thread.
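
Roughly, pairing the initial name with the native TID at thread start could look like this (a sketch; event registration via SetEventCallbacks is omitted and the bookkeeping call is hypothetical):

```cpp
// Sketch of a JVMTI ThreadStart callback. It runs in the context of the
// newly started thread, so gettid() returns exactly the TID the filter needs.
#include <jvmti.h>
#include <sys/syscall.h>
#include <unistd.h>

static void JNICALL onThreadStart(jvmtiEnv* jvmti, JNIEnv* jni, jthread thread) {
    jvmtiThreadInfo info = {0};
    if (jvmti->GetThreadInfo(thread, &info) != JVMTI_ERROR_NONE) return;

    int tid = (int)syscall(SYS_gettid);            // native TID of the new thread
    // registerThreadName(tid, info.name);         // hypothetical filter bookkeeping

    jvmti->Deallocate((unsigned char*)info.name);  // GetThreadInfo allocates the name
}
```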
