Wall clock profiler optimizations #237
Comments
Thank you for the detailed feedback and for the constructive ideas! Fortunately, we don't even need to scan threads, because thread start/stop events are already handled, and intercepting So, I'd definitely go for the first option, since it is 1) easier to implement; 2) understandable and predictable from the user's perspective.
Thanks for the reply!
Thanks for the advice @apangin . I have looked through it and found two potential problems.
The easy way to deal with this would be to loop through all threads and get their names and pids every once in a while. Then we could filter them somehow and update the list. The hacky way would be to let every thread update the pid list the first time it receives a signal. Also we would need to intercept Honestly, I prefer the first one, as it is simpler and I don't think there will be a huge overhead in going through all process threads once in a while. For example on Linux it is a simple read from Do you have any thoughts on this?
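The periodic scan could look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes the Linux procfs layout where every thread of a process has a directory under `/proc/<pid>/task/` containing a `comm` file with the thread name. The function name `list_threads` and the `task_dir` parameter are hypothetical and not part of the profiler.

```python
import os


def list_threads(task_dir="/proc/self/task"):
    """Return {tid: thread_name} for every thread listed under task_dir.

    Assumption: task_dir follows the Linux procfs layout, i.e. one numeric
    sub-directory per thread, each holding a `comm` file with the name.
    """
    threads = {}
    try:
        entries = os.listdir(task_dir)
    except OSError:
        # Not on Linux, or the process is already gone.
        return threads
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(task_dir, entry, "comm")) as f:
                threads[int(entry)] = f.read().rstrip("\n")
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return threads
```

Calling `list_threads()` once every few seconds and comparing the result with the previous snapshot would be enough to notice threads that appeared or disappeared in between.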
There should be no problem in intercepting
Hi,
I have been using your wall clock profiler recently, and it has performed really well in my modest tests. The only problem is that when I use it on a process with more than 2000 threads, the per-thread sampling frequency drops dramatically, and to get reasonable results one has to lower the sampling interval so much that the profiler itself becomes a huge overhead. Unfortunately, we have many processes in production with on the order of 10k threads.
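To make the effect concrete, here is a back-of-the-envelope model. It is a simplification and not a description of the profiler's actual scheduling: it assumes the sampler signals at most a fixed batch of threads per tick and walks the thread list in round-robin order. The parameter names and the numbers (10 ms tick, 8 threads per tick) are hypothetical.

```python
def per_thread_period_ms(interval_ms, threads, threads_per_tick):
    """Time between two samples of the same thread under round-robin batching."""
    # Number of ticks needed to visit every thread once (ceiling division).
    ticks_per_round = -(-threads // threads_per_tick)
    return interval_ms * ticks_per_round


# Illustrative values only: 10 ms tick, 8 threads signalled per tick.
for n in (100, 2000, 10000):
    print(n, "threads ->", per_thread_period_ms(10, n, 8), "ms between samples")
```

Under these assumptions a thread in a 2000-thread process is sampled once every 2500 ms, and in a 10000-thread process once every 12500 ms, so keeping the per-thread rate constant means shrinking the tick in proportion to the thread count.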
To solve this problem I suggest 2 approaches.
The way I imagine this could be implemented is to collect the pids of the threads whose names match a specific regex and to send the signals only to them. The major problem I see with this approach is that thread pools are usually dynamic, so threads may be created or destroyed during profiling. One might therefore need to go through all the threads periodically during sampling to make sure we have the pids of all the interesting threads.
What do you think about this approach? How can we avoid the additional overhead of periodically matching the threads against the regex (or do we really need to do that)?
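A minimal sketch of the filtering step, assuming a snapshot of thread ids and names is already available as a dictionary. The helper names `select_tids` and `refresh` are made up for illustration; how the snapshot is obtained and how signals are sent is left out.

```python
import re


def select_tids(threads, pattern):
    """Return the set of tids whose thread name matches the regex."""
    regex = re.compile(pattern)
    return {tid for tid, name in threads.items() if regex.search(name)}


def refresh(current, threads, pattern):
    """Recompute the target set from a fresh snapshot.

    Returns (updated, added, removed) so the caller can see which
    threads joined or left the set since the previous refresh.
    """
    updated = select_tids(threads, pattern)
    return updated, updated - current, current - updated


snapshot = {1: "http-worker-1", 2: "GC Thread#0", 3: "http-worker-2"}
targets, added, removed = refresh({1, 9}, snapshot, r"^http-worker-")
print(sorted(targets), sorted(added), sorted(removed))  # [1, 3] [3] [9]
```

The matching itself is cheap compared with stack walking; the cost of this approach is mostly in producing the fresh snapshot, so the refresh period is the knob to tune.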
There are a couple of problems with this approach.
Firstly, it is not crystal clear to me how this could be implemented. I don't really know if the approach with the context switches would work. The other part is how we would get the stacks for the idle threads efficiently (concurrency issues).
Secondly, the overhead of the profiler will become dynamic: the fewer threads you have, the more data you generate, because the time that used to be spent on useless stack walking will now be spent on stack walking of working threads.
Thirdly, the distribution will get skewed, as the working threads will be sampled more often than they otherwise would have been.
What are your ideas about this approach? Would it work, and could you maybe give some clues on how it could be implemented efficiently?
So do any of these ideas make sense, and would they be useful?
I would love to hear your feedback on these ideas before I actually start implementing them.
P.S.
I am a student and still a newbie in the JVM world, so if something is complete nonsense, please correct me ))).
@apangin