Lock monitoring overhead/accuracy #805

olivergillespie · 2023-08-22T11:40:36Z

I noticed in my application that -e lock causes noticeable overhead. I believe that's in part because the default threshold for recording the 'contention' is 0.

It also attributed a large amount of time in an area I believe is mostly uncontended, and comparing with e.g. -e wall, the picture is totally different.

I have a guess as to how the overhead leads to inaccurate results - the time taken to record the sample by the lock holder is not included in the acquire time reported by that thread, which is good, but thread 2 sees a longer acquire time because thread 1 is recording the sample while still holding the lock. So lock tracing actually increases contention, which is bad for performance but also messes with the result accuracy because it's not accounted for.

I suggest at least increasing the default threshold, in my test 10us is enough to remove the undue overhead while still catching the most interesting cases.

As far as I can tell, the overhead is from:

a) getting the thread id via gettid syscall
b) getting the stack trace via jvmti

I'm wondering if there's any way to move these outside the lock. I suppose you could do them pre-emptively in ContendedMonitorEnter before the lock is acquired, but that means you don't know if the threshold has been crossed yet. However, when the threshold is 0 that shouldn't be a problem so it might be beneficial. I found some discussion on the old MonitorContendedExit event which I guess would be even better it if was available, I don't know if there's any way for async-profiler to create an equivalent hook something like:

monitorExitHook() {
  if (tag.acquireTime > threshold) recordSample() // expensive work done outside lock
}

As I mentioned, I saw this in a real application, but I also tried to create a simple repro:

public class Test {
    static int x = 0;
    public static void main(String[] args) throws Exception {
        for (int t = 0; t < Integer.parseInt(args[0]); t++) {
            new Thread(Test::loop).start();
        }
        Thread.sleep(30_000L);
    }

    static void loop() {
        while (true) {
            long start = System.currentTimeMillis();
            for (int i = 0; i < 10_000_000; i++) {
                if (i % 1_000 == 0) {
                    synchronized(Test.class) {
                        if (x++ % 123 == 456) System.out.println("!");
                    }
                }
            }
            long dur = System.currentTimeMillis() - start;
            System.out.println(dur + "ms");
        }
    }
}

Attaching async-profiler after startup shows the overhead. Using --lock 10000 does not show any overhead.

java17 Test.java 5
13ms
14ms
13ms
14ms
14ms
14ms
Profiling started
14ms
21ms
40ms
44ms
47ms
68ms
81ms
88ms
81ms
86ms

And you can compare a wall profile (doesn't show significant contention) to the results shown by a lock profile (suggests high contention) to show the inaccuracy I was talking about.

Thanks for reading! Let me know what you think, I'm happy to help with implementation (even just changing the default threshold) if you think any of the ideas are good.

The text was updated successfully, but these errors were encountered:

zdyj3170101136 · 2023-08-23T02:24:26Z

i think the default threshold is not easy to decide.
the best is a random lock tracer: #775

olivergillespie · 2023-08-23T15:42:53Z

Interesting point. Lock profiling could work on an interval instead of a threshold. It doesn't really need the rand() style operation, though I think it works out the same mostly.

if (totalAcquireTime + latestAcquireTime) > interval
   recordSample() // weight of the sample = interval, like cpu profiling or tlab allocs
   totalAcquireTime = (totalAcquireTime + latestAcquireTime) % interval

It would still be separately valuable to reduce overhead and improve accuracy by reducing work done under lock, if at all possible.

apangin added a commit that referenced this issue Aug 28, 2023

#805: Lock profiling optimizations

d1de831

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lock monitoring overhead/accuracy #805

Lock monitoring overhead/accuracy #805

olivergillespie commented Aug 22, 2023

zdyj3170101136 commented Aug 23, 2023

olivergillespie commented Aug 23, 2023

Lock monitoring overhead/accuracy #805

Lock monitoring overhead/accuracy #805

Comments

olivergillespie commented Aug 22, 2023

zdyj3170101136 commented Aug 23, 2023

olivergillespie commented Aug 23, 2023