
Add a ms_threshold flag to profiler: filter blocks "slower than" a ms value, fix recursive profiling #2281

Open · wants to merge 2 commits into master

Conversation

@goodboy commented Apr 29, 2022

I found this super useful for narrowing down latency issues (from slowest to fastest); it's much easier than having a wholesale or "tag based" selection of sections enabled.

If no one is interested in this I'll stick with our ongoing approach of pulling this into our dependent project, and y'all can come ask for it later if wanted 🖖🏼
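
For flavor, here's a minimal usage sketch on this branch (the kwarg name is ms_threshold after the rename discussed below; the section name and the 5 ms cutoff are made-up examples):

from pyqtgraph.debug import Profiler

# on this PR's branch: marks that complete faster than `ms_threshold`
# are filtered out of the printed profile, so only the slow spots show
profiler = Profiler("render-cycle", disabled=False, ms_threshold=5.0)
profiler("fast step")   # suppressed if it ran in under 5 ms
profiler("slow step")   # reported if it took 5 ms or more
profiler.finish()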

Update: new changes added to aid with recursive profiling.

Taken from the latest commit msg:
Further this commit makes some larger alterations:

Details, in order from the bullets in the commit message (quoted in full further below):

  • when a top level (aka ._depth == 0) profiler was deleted by the GC,
    all messages from all underlying (aka recursive) profilers were also
    wiped, meaning you'd get a weird case where only the "Exit" msg was
    showing. Now we instead expect each instance to clear its own msg
    set (see the sketch just below this list). There should also be no
    further reason for the ._depth < 1 check since each instance unwinds
    and emits its own messages, presumably as the call stack unwinds via
    the GC.
  • I see no reason to keep .flush() when it was a trivial method that
    can be inlined into .finish(), and I doubt there's much of a use
    case for "incrementally flushing" unless the user is using one "god
    profiler" throughout their entire stack.

Some notes (old)

  • found the .__del__() finishing to be highly unreliable (particularly in async code driving sections with profiling); now solved by making ._msgs an instance var as per above
  • "nested profiling" seems to be something really handy but i'm still not entirely sure how it works; do you need to pass in the caller's instance to make it work or should it work with new instances always? seems to work great now, even with using the new ms_threshold param after the latest patch.

TODO:

  • should we add a context manager interface to this api?

@goodboy mentioned this pull request Apr 29, 2022
@@ -535,7 +535,7 @@ def mark(self, msg=None):
             pass
     _disabledProfiler = DisabledProfiler()

-    def __new__(cls, msg=None, disabled='env', delayed=True):
+    def __new__(cls, msg=None, disabled='env', delayed=True, gt: float = 0):
@j9ac9k (Member) commented May 1, 2022

probably should make the default value be 0. or 0.0 if it's going to be type-annotated as a float:

>>> a = 0
>>> isinstance(a, float)
False

@goodboy (Author) replied:

good idea, are we cool with introducing type annots now?

@j9ac9k (Member) commented May 1, 2022

Hi @goodboy, thanks for the PR. The feature looks good to me; I agree with its usefulness.

I would suggest renaming the keyword argument from gt to threshold or something along those lines.

@goodboy (Author) commented May 13, 2022

@j9ac9k sounds good!

I have a couple more commits to address issues with nested profilers that I'm going to append here. I will also make the suggested changes 🏄🏼

This is a more explicit input argument name which we assign to an
internal `._mt` instance var. This patch also includes some improvements
for recursive profiling, see below.

Further this commit makes some larger alterations:
- drops the class variable `._msgs` since it was causing issues on
  nested profiler use. My presumption is that this code was written with
  the expectation that stacks could unwind (per thread) out of order?
  I don't see why else you'd have `.flush()` only print on a `._depth
  < 1` condition..
- drops `.flush()`, since `.finish()` is the only place this method is
  called (even when scanning the rest of the code base), thus dropping
  a call frame.
- formatting adjustments around this `class` as per the `flake8` linter.

Details, in order from the bullets above:
- when a top level (aka `._depth == 0`) profiler was deleted by the GC,
  all messages from all underlying (aka recursive) profilers were also
  wiped, meaning you'd get a weird case where only the "Exit" msg was
  showing. Now we instead expect each instance to clear its own msg
  set. There should also be no further reason for the `._depth < 1`
  check since each instance unwinds and emits its own messages,
  presumably as the call stack unwinds via the GC.
- I see no reason to keep `.flush()` when it was a trivial method
  that can be inlined into `.finish()`, and I doubt there's much of a use
  case for "incrementally flushing" unless the user is using one "god
  profiler" throughout their entire stack (sketched just below).
@goodboy (Author) commented May 13, 2022

@j9ac9k changed the name and added more changes that you'll likely want to audit 😂

@goodboy requested a review from j9ac9k May 13, 2022 19:59
@goodboy changed the title from Add a gt flag to profiler to filter blocks "slower than" a ms value to Add a ms_threshold flag to profiler: filter blocks "slower than" a ms value, fix recursive profiling May 13, 2022
msg=None,
disabled='env',
delayed=True,
ms_threshold: float = 0.0,
A Contributor commented:

pijyoi mentioned some methods/args are snake-cased, but only to indicate they are not part of the explicit API. Not to bikeshed too much, but would it be better to call this msThreshold (or thresholdMs if unit suffixes typically go at the end of a number)?

@goodboy (Author) replied:

I honestly am indifferent but personally don't love the camel case stuff.
It's up to y'all.

@ntjess (Contributor) commented Jun 3, 2022

@goodboy Since _msgs is now an instance variable, your code prints nested profiles in reverse order:

import logging
import time

from pyqtgraph.debug import Profiler

logger = logging.getLogger(__name__)

def test_profiler():
    """
    Test the profiler by creating a profiler and calling it.
    """
    logger.setLevel(logging.DEBUG)
    profiler = Profiler("test", disabled=False)

    def nested_func():
        profiler = Profiler("nested", disabled=False)
        time.sleep(0.1)
        profiler("after nested sleep")

    profiler()
    nested_func()
    profiler()
    profiler()

Produces:

  > Entering nested
    after nested sleep: 100.2726 ms
  < Exiting nested, total time: 100.3018 ms
> Entering test
  0: 0.0114 ms
  1: 102.0797 ms
  2: 0.0832 ms
< Exiting test, total time: 102.1887 ms

I'm also a little confused about the issue you reported with recursive profiling; can you give me a small code example to help me understand? I tried the current Profiler implementation with the following and didn't face any issues:

from pyqtgraph.debug import Profiler

def fib_bad_recursive(n):
    """
    A bad recursive implementation of fibonacci.
    """
    if n < 2:
        return n
    profiler = Profiler("fib_bad_recursive", disabled=False)
    return fib_bad_recursive(n - 1) + fib_bad_recursive(n - 2)

def fib_memo(n, memo={}):
    """
    A memoized implementation of fibonacci.
    """
    if n < 2:
        return n
    profiler = Profiler("fib_memo", disabled=False)
    if n not in memo:
        memo[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return memo[n]

if __name__ == '__main__':
    n = 10
    fib_bad_recursive(n)
    print('\n=====================\n')
    fib_memo(n)

Produced:

> Entering fib_bad_recursive
  > Entering fib_bad_recursive
    > Entering fib_bad_recursive
      > Entering fib_bad_recursive
        > Entering fib_bad_recursive
        < Exiting fib_bad_recursive, total time: 0.0061 ms
      < Exiting fib_bad_recursive, total time: 0.0189 ms
      > Entering fib_bad_recursive
      < Exiting fib_bad_recursive, total time: 0.0038 ms
    < Exiting fib_bad_recursive, total time: 0.0421 ms
    > Entering fib_bad_recursive
      > Entering fib_bad_recursive
      < Exiting fib_bad_recursive, total time: 0.0032 ms
    < Exiting fib_bad_recursive, total time: 0.0140 ms
  < Exiting fib_bad_recursive, total time: 0.0766 ms
  > Entering fib_bad_recursive
    > Entering fib_bad_recursive
      > Entering fib_bad_recursive
      < Exiting fib_bad_recursive, total time: 0.0040 ms
    < Exiting fib_bad_recursive, total time: 0.0142 ms
    > Entering fib_bad_recursive
    < Exiting fib_bad_recursive, total time: 0.0033 ms
  < Exiting fib_bad_recursive, total time: 0.0345 ms
< Exiting fib_bad_recursive, total time: 0.1375 ms

=====================

> Entering fib_memo
  > Entering fib_memo
    > Entering fib_memo
      > Entering fib_memo
        > Entering fib_memo
        < Exiting fib_memo, total time: 0.0124 ms
      < Exiting fib_memo, total time: 0.0360 ms
      > Entering fib_memo
      < Exiting fib_memo, total time: 0.0060 ms
    < Exiting fib_memo, total time: 0.0772 ms
    > Entering fib_memo
    < Exiting fib_memo, total time: 0.0059 ms
  < Exiting fib_memo, total time: 0.1228 ms
  > Entering fib_memo
  < Exiting fib_memo, total time: 0.0059 ms
< Exiting fib_memo, total time: 0.1817 ms

@goodboy (Author) commented Jun 6, 2022

I'm also a little confused about the issue you reported with recursive profiling; can you give me a small code example to help me understand? I tried the current Profiler implementation with the following and didn't face any issues:

Yeah, it probably is confusing: the client system we're using to call this code is all async driven, and it seems the whole unwinding-via-garbage-collection approach isn't playing well with that.

I couldn't even originally see half of the profile stack levels until I made this change. Not entirely sure why.

I can probably whip up something, but honestly, for such a small change as this we'll probably just end up copying this code out, since it's getting resistance to merge yet again 😂

It looks like you already have something more extensive in #2322 anyway 😎

@ntjess (Contributor) commented Jun 6, 2022

I wouldn't be opposed to the simpler solution you presented here (I mentioned in my PR that the logging implementation is quite slow if milliseconds matter).

The only thing giving me pause on this PR is the reverse order printout for the MWE I made. The hesitation is certainly not against this PR; I think it's a good idea 👍

the client system we're using to call this code is all async driven

Do you mean async def functions or something else? It would be nice to replicate if it is indeed a bug/limitation in the current implementation; I'd be happy to help.

@goodboy (Author) commented Jun 7, 2022

See my comment in #2322, but I think we might just not worry too much about keeping this and go with a more "serious" solution in the longer run for our code base.

https://github.com/plasma-umass/scalene

I was really just using this profiler for graphics stuff, but honestly it would make more sense to profile things properly (even Qt C++ code) instead of this counter based stuff.

@goodboy (Author) commented Jun 7, 2022

Do you mean async def functions or something else? It would be nice to replicate if indeed it is a bug/limitation in the current implementation, I'd be happy to help

Yes, we have a single-thread graphics loop entirely driven by async data flows from multiple processes.
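
A contrived sketch of how that can bite GC-driven finishing (render_section is hypothetical; the point is that a suspended coroutine frame keeps the profiler alive well past the code it was meant to time):

import asyncio

from pyqtgraph.debug import Profiler

async def render_section():
    profiler = Profiler("render", disabled=False)
    profiler("built scene")
    # the coroutine suspends here; its frame (and thus the profiler)
    # stays alive until the await completes, so __del__-driven
    # finishing fires late and can interleave with other sections
    await asyncio.sleep(0.1)
    profiler("after await")
    profiler.finish()  # explicit finish avoids relying on GC timing

asyncio.run(render_section())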

@goodboy (Author) commented Jun 7, 2022

The only thing giving me pause on this PR is the reverse order printout for the MWE I made. The hesitation is certainly not against this PR; I think it's a good idea 👍

Sorry, missed this. Yeah, I mean if it's something obvious, can you just revert what you think is causing it, PR it to this branch, and then I'll give it a shot?

Sorry, I'm not in graphics land at the moment but will definitely come back to this!

Also thanks for all the feedback 😸

@goodboy (Author) commented Jun 25, 2022

@ntjess just went back to this again and I think you're right. It's the change of ._msgs to an instance var that broke it.

Not sure why I couldn't get it working before (maybe there were other issues in the threshold logic before it was working properly?) but it seems to be doing about what I'd expect now.

I'll push up this last little change and you can tell me if it matches your expectations 😎

@goodboy (Author) commented Jun 25, 2022

I mentioned in my PR that the logging implementation is quite slow if milliseconds matter

Oh, and yes, for our purposes we're trying to get under 10ms per full graphics render cycle on interactions with >= 1M datums.. so yeah, it kinda does matter. I'm already dealing with a bit too much skew due to the overhead latency this original impl contains 😂
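
For anyone curious about that overhead, a rough and purely illustrative way to gauge the per-mark cost (the loop count is arbitrary):

import time

from pyqtgraph.debug import Profiler

start = time.perf_counter()
profiler = Profiler("overhead-check", disabled=False)
for i in range(1000):
    profiler("mark %d" % i)  # each mark records a timestamped message
marked_ms = (time.perf_counter() - start) * 1e3
profiler.finish()  # delayed=True by default, so printing happens here
print("1000 marks cost %.3f ms before printing" % marked_ms)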

goodboy added a commit to pikers/piker that referenced this pull request Oct 31, 2022
Details of the original patch to upstream are in:
pyqtgraph/pyqtgraph#2281

Instead of trying to land this we've opted to just copy out that version
of `.debug.Profiler` into our own internals (luckily the class is
entirely self-contained) until such a time when we choose to find
a better dependency as per #337
goodboy added a commit to pikers/piker that referenced this pull request Nov 11, 2022 (same commit message as above)
goodboy added a commit to pikers/piker that referenced this pull request Nov 17, 2022 (same commit message as above)