
Add a ms_threshold flag to profiler: filter blocks "slower than" a ms value, fix recursive profiling #2281

Open · wants to merge 2 commits into master

Conversation

@goodboy commented Apr 29, 2022

I found this super useful for narrowing down latency issues (from slowest to fastest); it's much easier than having a wholesale or "tag based" selection of sections enabled.

If no one is interested in this I'll stick with our ongoing approach of pulling this into our dependent project, and y'all can come ask for it later if wanted 🖖🏼
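
For flavor, here's a minimal usage sketch on this branch (the kwarg name is ms_threshold after the rename discussed below; the section name and the 5 ms cutoff are made-up examples):

from pyqtgraph.debug import Profiler

# on this PR's branch: marks that complete faster than `ms_threshold`
# are filtered out of the printed profile, so only the slow spots show
profiler = Profiler("render-cycle", disabled=False, ms_threshold=5.0)
profiler("fast step")   # suppressed if it ran in under 5 ms
profiler("slow step")   # reported if it took 5 ms or more
profiler.finish()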

Update: new changes added to aid with recursive profiling.

Taken from the latest commit msg:
Further this commit makes some larger alterations:

Details, in order from the bullets in the commit message (quoted in full further below):

  • when a top level (aka ._depth == 0) profiler was deleted by the GC,
    all messages from all underlying (aka recursive) profilers were also
    wiped, meaning you'd get a weird case where only the "Exit" msg was
    showing. Now we instead expect each instance to clear its own msg
    set (see the sketch just below this list). There should also be no
    further reason for the ._depth < 1 check since each instance unwinds
    and emits its own messages, presumably as the call stack unwinds via
    the GC.
  • I see no reason to keep .flush() when it was a trivial method that
    can be inlined into .finish(), and I doubt there's much of a use
    case for "incrementally flushing" unless the user is using one "god
    profiler" throughout their entire stack.

Some notes (old)

  • found the .__del__() finishing to be highly unreliable (particularly in async code driving sections with profiling); now solved by making ._msgs an instance var as per above
  • "nested profiling" seems to be something really handy but i'm still not entirely sure how it works; do you need to pass in the caller's instance to make it work or should it work with new instances always? seems to work great now, even with using the new ms_threshold param after the latest patch.

TODO:

  • should we add a context manager interface to this api?

@goodboy mentioned this pull request Apr 29, 2022
@@ -535,7 +535,7 @@ def mark(self, msg=None):
             pass
     _disabledProfiler = DisabledProfiler()

-    def __new__(cls, msg=None, disabled='env', delayed=True):
+    def __new__(cls, msg=None, disabled='env', delayed=True, gt: float = 0):
@j9ac9k (Member) commented May 1, 2022

probably should make the default value be 0. or 0.0 if it's going to be type-annotated as a float:

>>> a = 0
>>> isinstance(a, float)
False

@goodboy (Author) replied:

good idea, are we cool with introducing type annots now?

@j9ac9k (Member) commented May 1, 2022

Hi @goodboy, thanks for the PR. The feature looks good to me; I agree with its usefulness.

I would suggest renaming the keyword argument from gt to threshold or something along those lines.

@goodboy (Author) commented May 13, 2022

@j9ac9k sounds good!

I have a couple more commits to address issues with nested profilers that I'm going to append here. I will also make the suggested changes 🏄🏼

This is a more explicit input argument name which we assign to an
internal `._mt` instance var. This patch also includes some improvements
for recursive profiling, see below.

Further this commit makes some larger alterations:
- drops the class variable `._msgs` since it was causing issues on
  nested profiler use. My presumption is that this code was written with
  the expectation that stacks could unwind (per thread) out of order?
  I don't see why else you'd have `.flush()` only print on a `._depth
  < 1` condition..
- drops `.flush()`, since `.finish()` is the only place this method is
  called (even when scanning the rest of the code base), thus dropping
  a call frame.
- formatting adjustments around this `class` as per the `flake8` linter.

Details, in order from the bullets above:
- when a top level (aka `._depth == 0`) profiler was deleted by the GC,
  all messages from all underlying (aka recursive) profilers were also
  wiped, meaning you'd get a weird case where only the "Exit" msg was
  showing. Now we instead expect each instance to clear its own msg
  set. There should also be no further reason for the `._depth < 1`
  check since each instance unwinds and emits its own messages,
  presumably as the call stack unwinds via the GC.
- I see no reason to keep `.flush()` when it was a trivial method
  that can be inlined into `.finish()`, and I doubt there's much of a use
  case for "incrementally flushing" unless the user is using one "god
  profiler" throughout their entire stack (sketched just below).
@goodboy (Author) commented May 13, 2022

@j9ac9k changed the name and added more changes that you'll likely want to audit 😂

@goodboy requested a review from j9ac9k May 13, 2022 19:59
@goodboy changed the title from Add a gt flag to profiler to filter blocks "slower than" a ms value to Add a ms_threshold flag to profiler: filter blocks "slower than" a ms value, fix recursive profiling May 13, 2022
msg=None,
disabled='env',
delayed=True,
ms_threshold: float = 0.0,
A Contributor commented:

pijyoi mentioned some methods/args are snake-cased, but only to indicate they are not part of the explicit API. Not to bikeshed too much, but would it be better to call this msThreshold (or thresholdMs if unit suffixes typically go at the end of a number)?

@goodboy (Author) replied:

I honestly am indifferent but personally don't love the camel case stuff.
It's up to y'all.

@ntjess (Contributor) commented Jun 3, 2022

@goodboy Since _msgs is now an instance variable, your code prints nested profiles in reverse order:

import logging
import time

from pyqtgraph.debug import Profiler

logger = logging.getLogger(__name__)

def test_profiler():
    """
    Test the profiler by creating a profiler and calling it.
    """
    logger.setLevel(logging.DEBUG)
    profiler = Profiler("test", disabled=False)

    def nested_func():
        profiler = Profiler("nested", disabled=False)
        time.sleep(0.1)
        profiler("after nested sleep")

    profiler()
    nested_func()
    profiler()
    profiler()

Produces:

  > Entering nested
    after nested sleep: 100.2726 ms
  < Exiting nested, total time: 100.3018 ms
> Entering test
  0: 0.0114 ms
  1: 102.0797 ms
  2: 0.0832 ms
< Exiting test, total time: 102.1887 ms

I'm also a little confused about the issue you reported with recursive profiling; can you give me a small code example to help me understand? I tried the current Profiler implementation with the following and didn't face any issues:

from pyqtgraph.debug import Profiler

def fib_bad_recursive(n):
    """
    A bad recursive implementation of fibonacci.
    """
    if n < 2:
        return n
    profiler = Profiler("fib_bad_recursive", disabled=False)
    return fib_bad_recursive(n - 1) + fib_bad_recursive(n - 2)

def fib_memo(n, memo={}):
    """
    A memoized implementation of fibonacci.
    """
    if n < 2:
        return n
    profiler = Profiler("fib_memo", disabled=False)
    if n not in memo:
        memo[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return memo[n]

if __name__ == '__main__':
    n = 10
    fib_bad_recursive(n)
    print('\n=====================\n')
    fib_memo(n)

Produced:

> Entering fib_bad_recursive
  > Entering fib_bad_recursive
    > Entering fib_bad_recursive
      > Entering fib_bad_recursive
        > Entering fib_bad_recursive
        < Exiting fib_bad_recursive, total time: 0.0061 ms
      < Exiting fib_bad_recursive, total time: 0.0189 ms
      > Entering fib_bad_recursive
      < Exiting fib_bad_recursive, total time: 0.0038 ms
    < Exiting fib_bad_recursive, total time: 0.0421 ms
    > Entering fib_bad_recursive
      > Entering fib_bad_recursive
      < Exiting fib_bad_recursive, total time: 0.0032 ms
    < Exiting fib_bad_recursive, total time: 0.0140 ms
  < Exiting fib_bad_recursive, total time: 0.0766 ms
  > Entering fib_bad_recursive
    > Entering fib_bad_recursive
      > Entering fib_bad_recursive
      < Exiting fib_bad_recursive, total time: 0.0040 ms
    < Exiting fib_bad_recursive, total time: 0.0142 ms
    > Entering fib_bad_recursive
    < Exiting fib_bad_recursive, total time: 0.0033 ms
  < Exiting fib_bad_recursive, total time: 0.0345 ms
< Exiting fib_bad_recursive, total time: 0.1375 ms

=====================

> Entering fib_memo
  > Entering fib_memo
    > Entering fib_memo
      > Entering fib_memo
        > Entering fib_memo
        < Exiting fib_memo, total time: 0.0124 ms
      < Exiting fib_memo, total time: 0.0360 ms
      > Entering fib_memo
      < Exiting fib_memo, total time: 0.0060 ms
    < Exiting fib_memo, total time: 0.0772 ms
    > Entering fib_memo
    < Exiting fib_memo, total time: 0.0059 ms
  < Exiting fib_memo, total time: 0.1228 ms
  > Entering fib_memo
  < Exiting fib_memo, total time: 0.0059 ms
< Exiting fib_memo, total time: 0.1817 ms

@goodboy (Author) commented Jun 6, 2022

I'm also a little confused about the issue you reported with recursive profiling; can you give me a small code example to help me understand? I tried the current Profiler implementation with the following and didn't face any issues:

Yeah, it probably is confusing: the client system we're using to call this code is all async driven, and it seems the whole unwinding-via-garbage-collection approach isn't playing well with that.

I couldn't even originally see half of the profile stack levels until I made this change. Not entirely sure why.

I can probably whip up something, but honestly, for such a small change as this we'll probably just end up copying this code out, since it's getting resistance to merge yet again 😂

It looks like you already have something more extensive in #2322 anyway 😎

@ntjess (Contributor) commented Jun 6, 2022

I wouldn't be opposed to the simpler solution you presented here (I mentioned in my PR that the logging implementation is quite slow if milliseconds matter).

The only thing giving me pause on this PR is the reverse order printout for the MWE I made. The hesitation is certainly not against this PR; I think it's a good idea 👍

the client system we're using to call this code is all async driven

Do you mean async def functions or something else? It would be nice to replicate if it is indeed a bug/limitation in the current implementation; I'd be happy to help.

@goodboy (Author) commented Jun 7, 2022

See my comment in #2322, but I think we might just not worry too much about keeping this and go with a more "serious" solution in the longer run for our code base.

https://github.com/plasma-umass/scalene

I was really just using this profiler for graphics stuff, but honestly it would make more sense to profile things properly (even Qt C++ code) instead of this counter based stuff.

@goodboy (Author) commented Jun 7, 2022

Do you mean async def functions or something else? It would be nice to replicate if indeed it is a bug/limitation in the current implementation, I'd be happy to help

Yes, we have a single-thread graphics loop entirely driven by async data flows from multiple processes.
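
A contrived sketch of how that can bite GC-driven finishing (render_section is hypothetical; the point is that a suspended coroutine frame keeps the profiler alive well past the code it was meant to time):

import asyncio

from pyqtgraph.debug import Profiler

async def render_section():
    profiler = Profiler("render", disabled=False)
    profiler("built scene")
    # the coroutine suspends here; its frame (and thus the profiler)
    # stays alive until the await completes, so __del__-driven
    # finishing fires late and can interleave with other sections
    await asyncio.sleep(0.1)
    profiler("after await")
    profiler.finish()  # explicit finish avoids relying on GC timing

asyncio.run(render_section())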

@goodboy (Author) commented Jun 7, 2022

The only thing giving me pause on this PR is the reverse order printout for the MWE I made. The hesitation is certainly not against this PR; I think it's a good idea 👍

Sorry, missed this. Yeah, I mean if it's something obvious, can you just revert what you think is causing it, PR it to this branch, and then I'll give it a shot?

Sorry, I'm not in graphics land at the moment but will definitely come back to this!

Also thanks for all the feedback 😸

@goodboy (Author) commented Jun 25, 2022

@ntjess just went back to this again and I think you're right. It's the change of ._msgs to an instance var that broke it.

Not sure why I couldn't get it working before (maybe there were other issues in the threshold logic before it was working properly?) but it seems to be doing about what I'd expect now.

I'll push up this last little change and you can tell me if it matches your expectations 😎

@goodboy (Author) commented Jun 25, 2022

I mentioned in my PR that the logging implementation is quite slow if milliseconds matter

Oh, and yes, for our purposes we're trying to get under 10ms per full graphics render cycle on interactions with >= 1M datums.. so yeah, it kinda does matter. I'm already dealing with a bit too much skew due to the overhead latency this original impl contains 😂
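
For anyone curious about that overhead, a rough and purely illustrative way to gauge the per-mark cost (the loop count is arbitrary):

import time

from pyqtgraph.debug import Profiler

start = time.perf_counter()
profiler = Profiler("overhead-check", disabled=False)
for i in range(1000):
    profiler("mark %d" % i)  # each mark records a timestamped message
marked_ms = (time.perf_counter() - start) * 1e3
profiler.finish()  # delayed=True by default, so printing happens here
print("1000 marks cost %.3f ms before printing" % marked_ms)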

goodboy added a commit to pikers/piker that referenced this pull request Oct 31, 2022
Details of the original patch to upstream are in:
pyqtgraph/pyqtgraph#2281

Instead of trying to land this we've opted to just copy out that version
of `.debug.Profiler` into our own internals (luckily the class is
entirely self-contained) until such a time when we choose to find
a better dependency as per #337
goodboy added a commit to pikers/piker that referenced this pull request Nov 11, 2022 (same commit message as above)
goodboy added a commit to pikers/piker that referenced this pull request Nov 17, 2022 (same commit message as above)