
traces include time spent in garbage collector #389

Open
richvdh opened this issue May 6, 2021 · 8 comments · May be fixed by #515

Comments

richvdh commented May 6, 2021

Often, when I record a trace, it will contain one or two lines with very large (and not entirely believable) sample counts. I suspect this is because the garbage collector is kicking in.

Whilst it's in theory useful to know that the garbage collector is using CPU time, it's not terribly helpful from the point of view of looking for parts of my app I can optimise.

Would it be possible to detect that the GC is active, and omit such samples from the report?

benfred (Owner) commented May 6, 2021

If it's GC time, you might be able to see it by enabling native stack traces - we include certain Python functions, like the GC, in the stack traces.

Do you have a script that reproduces this? Python objects are ref-counted, and the GC only kicks in if you have cyclic references to objects. I'm wondering if this is caused by a different problem, like GIL contention, rather than GC - unless you're allocating a ton of objects with cyclic references, there shouldn't be a large GC pause.
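
A minimal, illustrative sketch of that distinction: acyclic objects are freed immediately by reference counting, while reference cycles are only reclaimed when the cyclic collector runs.

import gc

class Node:
    def __init__(self):
        self.ref = None

# Acyclic: freed as soon as the last reference disappears.
a = Node()
del a

# Cyclic: the refcounts never reach zero, so these wait for the collector.
b, c = Node(), Node()
b.ref, c.ref = c, b
del b, c

print(gc.collect())  # number of unreachable (cyclic) objects just collected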

Also worth checking out https://twitter.com/tmr232/status/1387049432319873035 =)

richvdh (Author) commented May 6, 2021

yeah, I'm certainly not ruling out user error here :)

I've got a long-running application, and one reason to suspect GC is that, each time I profile it, I see it getting "stuck" at a completely different point in the code.

-n (native stack traces) doesn't seem to show anything particularly interesting.

Another datapoint: I'm running with -g (since my application uses threads, and if I don't use -g I get hundreds of samples from threads waiting on a semaphore). I'm not sure what bearing that might have on whether I'd expect to see GC time in the reports.

benfred (Owner) commented May 6, 2021

hmm - that does sound suspicious.

Can you try disabling GC altogether and see if the problem still happens? Instagram does this https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172 and found some benefits.

If you call something like

import gc
gc.disable()         # stop automatic collection
gc.set_threshold(0)  # also zero the generation thresholds, to be thorough

and then you don't get stuck at any point, you can say with some confidence that it's GC related.

richvdh (Author) commented May 7, 2021

yep. Of course, doing so requires restarting the process, which changes things somewhat. Still, I've tried it, and it certainly looks like the results are more consistent with what I would expect.

richvdh (Author) commented May 7, 2021

it also makes my memory usage do this, so I think the answer to "unless you're allocating a ton of objects with cyclic references" is "yes":

[attached image: memory usage graph]

benfred (Owner) commented May 8, 2021

yup - that definitely looks like it's a GC issue.

I don't quite know how to figure out if the target process is in a GC cycle right now - this might be pretty tricky to implement =(

richvdh (Author) commented May 11, 2021

> I don't quite know how to figure out if the target process is in a GC cycle right now - this might be pretty tricky to implement =(

This might be naive, but you already inspect the _PyRuntime to see if a thread has the GIL. There is likewise a _PyRuntime.gc.collecting field that is set to 1 while the GC is running (https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Modules/gcmodule.c#L2266 etc.); might that serve the purpose here?
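
For what it's worth, the same state can also be watched from inside a process via gc.callbacks - this is only an in-process illustration of when the collector is active (py-spy would instead read the flag out of the target process's memory):

import gc

def on_gc(phase, info):
    # phase is "start" or "stop"; info includes the generation being collected
    print(f"GC {phase}: generation {info['generation']}")

gc.callbacks.append(on_gc)

# Create some cyclic garbage and force a collection to see the callbacks fire.
junk = []
for _ in range(3):
    d = {}
    d["self"] = d
    junk.append(d)
del junk
gc.collect()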

benfred (Owner) commented Aug 4, 2022

There is a prototype of this at #515 - it displays the percentage of samples collected by the record command that were spent in GC, and also has an option to exclude samples in GC from the record output.

Note that this currently only works on x86_64 Linux/Mac, with Python 3.9+, and doesn't work for profiling subprocesses. If you want to try it out, you can download the wheels from the GitHub Actions artifacts here: https://github.com/benfred/py-spy/actions/runs/2794391704
