[Question/Bug] Endless stream of GC events occupying a CPU core: What could be happening? #101563
Comments
Tagging subscribers to this area: @dotnet/gc
Would you be able to collect a GCCollectOnly trace or a dump when the issue repros? Have you noticed this only after upgrading to .NET 8?
Oh yes, sure, I'll try to do that the next time I observe this behavior. Regarding versions of .NET, I have no real point of comparison to tell whether this behavior would have happened earlier. (While the code predates .NET 8.0, the app was not in a state to be really useful until a few months ago.)
I was going to suggest that you get a GCCollectOnly trace with GCTriggeredStacks so you can actually see the call stacks that triggered GCs, but I just found that it wasn't collecting the rundown events (which means you can't actually decode the managed call frames), so I just submitted a fix: 2031. It's obviously not available in the PerfView downloads yet, so I shared a private build at https://github.com/Maoni0/clrgc/tree/main/PerfView-bins if you'd like to try it before the official release build is available.
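For anyone following along, the trace collection mentioned above would look roughly like the following. These command lines are illustrative, not verified against the private build; check `PerfView /?` for the exact qualifiers available in your version, and note that `12345` is a placeholder PID.

```shell
# Collect a GC-only trace with PerfView (run from an elevated prompt).
# /GCTriggeredStacks requires a PerfView build that includes the fix above.
PerfView.exe /GCCollectOnly /GCTriggeredStacks /AcceptEULA /nogui collect gc-trace.etl

# Alternative: dotnet-trace with its built-in gc-collect profile,
# attached to the running service process.
dotnet-trace collect -p 12345 --profile gc-collect -o gc-trace.nettrace
```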
Thanks for your suggestion of using PerfView. I managed to reproduce the problem today and found what was causing all the GC events. What can cause this pattern is apparently an infinite loop that generates small allocations. (The infinite loop was one bug; the allocations were another one.) Sorry for consuming some of your time, but thanks a lot for the help! 😊
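For reference, the pathological pattern described above can be reduced to something like this. This is a hypothetical sketch, not the actual service code:

```csharp
using System;
using System.Threading;

class AllocLoopSketch
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        // Bug #1: the exit condition never becomes true, so the loop spins forever
        // with no await or sleep to yield the core.
        while (!cts.Token.IsCancellationRequested)
        {
            // Bug #2: a small, short-lived allocation on every iteration.
            // Gen0 fills almost instantly, so gen0 collections run back to back,
            // which shows up as a continuous stream of GC events while one core
            // stays fully busy.
            var buffer = new byte[64];
            GC.KeepAlive(buffer);
        }
    }
}
```

Because the allocations die immediately, memory usage stays flat even though the GC never stops working, which matches a process that looks idle from the outside but burns a whole core.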
Description
This is a scenario that I've observed a few times with the service I've developed here: https://github.com/hexawyz/Exo/tree/master/Exo.Service
After some run time, the service, which is mostly idle, will start consuming the equivalent of a whole CPU core.
NB: This service uses Workstation GC instead of Server GC in order to reduce the memory footprint.
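For context, Workstation GC can be selected via the standard .NET runtime configuration; a minimal `runtimeconfig.json` fragment (or the equivalent `<ServerGarbageCollector>false</ServerGarbageCollector>` MSBuild property) would look like this:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": false,
      "System.GC.Concurrent": true
    }
  }
}
```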
If the process is attached to Visual Studio, before or after this happens, Diagnostic Tools will show the following:
This behavior generally occurs after quite a long time (~2 days of run time, for example, which is what led to the screenshot above), but I've also observed it in a shorter session where the service was started in Debug mode from within Visual Studio. In total, I believe I've observed this behavior about 4 times, so while the trigger seems to be somewhat random, the behavior is not.
The service is relatively complex, and I don't exclude the possibility that this quirk in the GC could be triggered by some erroneous code in the service (I hope not). However, because I have observed this specific behavior multiple times, it seems not entirely random, so I believe it is worth asking for help here.
What I'm mainly asking here is whether you are aware of a situation that could trigger this behavior in the GC.
And also, what could I do to investigate this more in depth?
Reproduction Steps
This is how I reproduce the problem, but I don't ask that you run this, as it is not deterministic and the service might not work exactly the same on all computers.
Expected behavior
The service would stay close to 0% CPU usage all the time, and there would not be a continuous stream of GC events.
Actual behavior
At some point, the service will start consuming a lot of CPU, and VS diagnostic tools will show this as a continuous stream of GC events.
Regression?
No response
Known Workarounds
(Restart my service, but that's not very helpful)
Configuration
.NET SDK: 8.0.104
OS: Windows 11 23H2
Architecture: x64
GC: Workstation
Other information
The service makes use of a lot of low-level or advanced features such as native function pointers, GCHandle, WeakReference, DependentHandle, AssemblyLoadContext, etc. I don't believe using any of those should lead to the problem I observed, but it might be worth knowing.

Also, I strongly suspect that the instabilities could be triggered when the associated UI is interacting with the service (through named pipes), as the service is otherwise doing nothing most of the time. (Or at least, doing as little as possible.)
However, even if this happens to be the case, the trigger remains non deterministic.