Replies: 3 comments
-
In TorchSharp, Niklas has been looking at these issues a lot, and the same lessons will apply to DiffSharp. He adds an explicit GC.Collect() in the training loop, e.g. here: https://github.com/xamarin/TorchSharp/blob/master/src/FSharp.Examples/AlexNet.fs#L124. And yes, using batch GC will surely help too. He also adds explicit …
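For anyone who wants the shape of the pattern without opening the link, here is a minimal sketch of a training loop that forces a collection on each batch. This is not the actual AlexNet.fs code; names like model, optimizer, loader, and criterion are illustrative placeholders standing in for whatever your TorchSharp/DiffSharp setup provides.

```fsharp
// Hypothetical training loop; all identifiers below are placeholders,
// assumed to come from your own TorchSharp/DiffSharp model setup.
for epoch in 1 .. epochs do
    for (input, target) in loader do
        optimizer.zero_grad ()
        let output = model.forward input
        let loss = criterion output target
        loss.backward ()
        optimizer.step ()
        // Force the .NET GC to run each batch so the managed wrappers
        // around native (GPU) tensors are finalized promptly, instead
        // of accumulating until the GC decides to run on its own.
        System.GC.Collect ()
```

The key line is the GC.Collect() at the end of the batch loop: with server GC the runtime may otherwise wait a long time between collections, during which dead tensors keep holding GPU memory.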
-
Thank you for the kind GC.Collect() tip. Doing it on each batch loop iteration works. For future users: GC.Collect() does not cause a visible drop in the GPU memory usage shown in Windows Task Manager, but it does work, and it prevents out-of-memory errors on new allocations.
-
Cool, good to know. We should update the DiffSharp samples to include this as a matter of course as well.
-
Summary
Should there be a recommendation to avoid the .NET server garbage collection setting when using DiffSharp with a GPU?
I typically use "System.GC.Server": true with fsi for parallel CPU performance, but with this setting unused tensors do not get collected very often. That's a problem for the GPU: I kept getting CUDA out-of-memory errors while training a model. The problem goes away when I use the (default) workstation garbage collection setting.
Code to reproduce the issue is below.
Further related reading for anybody else who finds themselves in this situation: https://github.com/xamarin/TorchSharp/blob/master/docfx/articles/memory.md
Reproduction
The code below runs with "workstation" garbage collection and crashes with "server" garbage collection.
To switch between garbage collection settings, you need to modify fsi.runtimeconfig.json. On Windows I find this file in C:\Program Files\dotnet\sdk\5.0.300-preview.21258.4\FSharp.
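For reference, a minimal sketch of what the relevant part of fsi.runtimeconfig.json looks like with server GC disabled. The real file on your machine will contain additional entries (framework version, other runtime options); the only setting that matters here is System.GC.Server, and this fragment is an assumption about its shape, not a verbatim copy of the shipped file.

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": false
    }
  }
}
```

Setting the value to true restores server GC (and the CUDA out-of-memory behaviour described above); false selects the default workstation GC.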