Fix memory leak in zeroGradients() #2792

Open
wants to merge 5 commits into master

Conversation

@pdradx pdradx commented Sep 29, 2023

The code in zeroGradients() of NDArray doesn't close the gradient tensor after use.
The getGradient() method shares the underlying memory across calls, but on the DJL side each call creates a new native handle and a new NDArray instance, so we must close it after use.

There is another problem with this code that I haven't fixed yet:
zeroing the tensor by subtracting it from itself does not work for NaNs and infinities.
So if a NaN or infinity ends up in the gradient, we cannot recover from it by calling zeroGradients(). We would have to set the array instead of subtracting from it, but the semantics of the set() methods don't work uniformly across NDArrays of different shapes and single scalars.
For example, array.set("*", 0) works for arrays of all shapes, but does not work for a single scalar shape with no dimensions.

So this commit fixes the memory leak only.
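
For context, here is a minimal sketch of what the fixed method could look like, assuming the zeroing stays as an in-place subtract and only the missing close is added (an illustration, not the exact DJL source):

public void zeroGradients() {
  // Sketch only: getGradient() returns a fresh handle on every call (see above),
  // so it must be released after use; try-with-resources closes it.
  try (NDArray grad = getGradient()) {
    grad.subi(grad); // zero in place; NaN/Infinity values survive this, as discussed
  }
}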

SidneyLann and others added 4 commits September 19, 2023 17:36

* Implement PtNDArraryEx.multiboxDetection
* MultiboxDetection - code cleanup
* MultiboxDetection - code cleanup
* MultiboxDetection - code cleanup
* MultiboxDetection - code cleanup
* format code
* Fix, add tests, and pass CI

Co-authored-by: Administrator <Administrator@tech8>
Co-authored-by: KexinFeng <fenkexin@amazon.com>
Co-authored-by: Zach Kimberg <kimbergz@amazon.com>
@pdradx pdradx requested review from zachgk, frankfliu and a team as code owners September 29, 2023 03:28
codecov-commenter commented Sep 29, 2023

Codecov Report

Attention: 1375 lines in your changes are missing coverage. Please review.

Comparison is base (bb5073f) 72.08% compared to head (c668fda) 72.23%.
Report is 886 commits behind head on master.


Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2792      +/-   ##
============================================
+ Coverage     72.08%   72.23%   +0.14%     
- Complexity     5126     7113    +1987     
============================================
  Files           473      702     +229     
  Lines         21970    31682    +9712     
  Branches       2351     3284     +933     
============================================
+ Hits          15838    22884    +7046     
- Misses         4925     7236    +2311     
- Partials       1207     1562     +355     
Files Coverage Δ
...ava/ai/djl/inference/streaming/StreamingBlock.java 100.00% <100.00%> (ø)
api/src/main/java/ai/djl/metric/Dimension.java 100.00% <100.00%> (ø)
api/src/main/java/ai/djl/metric/Unit.java 100.00% <100.00%> (ø)
api/src/main/java/ai/djl/modality/audio/Audio.java 100.00% <100.00%> (ø)
api/src/main/java/ai/djl/modality/cv/Image.java 69.23% <ø> (-4.11%) ⬇️
...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java 76.00% <ø> (ø)
...ava/ai/djl/modality/cv/output/DetectedObjects.java 96.29% <100.00%> (+1.29%) ⬆️
...rc/main/java/ai/djl/modality/cv/output/Joints.java 71.42% <100.00%> (ø)
.../main/java/ai/djl/modality/cv/output/Landmark.java 100.00% <ø> (ø)
...i/djl/modality/cv/transform/RandomResizedCrop.java 94.11% <100.00%> (+5.22%) ⬆️
... and 226 more

... and 372 files with indirect coverage changes


@pdradx pdradx changed the title from "Fixes memory leak in zeroGradients()" to "Fix memory leak in zeroGradients()" on Sep 29, 2023
zachgk commented Sep 29, 2023

If this is a memory leak, wouldn't it be happening in all usages of getGradient? I don't think that is the semantics we were going for with that function.

Maybe a better approach might be to cache the gradient as a property of the MxNDArray and PtNDArray. Something like:

NDArray getGradient() {
  // Return the cached gradient handle if one was already created
  if (this.gradient != null) {
    return this.gradient;
  }
  // Otherwise fetch it from the engine once and cache the resulting NDArray
  NDArray gradient = ...
  this.gradient = gradient;
  return gradient;
}

That way, it would fix this memory leak along with other possible gradient leaks. As part of this, we may also want to close the gradient NDArray when closing the main NDArray, although it is likely to be closed anyway by the NDManager.
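
For illustration, that companion change could look roughly like this (the field and the surrounding class are assumptions, not the actual MxNDArray/PtNDArray members):

@Override
public void close() {
  // Release the cached gradient handle together with the owning array
  if (gradient != null) {
    gradient.close();
    gradient = null;
  }
  // ... existing close logic of the array itself ...
}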

As for the setting part, try with array.set("", 0). I feel like there was some error that was happening with that, but I could be misremembering.

pdradx commented Oct 2, 2023

Yes, of course! All usages of getGradient() must be closed after use...
And as for array.set("", 0): it creates an empty NDIndex under the hood, which doesn't work with scalar NDArrays.

pdradx commented Oct 2, 2023

But changing the semantics of getGradient() now would require fixing all the other places that already rely on the current behaviour and close the gradients themselves.

KexinFeng commented Oct 5, 2023

I don't know how relevant this is, but here is a previous solution regarding memory management: #2567, #2273

try (NDScope scope = new NDScope()) {
  scope.suppressNotUsedWarning();
  // ... create intermediate NDArrays; they are closed when the scope exits ...
  NDScope.unregister(NDArrays_to_keep); // keep only arrays that must outlive the scope
}

Example implementation: #2637
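
For reference, the same pattern applied to zeroGradients() might look like this (a sketch that assumes arrays created inside the scope are registered with it, and that the zeroing stays an in-place subtract as in the current code):

try (NDScope scope = new NDScope()) {
  scope.suppressNotUsedWarning();
  NDArray grad = getGradient(); // new handle, registered with the scope
  grad.subi(grad);              // zero in place
  // nothing needs to outlive the scope, so no unregister() call;
  // the gradient handle is closed when the scope exits
}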
