
[WIP] Put the output of the model into existing NDArrays when provided #618

Open · wants to merge 1 commit into master
Conversation

@zemei (Contributor) commented Feb 6, 2021

This PR introduces an optional output NDList parameter on the Predictor::predict interface. When the output parameter is provided, the underlying engine copies the inference result into the corresponding NDArrays instead of creating new objects.

This functionality is useful for high-throughput systems because it reduces the number of memory allocations and the load on the garbage collector.

TODO:

  • The engine-specific implementation is only provided for PyTorch:
    • On the Java side, the copy is only implemented for the NDArray type.
    • The PyTorch native library side has since been refactored and is not included in this PR.
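To make the proposed flow concrete, here is a minimal, self-contained sketch of the copy-into-existing-output idea. `FakeNDArray`, `PredictSketch`, `runInference`, and `predict` are illustrative stand-ins invented for this sketch; they are not the real DJL types or the PR's actual code.

```java
import java.util.Arrays;

// Illustrative stand-in for an NDArray: a preallocated float buffer that
// inference results are copied into, instead of allocating a new array.
class FakeNDArray {
    final float[] data;

    FakeNDArray(int size) {
        this.data = new float[size];
    }

    // Copy the engine's result into this preallocated buffer.
    void copyFrom(float[] result) {
        System.arraycopy(result, 0, data, 0, result.length);
    }
}

public class PredictSketch {
    // Simulated engine inference; a real engine would produce a fresh tensor.
    static float[] runInference(float[] input) {
        float[] out = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * 2f;
        }
        return out;
    }

    // Sketch of predict(input, output): when an output is provided, the
    // result is written into it rather than returned as a new object.
    static void predict(float[] input, FakeNDArray output) {
        output.copyFrom(runInference(input));
    }

    public static void main(String[] args) {
        // The output is allocated once and reused across predictions,
        // which is the allocation/GC saving the PR is after.
        FakeNDArray reused = new FakeNDArray(3);
        predict(new float[] {1f, 2f, 3f}, reused);
        System.out.println(Arrays.toString(reused.data)); // [2.0, 4.0, 6.0]
    }
}
```

The point of the sketch is only the ownership change: the caller owns the output buffer's life cycle, and predict writes into it.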

@frankfliu (Contributor)

In general, reusing the output NDArrays is a good idea, but defining an NDList parameter at the Predictor level is debatable.

I think this can be implemented at the Translator level instead, and we can document how to reuse the NDList in a Translator implementation.

Output support at the Block level is necessary to reuse the NDList, but it may not work for all blocks; we can add TODOs to some of the Block implementations to make them reusable.

@lanking520 (Member)

@frankfliu @zemei Maybe we can tweak the ctx by implementing a cached NDArray holder. Users are responsible for controlling its life cycle.

public NDList processInput(TranslatorContext ctx, I input) {
    // getCachedNdArray is the proposed (not yet existing) accessor
    NDArray array = ctx.getCachedNdArray("name");
    ...
}

This cached NDArray would not be released until the end of the Model's life cycle, but there is a risk of a memory leak if users misuse this component.

@lanking520 lanking520 changed the title Put the output of the model into existing NDArrays when provided [WIP] Put the output of the model into existing NDArrays when provided Mar 2, 2021
Lokiiiiii pushed a commit to Lokiiiiii/djl that referenced this pull request Oct 10, 2023
@zemei zemei requested review from zachgk, frankfliu and a team as code owners April 26, 2024 19:14
@frankfliu (Contributor)

@zemei

Are you still interested in this PR? I took a close look today. The idea of reusing input/output tensors is great; however, I don't think this implementation is optimal:

  1. This optimization is only needed for inference, so we don't really need to touch so many classes; we only need to add Predictor.setOutputTensor(NDList).
  2. In this PR, the output tensor is still created in JNI and then copied into the existing NDArray, so I don't think this saves anything. I actually don't know whether it is possible to reuse the output tensor in libtorch.
  3. Reusing the input tensor might be a more feasible approach.
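Point 3 above, reusing the input tensor, can be sketched with plain Java NIO: a direct buffer is allocated once and refilled per request, so a native engine binding could read from the same memory each time. `InputReuseSketch` and `fill` are illustrative names invented for this sketch; this is not DJL's actual input path.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Sketch of input-tensor reuse: one direct FloatBuffer is allocated up
// front and overwritten per request, avoiding per-request allocation.
public class InputReuseSketch {
    private final FloatBuffer inputBuffer;

    InputReuseSketch(int capacity) {
        // A direct buffer's backing memory could, in a native binding,
        // be handed to the engine without copying or reallocating.
        inputBuffer = ByteBuffer
                .allocateDirect(capacity * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Overwrite the reused buffer with this request's data.
    FloatBuffer fill(float[] request) {
        inputBuffer.clear();
        inputBuffer.put(request);
        inputBuffer.flip();
        return inputBuffer;
    }

    public static void main(String[] args) {
        InputReuseSketch sketch = new InputReuseSketch(4);
        FloatBuffer first = sketch.fill(new float[] {1f, 2f, 3f, 4f});
        FloatBuffer second = sketch.fill(new float[] {5f, 6f, 7f, 8f});
        System.out.println(first == second); // same buffer, no new allocation
        System.out.println(second.get(0));   // 5.0
    }
}
```

As with the output-side ideas, the caller controls the buffer's life cycle; each request simply overwrites the previous request's data.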


4 participants