
Add suspend and resume events to tensor filter #4424

Open
gichan-jang opened this issue Mar 18, 2024 · 2 comments

Comments

@gichan-jang
Member

gichan-jang commented Mar 18, 2024

Add suspend and resume events to tensor filter

Consider a scenario where AI services are provided on devices with limited memory.
Since user requests do not arrive continuously, keeping the model loaded in memory can be inefficient.
This matters even more for larger models (e.g., Large Language Models).
Thus, rather than keeping the model resident in memory, it is more efficient to reload and use it on demand.
For instance, if there are no requests to the model for 3 seconds, the model is removed from memory.
This can be supported by adding new events to the tensor filter.
With SUSPEND and RESUME events, a sub-plugin can unload or reload its model while keeping its core state intact.
If a sub-plugin cannot handle SUSPEND and RESUME, the tensor filter can fall back to closing and reopening the sub-plugin.
Users can trigger suspend and resume on the tensor filter manually; an automatic management feature would also be needed later.

--- a/gst/nnstreamer/include/nnstreamer_plugin_api_filter.h
+++ b/gst/nnstreamer/include/nnstreamer_plugin_api_filter.h
@@ -181,6 +181,8 @@ typedef enum
   SET_OUTPUT_PROP,  /**< Update output tensor info and layout */
   SET_ACCELERATOR,  /**< Update accelerator of the subplugin to be used as backend */
   CHECK_HW_AVAILABILITY, /**< Check the hw availability with custom option */
+  SUSPEND, /**< Unload the model file from memory */
+  RESUME /**< Load the model on memory */
 } event_ops;

@taos-ci
Collaborator

taos-ci commented Mar 18, 2024

:octocat: cibot: Thank you for posting issue #4424. The person in charge will reply soon.

@myungjoo
Member

We need to clarify the policy and the behaviors of SUSPEND/RESUME.

I presume the following:

  1. They are NOT nested. {Call SUSPEND; Call SUSPEND; Call RESUME} == {Call SUSPEND; Call RESUME}
  2. Add a mutex around the fw->handleEvent() call in tensor_filter_common.c. (SUSPEND/RESUME may otherwise incur synchronization problems.)
  3. More description on how to implement SUSPEND/RESUME event handlers and the purpose of such events.
  4. The behaviors when other callbacks are called while it is in "SUSPEND" state.
    • Option A: call RESUME automatically before the callback is actually called. (you need to manage status)
    • Option B: explicit error (you need to manage status) that may break the pipeline
    • Option C: explicit error (message and event only) without breaking the pipeline.
    • Option D: do nothing (unexpected behavior)
    • Option E: let it flow with dummy output or empty output.
    • ... (what would be the best choice?)
  5. Will any entity manage SUSPEND/RESUME status?

E.g.,

+  SUSPEND, /**< Release resources temporarily until RESUME is called. You may conserve free memory of large neural network models without EOS.  */
