
Add suspend and resume events to tensor filter #4424

Open
gichan-jang opened this issue Mar 18, 2024 · 2 comments

Comments

@gichan-jang
Member

gichan-jang commented Mar 18, 2024

Add suspend and resume events to tensor filter

Consider a scenario where AI services are provided on devices with limited memory.
Since user requests do not arrive continuously, keeping the model loaded in memory can be inefficient.
This matters even more for larger models (e.g., Large Language Models).
Thus, rather than keeping the model resident in memory, it is more efficient to reload and use it on demand.
For instance, if there are no requests to the model for 3 seconds, the model is removed from memory.
This can be supported by adding new events to the tensor filter.
With SUSPEND and RESUME events, a sub-plugin can unload or reload its model while keeping its core state intact.
If a sub-plugin cannot handle SUSPEND and RESUME, the tensor filter can fall back to closing and reopening the sub-plugin.
Users can trigger suspend and resume on the tensor filter manually; an automatic management feature would also be needed later.

--- a/gst/nnstreamer/include/nnstreamer_plugin_api_filter.h
+++ b/gst/nnstreamer/include/nnstreamer_plugin_api_filter.h
@@ -181,6 +181,8 @@ typedef enum
   SET_OUTPUT_PROP,  /**< Update output tensor info and layout */
   SET_ACCELERATOR,  /**< Update accelerator of the subplugin to be used as backend */
   CHECK_HW_AVAILABILITY, /**< Check the hw availability with custom option */
+  SUSPEND, /**< Unload the model file from memory */
+  RESUME /**< Load the model on memory */
 } event_ops;

@taos-ci
Collaborator

taos-ci commented Mar 18, 2024

:octocat: cibot: Thank you for posting issue #4424. The person in charge will reply soon.

@myungjoo
Member

We need to clarify the policy and the behaviors of SUSPEND/RESUME.

I presume the following:

  1. They are NOT nested. {Call SUSPEND; Call SUSPEND; Call RESUME} == {Call SUSPEND; Call RESUME}
  2. Add a mutex around the fw->handleEvent() call in tensor_filter_common.c. (SUSPEND/RESUME may otherwise incur synchronization problems.)
  3. More description on how to implement SUSPEND/RESUME event handlers and the purpose of such events.
  4. The behaviors when other callbacks are called while it is in "SUSPEND" state.
    • Option A: call RESUME automatically before the callback is actually called. (you need to manage status)
    • Option B: explicit error (you need to manage status) that may break the pipeline
    • Option C: explicit error (message and event only) without breaking the pipeline.
    • Option D: do nothing (unexpected behavior)
    • Option E: let it flow with dummy output or empty output.
    • ... (what would be the best choice?)
  5. Will any entity manage SUSPEND/RESUME status?

E.g.,

+  SUSPEND, /**< Release resources temporarily until RESUME is called. You may conserve free memory of large neural network models without EOS.  */
