Consider a scenario where AI services are being provided on devices with limited memory capacity.
Since user requests do not arrive continuously, keeping the model loaded in memory at all times can be wasteful.
This matters even more for larger models (e.g., Large Language Models).
Thus, rather than keeping the model resident in memory, it is more efficient to load it only when it is needed.
For instance, if the model receives no requests for 3 seconds, it is unloaded from memory.
This issue can be resolved by adding SUSPEND and RESUME events to the tensor filter.
With SUSPEND and RESUME events, a sub-plugin can unload its model from memory and reload it later, while its core functions stay in place.
If a sub-plugin cannot handle SUSPEND and RESUME events, the tensor filter can fall back to closing the sub-plugin and opening it again.
For now, users can trigger the suspend and resume events of the tensor filter themselves, but an automatic management feature will also be needed later.
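A rough sketch of the suspend/resume path with the close-and-reopen fallback might look like the following. Everything here is an assumption made for illustration: `subplugin_ops`, `filter_state`, `filter_suspend`, and `filter_resume` are hypothetical and are not the NNStreamer sub-plugin interface, and the `demo_*` callbacks only count how often they are called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes mirroring the proposed SUSPEND / RESUME. */
typedef enum { EV_SUSPEND, EV_RESUME } filter_event;

/* Hypothetical sub-plugin interface; the real NNStreamer API differs. */
typedef struct {
  int (*open) (void *priv);                          /* 0 on success */
  void (*close) (void *priv);                        /* release everything */
  int (*handle_event) (void *priv, filter_event ev); /* NULL: unsupported */
} subplugin_ops;

typedef enum { FILTER_RUNNING, FILTER_SUSPENDED, FILTER_CLOSED } filter_mode;

typedef struct {
  const subplugin_ops *ops;
  void *priv;
  filter_mode mode;
} filter_state;

/* Try the SUSPEND event first; fall back to closing the sub-plugin. */
static int
filter_suspend (filter_state *f)
{
  assert (f != NULL && f->ops != NULL);
  if (f->mode != FILTER_RUNNING)
    return 0;  /* already suspended or closed */
  if (f->ops->handle_event != NULL
      && f->ops->handle_event (f->priv, EV_SUSPEND) == 0) {
    f->mode = FILTER_SUSPENDED;  /* sub-plugin unloaded the model itself */
    return 0;
  }
  f->ops->close (f->priv);  /* fallback: drop the whole sub-plugin */
  f->mode = FILTER_CLOSED;
  return 0;
}

/* Undo whichever path filter_suspend() took. */
static int
filter_resume (filter_state *f)
{
  assert (f != NULL && f->ops != NULL);
  int ret = 0;
  if (f->mode == FILTER_SUSPENDED)
    ret = f->ops->handle_event (f->priv, EV_RESUME);
  else if (f->mode == FILTER_CLOSED)
    ret = f->ops->open (f->priv);
  if (ret == 0)
    f->mode = FILTER_RUNNING;
  return ret;
}

/* Demo callbacks that only count how often they ran. */
typedef struct { int opens, closes, events; } demo_priv;

static int demo_open (void *p) { ((demo_priv *) p)->opens++; return 0; }
static void demo_close (void *p) { ((demo_priv *) p)->closes++; }
static int
demo_event (void *p, filter_event ev)
{
  (void) ev;
  ((demo_priv *) p)->events++;
  return 0;
}

static const subplugin_ops ops_with_events = { demo_open, demo_close, demo_event };
static const subplugin_ops ops_without_events = { demo_open, demo_close, NULL };
```

Tracking the mode explicitly lets `filter_resume` undo exactly what `filter_suspend` did, so a sub-plugin that was closed is reopened and one that handled the event is simply resumed.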
--- a/gst/nnstreamer/include/nnstreamer_plugin_api_filter.h
+++ b/gst/nnstreamer/include/nnstreamer_plugin_api_filter.h
@@ -181,6 +181,8 @@ typedef enum
   SET_OUTPUT_PROP, /**< Update output tensor info and layout */
   SET_ACCELERATOR, /**< Update accelerator of the subplugin to be used as backend */
   CHECK_HW_AVAILABILITY, /**< Check the hw availability with custom option */
+  SUSPEND, /**< Unload the model file from memory */
+  RESUME /**< Load the model on memory */
 } event_ops;
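On the sub-plugin side, handling the two new enum values could look roughly like the sketch below. It is not taken from any existing sub-plugin: the `event_ops` enum is redeclared locally with only the entries visible in the diff, `demo_subplugin` and `demo_event_handler` are hypothetical, a `calloc` buffer stands in for the loaded model, and returning `-ENOENT` for unsupported events is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Local stand-in listing only the entries visible in the diff. */
typedef enum {
  SET_OUTPUT_PROP,
  SET_ACCELERATOR,
  CHECK_HW_AVAILABILITY,
  SUSPEND,
  RESUME
} event_ops;

/* Hypothetical private data of a sub-plugin. */
typedef struct {
  size_t model_size;  /* bytes the loaded model occupies (non-zero) */
  void *model;        /* NULL while the model is unloaded */
} demo_subplugin;

/* Returns 0 on success, -ENOMEM if reloading fails, and -ENOENT for
 * events this sketch does not handle. */
static int
demo_event_handler (demo_subplugin *sp, event_ops ops)
{
  assert (sp != NULL);
  switch (ops) {
    case SUSPEND:
      free (sp->model);  /* free(NULL) is a no-op, so repeating is safe */
      sp->model = NULL;
      return 0;
    case RESUME:
      if (sp->model == NULL) {
        /* stands in for reading the model file back into memory */
        sp->model = calloc (1, sp->model_size);
        if (sp->model == NULL)
          return -ENOMEM;
      }
      return 0;
    default:
      return -ENOENT;
  }
}
```

Only the model buffer is released on SUSPEND; the rest of the private data, such as `model_size`, survives, so RESUME has what it needs to reload.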
Add suspend and resume events to tensor filter