The dataset consists of videos, and each video has one class (target).
Each video is split into frames, and the captured frames are the input to the model.
So may I ask how a video classification model should be evaluated?
When evaluating the model, should I measure label accuracy per video, with one whole video as the input?
Or should I measure accuracy per frame, with each frame as a separate input?
Take CRNN for example. The input to the model has the shape 28x3x224x224, where 28 is the number of frames extracted from a video, 3 is the number of channels, and 224x224 is the size of each frame after resizing from the original video. The target for this input is one label. As explained in the README, the CNN (encoder) takes this input and generates an encoding (feature vector), then passes it to the RNN (decoder), which takes the temporal dimension of the video into account. Since there is one label per video input, accuracy here is measured per video rather than per frame.
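To make the per-video evaluation concrete, here is a minimal sketch in plain Python. It is not taken from the repository: the function names `video_level_accuracy` and `majority_vote` and the sample labels are hypothetical. The majority-vote helper is only an assumption about how a purely frame-wise model could be reduced to one label per video; a CRNN already outputs one prediction per video and does not need it.

```python
from collections import Counter


def video_level_accuracy(predicted_labels, true_labels):
    """Return the fraction of videos whose predicted label matches the target."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predictions and targets must have the same length")
    if not true_labels:
        raise ValueError("need at least one video to compute accuracy")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


def majority_vote(frame_labels):
    """Collapse per-frame predictions for ONE video into a single label.

    Hypothetical helper: only relevant if the model classifies frames
    independently instead of consuming the whole clip at once.
    """
    return Counter(frame_labels).most_common(1)[0][0]


# Hypothetical example: four videos, one predicted label and one target each.
per_video_predictions = [2, 0, 1, 1]
targets = [2, 0, 0, 1]
accuracy = video_level_accuracy(per_video_predictions, targets)

# Hypothetical frame-wise case: five frame predictions for a single video.
single_video_label = majority_vote([1, 1, 0, 1, 2])
```

In this sketch each video contributes exactly one comparison to the accuracy, no matter how many frames were extracted from it.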