Hello @HHTseng,

I find your code really helpful for understanding both the 3D CNN and the CNN + LSTM architectures. However, I think there is a small problem in how you handle variable lengths in the LSTM part. Some videos have as few as 28 frames, and you pad them so that they all have 50 frames. But when you decode the LSTM hidden units, you take the last timestep: https://github.com/HHTseng/video-classification/blob/master/ResNetCRNN_varylength/functions.py#L276
which will be all zeros in those cases.
I think we have to rely on the second output of torch.nn.utils.rnn.pad_packed_sequence to decide which timestep to decode for classification.
Please let me know your opinion.
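A minimal sketch of the proposed fix, assuming a `batch_first` LSTM (the tensor shapes and variable names here are illustrative, not taken from the repository): instead of reading `output[:, -1, :]`, use the lengths returned as the second output of `pad_packed_sequence` to gather each sequence's last *valid* timestep.

```python
# Sketch: decode the last valid timestep per sequence, not the padded one.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Two sequences padded to length 5; true lengths are 5 and 3.
x = torch.randn(2, 5, 8)
lengths = torch.tensor([5, 3])
x[1, 3:] = 0.0  # frames beyond length 3 are zero padding

packed = pack_padded_sequence(x, lengths, batch_first=True,
                              enforce_sorted=False)
packed_out, _ = lstm(packed)
# out_lengths is the second output: the true length of each sequence.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# out[:, -1, :] would be zeros for the shorter sequence; instead,
# index each sequence at its own last valid timestep (length - 1).
idx = (out_lengths - 1).view(-1, 1, 1).expand(-1, 1, out.size(2))
last_valid = out.gather(1, idx).squeeze(1)  # shape: (batch, hidden_size)
```

`last_valid` can then be fed to the classification head in place of `output[:, -1, :]`. Note that `pad_packed_sequence` fills padded positions with zeros, which is exactly why taking the last index is wrong for short sequences.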