Hi, thanks for sharing the code. Two questions here:

1. I extracted video features using the pre-trained model resnet-34-kinetics-cpu.pth and checked the outputs: the extracted feature for each segment (16 frames) is 512-dimensional. However, according to your paper, for this model it should be 512/2 = 256 dimensions after global average pooling. Please correct me if I am wrong.

2. Among the pre-trained models you provide are resnet-34-kinetics-cpu.pth and resnext-101-kinetics.pth. Why is the latter file smaller than the former? To my understanding, the latter model should have more trainable parameters (more filters / feature channels).

Looking forward to your reply. Thanks in advance.
I found something. In the ResNeXt-101 model, each bottleneck block is (conv 1x1x1, F) --> (conv 3x3x3, F, groups=32) --> (conv 1x1x1, 2F), ignoring BN and ReLU. In the last stage the bottleneck width is F = 1024, and the final 1x1x1 conv doubles that to 2F = 2048 output channels. The global average pooling (GAP) layer that follows does not change the channel dimension.
So, when extracting video features with the resnext-101 model, the output after GAP is 2048-D.
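The block structure above can be sketched in PyTorch. This is a minimal illustration, not the repo's actual code: the class name is hypothetical, BN/ReLU and the shortcut's downsample conv are omitted, and the toy input sizes are chosen only to keep the forward pass cheap.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a last-stage ResNeXt-101 bottleneck as described above:
# (conv 1x1x1, F) -> (conv 3x3x3, F, groups=32) -> (conv 1x1x1, 2F),
# with BN/ReLU left out for brevity. Here F = 1024, so the block ends at 2048 channels.
class ResNeXtBottleneck3D(nn.Module):
    def __init__(self, in_channels: int, width: int, groups: int = 32):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, width, kernel_size=1, bias=False)
        self.conv2 = nn.Conv3d(width, width, kernel_size=3, padding=1,
                               groups=groups, bias=False)
        # The final 1x1x1 conv doubles the channel count: F -> 2F.
        self.conv3 = nn.Conv3d(width, 2 * width, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv3(self.conv2(self.conv1(x)))

block = ResNeXtBottleneck3D(in_channels=1024, width=1024)
clip = torch.randn(1, 1024, 2, 4, 4)       # (N, C, T, H, W) toy input
features = block(clip)                     # -> (1, 2048, 2, 4, 4)
pooled = features.mean(dim=(2, 3, 4))      # GAP: channels unchanged -> (1, 2048)
print(pooled.shape)                        # torch.Size([1, 2048])
```

Averaging over the temporal and spatial dimensions (GAP) collapses (T, H, W) but leaves the 2048 channels intact, which is why the extracted feature is 2048-D.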
One more thing: the author's README says "In the feature mode, this code outputs features of 512 dims (after global average pooling) for each 16 frames." That statement applies to the basic-block ResNet variants, regardless of the number of layers.
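For contrast, a basic 3D ResNet block (as used in ResNet-18/34) keeps the channel count unchanged, so the last stage stays at 512 channels and GAP yields 512-D features. A minimal sketch, with the class name hypothetical and BN/ReLU omitted:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a basic 3D ResNet block: two 3x3x3 convs with the
# same channel count plus an identity shortcut. The last stage of
# ResNet-18/34 uses 512 channels, so GAP produces 512-D features.
class BasicBlock3D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3,
                               padding=1, bias=False)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3,
                               padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv2(self.conv1(x)) + x   # identity shortcut

block = BasicBlock3D(512)
clip = torch.randn(1, 512, 2, 4, 4)            # (N, C, T, H, W) toy input
feat = block(clip).mean(dim=(2, 3, 4))         # GAP -> (1, 512)
print(feat.shape)                              # torch.Size([1, 512])
```

This matches the README's 512-dim figure for the basic-block models, while the bottleneck-based ResNeXt-101 ends at 2048-D as discussed above.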