About your dataset #2

IVparagigm opened this issue Aug 20, 2020 · 4 comments

@IVparagigm

Hi, since I'm a beginner and could not get the Kinetics dataset, I can only read your code. So I have some questions:
(1) In videodataset.py, the VideoDataset class returns a clip and a target during training. I notice that the length of the clip equals the length of frame_indices, which is 10, but in your paper 32 frames are selected as the input. Could you tell me where the 32 frames are selected?
(2) About strg.py: I tested another input size, 1 * 3 * 32 * 224 * 224, whose batch size is 1 and depth is 32, but the batch size of the output of the extractor and reducer is 2. Does that mean I must process all my input with batch_size = 4? Can I use other input sizes?
All of these problems are elementary, but they have been bothering me for several days. I would be very grateful if you could help me.
Thanks.

@jd730
Owner

jd730 commented Aug 20, 2020

Hi @IVparagigm,
1-1) Could you tell me which script you ran? On my side, the length of frame_indices is 32.
1-2) Also, I am not the author of the paper.

  2. I cannot understand the size of your input; Markdown may have mangled the formatting. Is it 1 * 3 * 32 * 224 * 224?

The batch_size is preserved.

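(A minimal shape check of that claim, using a stand-in 3D backbone rather than the repo's actual extractor; the layer choices below are assumptions for illustration only:)

```python
import torch
import torch.nn as nn

# Stand-in for the feature extractor: one 3D conv plus adaptive pooling.
# NOT the real model -- just enough to show the batch axis passing through.
extractor = nn.Sequential(
    nn.Conv3d(3, 512, kernel_size=3, stride=(1, 2, 2), padding=1),
    nn.AdaptiveAvgPool3d((32, 7, 7)),
)

inputs = torch.rand((1, 3, 32, 224, 224))  # B x C x T x H x W
features = extractor(inputs)
print(features.shape)  # torch.Size([1, 512, 32, 7, 7]) -- a batch of 1 stays 1
```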

@IVparagigm
Author

1) In videodataset.py, in VideoDataset._make_dataset():

```python
frame_indices = list(range(segment[0], segment[1]))
sample = {
    'video': video_path,
    'segment': segment,
    'frame_indices': frame_indices,
    'video_id': video_ids[i],
    'label': label_id
}
```

And I got an annotation like this:

```json
"---QUuC4vJs": {
    "annotations": {
        "label": "testifying",
        "segment": [84.0, 94.0]
    }
},
```

so I figured the length of frame_indices is 10. I'm sorry, I can't get the dataset, so I can only read your code to infer the size of the data.
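(A quick check of that arithmetic, with the float endpoints cast to int since Python's range() rejects floats; this is an illustration, not the repo's exact code:)

```python
# segment taken from the annotation above; the int() casts are added here
# for illustration only, since range() does not accept floats
segment = [84.0, 94.0]
frame_indices = list(range(int(segment[0]), int(segment[1])))
print(len(frame_indices))  # 10 -> matches the inferred clip length
```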

2) Yes, this is my first time using GitHub, so my comment formatting came out wrong. This is my test code:

```python
rois = torch.rand((1, 32, 10, 4))
inputs = torch.rand((1, 3, 32, 224, 224))
strg = STRG(model)
out = strg(inputs, rois)
```

When I run this, I get a feature of size 2 * 512 * 7 * 7, but I expected an output of size 32 * 512 * 7 * 7, because the depth of the input is 32 (I fed 32 frames into the network). Could you tell me what I am doing wrong?

Thank you very much if you could help me.

@jd730
Owner

jd730 commented Aug 20, 2020

@IVparagigm

  1. If I remember correctly, that one is from https://github.com/kenshohara/3D-ResNets-PyTorch/blob/master/datasets/videodataset.py and it is for dense sampling. It ends up sampling 32 frames in the later stages (see the sketch after this list).
  2. What is model in your code? Is it actually resnet_strg?
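(For reference, a rough sketch of what such a temporal transform might do, modeled loosely on the LoopPadding / TemporalRandomCrop transforms in 3D-ResNets-PyTorch; the function below is an assumption for illustration, not the repo's exact code:)

```python
import random

def temporal_random_crop(frame_indices, size=32):
    """Hypothetical helper: loop-pad a short clip, then take a random
    contiguous window of `size` frame indices."""
    out = list(frame_indices)
    while len(out) < size:          # loop-padding for clips shorter than size
        out.extend(frame_indices)
    begin = random.randint(0, len(out) - size)
    return out[begin:begin + size]

# a 10-frame segment such as [84, 94) ends up as 32 sampled indices
print(len(temporal_random_crop(range(84, 94))))  # 32
```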

@IVparagigm
Author

1) Yes, what I mentioned is in this file.

```python
frame_indices = self.data[index]['frame_indices']
if self.temporal_transform is not None:
    frame_indices = self.temporal_transform(frame_indices)
clip = self.__loading(path, frame_indices)
```

So self.temporal_transform brings frame_indices to a length of 32, right?

2) I have tried both resnet_strg and resnet.
With resnet_strg, after self.reducer I get an output of size 16 * 512 * 14 * 14, but what I want is an output of size 32 * 512 * 7 * 7. Is that achievable?
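(One possible explanation, offered as an assumption rather than a confirmed reading of this repo: 3D ResNets typically downsample the temporal axis as well, and some implementations fold the remaining temporal axis into the batch axis before per-frame 2D processing, which would turn a depth-32, batch-1 input into a leading dimension of 16 or 2. A sketch of that folding:)

```python
import torch

# hypothetical backbone output: temporal axis already downsampled 32 -> 16
feat = torch.rand((1, 512, 16, 14, 14))        # B x C x T x H x W
b, c, t, h, w = feat.shape

# fold time into batch for per-frame processing: (B*T) x C x H x W
folded = feat.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
print(folded.shape)  # torch.Size([16, 512, 14, 14])
```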
