About your dataset #2

IVparagigm opened this issue Aug 20, 2020 · 4 comments

@IVparagigm

Hi, since I'm a beginner and could not get the Kinetics dataset, I can only read your code. So I have some questions:
(1) In videodataset.py, the VideoDataset class returns a clip and a target during training. I notice that the length of the clip equals the length of frame_indices, which is 10, but in your paper 32 frames are selected as the input. Could you tell me where the 32 frames are selected?
(2) About strg.py: I tested another input size, 1 * 3 * 32 * 224 * 224, whose batch size is 1 and depth is 32, but the batch size of the output of the extractor and reducer is 2. Does that mean I must process all my input with batch_size = 4? Can I use other input sizes?
All of these problems are elementary, but they have been bothering me for several days. I would be very grateful if you could help me.
Thanks.

@jd730
Owner

jd730 commented Aug 20, 2020

Hi @IVparagigm,
1-1) Could you tell me which script you ran? On my side, the length of frame_indices is 32.
1-2) Also, I am not the author of the paper.

  2. I cannot understand the size of your input; Markdown may have mangled the formatting. Is it 1 * 3 * 32 * 224 * 224?

The batch_size is preserved.

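(A minimal shape check of that claim, using a stand-in 3D backbone rather than the repo's actual extractor; the layer choices below are assumptions for illustration only:)

```python
import torch
import torch.nn as nn

# Stand-in for the feature extractor: one 3D conv plus adaptive pooling.
# NOT the real model -- just enough to show the batch axis passing through.
extractor = nn.Sequential(
    nn.Conv3d(3, 512, kernel_size=3, stride=(1, 2, 2), padding=1),
    nn.AdaptiveAvgPool3d((32, 7, 7)),
)

inputs = torch.rand((1, 3, 32, 224, 224))  # B x C x T x H x W
features = extractor(inputs)
print(features.shape)  # torch.Size([1, 512, 32, 7, 7]) -- a batch of 1 stays 1
```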

@IVparagigm
Author

1) In videodataset.py, in VideoDataset._make_dataset():

```python
frame_indices = list(range(segment[0], segment[1]))
sample = {
    'video': video_path,
    'segment': segment,
    'frame_indices': frame_indices,
    'video_id': video_ids[i],
    'label': label_id
}
```

And I got an annotation like this:

```json
"---QUuC4vJs": {
    "annotations": {
        "label": "testifying",
        "segment": [84.0, 94.0]
    }
},
```

so I figured the length of frame_indices is 10. I'm sorry, I can't get the dataset, so I can only read your code to infer the size of the data.
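(A quick check of that arithmetic, with the float endpoints cast to int since Python's range() rejects floats; this is an illustration, not the repo's exact code:)

```python
# segment taken from the annotation above; the int() casts are added here
# for illustration only, since range() does not accept floats
segment = [84.0, 94.0]
frame_indices = list(range(int(segment[0]), int(segment[1])))
print(len(frame_indices))  # 10 -> matches the inferred clip length
```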

2) Yes, this is my first time using GitHub, so my comment formatting came out wrong. This is my test code:

```python
rois = torch.rand((1, 32, 10, 4))
inputs = torch.rand((1, 3, 32, 224, 224))
strg = STRG(model)
out = strg(inputs, rois)
```

When I run this, I get a feature of size 2 * 512 * 7 * 7, but I expected an output of size 32 * 512 * 7 * 7, because the depth of the input is 32 (I fed 32 frames into the network). Could you tell me what I am doing wrong?

Thank you very much if you could help me.

@jd730
Owner

jd730 commented Aug 20, 2020

@IVparagigm

  1. If I remember correctly, that one is from https://github.com/kenshohara/3D-ResNets-PyTorch/blob/master/datasets/videodataset.py and it is for dense sampling. It ends up sampling 32 frames in the later stages (see the sketch after this list).
  2. What is model in your code? Is it actually resnet_strg?
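(For reference, a rough sketch of what such a temporal transform might do, modeled loosely on the LoopPadding / TemporalRandomCrop transforms in 3D-ResNets-PyTorch; the function below is an assumption for illustration, not the repo's exact code:)

```python
import random

def temporal_random_crop(frame_indices, size=32):
    """Hypothetical helper: loop-pad a short clip, then take a random
    contiguous window of `size` frame indices."""
    out = list(frame_indices)
    while len(out) < size:          # loop-padding for clips shorter than size
        out.extend(frame_indices)
    begin = random.randint(0, len(out) - size)
    return out[begin:begin + size]

# a 10-frame segment such as [84, 94) ends up as 32 sampled indices
print(len(temporal_random_crop(range(84, 94))))  # 32
```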

@IVparagigm
Author

1) Yes, what I mentioned is in this file.

```python
frame_indices = self.data[index]['frame_indices']
if self.temporal_transform is not None:
    frame_indices = self.temporal_transform(frame_indices)
clip = self.__loading(path, frame_indices)
```

So self.temporal_transform brings frame_indices to a length of 32, right?

2) I have tried both resnet_strg and resnet.
With resnet_strg, after self.reducer I get an output of size 16 * 512 * 14 * 14, but what I want is an output of size 32 * 512 * 7 * 7. Is that achievable?
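(One possible explanation, offered as an assumption rather than a confirmed reading of this repo: 3D ResNets typically downsample the temporal axis as well, and some implementations fold the remaining temporal axis into the batch axis before per-frame 2D processing, which would turn a depth-32, batch-1 input into a leading dimension of 16 or 2. A sketch of that folding:)

```python
import torch

# hypothetical backbone output: temporal axis already downsampled 32 -> 16
feat = torch.rand((1, 512, 16, 14, 14))        # B x C x T x H x W
b, c, t, h, w = feat.shape

# fold time into batch for per-frame processing: (B*T) x C x H x W
folded = feat.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
print(folded.shape)  # torch.Size([16, 512, 14, 14])
```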
