read_video error for slightly large videos when extracting S3D features. #90

Open
divineSix opened this issue Jan 6, 2023 · 6 comments

divineSix commented Jan 6, 2023

I was trying to extract S3D features from a video (~51 MB, ~11 min) and was getting an error at the very start of the extraction process, with the console message `Killed`.

This happens because extract-S3D.py uses read_video from torchvision.io.video to process the video file. I tried executing only that statement separately and hit the same issue. However, I was able to process a smaller video file (<1 MB, ~5 s), and feature extraction then proceeded without a hitch; the same goes for the samples provided in the repo. The issue is not present in the I3D feature extraction, probably because there you use the VideoCapture methods from OpenCV?

I'm trying to see if some other video reader works for this, but I am unsure if read_video applies any transforms before outputting the RGB torch array mentioned in the code. Can you suggest any workaround if this doesn't work?
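
For what it's worth, a quick way to inspect what read_video actually returns (a minimal sketch based on my understanding of the torchvision API, not on anything in this repo; the file name is a placeholder):

    from torchvision.io.video import read_video

    # returns decoded frames, audio samples, and a metadata dict
    rgb, audio, info = read_video('sample.mp4', pts_unit='sec')
    print(rgb.shape, rgb.dtype)  # (num_frames, H, W, 3), torch.uint8
    print(info)                  # e.g. {'video_fps': ..., 'audio_fps': ...}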

The torchvision version in my environment is 0.12.0 and omegaconf is 2.1.1, as described.

EDIT: I've tried extracting the features for the video I had issues with in the S3D Colab notebook, but the kernel crashes there as well.

divineSix commented Jan 6, 2023

I've tried using read_video in a brand-new environment with the latest torchvision and av modules installed, and I'm facing the same issue. There seems to be an open issue in the torchvision repo regarding this as well, although I'm not sure of the details.

My video has ~13k frames, and I'm wondering if the problem is that the code loads all 13k frames into CPU/GPU memory at once. I'm new to this field entirely, so please do let me know if I'm missing something.
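
As a rough back-of-the-envelope check (the resolution is an assumption on my part, since I haven't stated the video dimensions), holding all decoded frames in memory at once would already be tens of gigabytes:

    # rough RAM estimate for keeping every decoded frame in memory at once (uint8 RGB)
    frames, height, width, channels = 13_000, 720, 1280, 3  # ~13k frames, assumed 720p
    total_bytes = frames * height * width * channels
    print(f'{total_bytes / 1e9:.1f} GB')  # ~35.9 GB, before any model inference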

v-iashin (Owner) commented Jan 7, 2023

I am quite sure the issue is caused by running out of RAM. You can confirm it by monitoring RAM usage as you run the script on your video.

The reason it works with OpenCV is the way it loads the video: in contrast to torchvision, which tries to read the whole video into RAM, OpenCV reads frames one by one; features are extracted from chunks of frames, and each chunk is discarded once its features have been computed.

My suggestion is to split your long video into small pieces with ffmpeg.
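
For example, a minimal sketch of splitting a video into fixed-length segments by calling ffmpeg from Python (the 60-second segment length and file names are placeholders; adjust to your setup):

    import subprocess

    def split_video(video_path: str, out_pattern: str = 'chunk_%03d.mp4', segment_seconds: int = 60) -> None:
        # stream-copy the input into consecutive segments of roughly segment_seconds each
        subprocess.run([
            'ffmpeg', '-i', video_path,
            '-c', 'copy', '-map', '0',
            '-f', 'segment', '-segment_time', str(segment_seconds),
            '-reset_timestamps', '1',
            out_pattern,
        ], check=True)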

I do admit that such a difference between readers is confusing and limits applications. However, it ensures that the feature extraction process matches the one that was used during training.

mrkstt commented Sep 11, 2023

> I was trying to extract S3D features on a video (~51MB, 11 mins), and was getting an error at the very start of the extraction process, with a console message Killed. [...]

Can anyone share a success story of running this, especially the hardware configuration (probably RAM and GPU memory size)?
I also faced the same problem, with a GTX 1050 Ti. @v-iashin @divineSix @borijang

GunjanDhanuka commented:

You can use the OpenCV video reader instead of the torchvision video reader; that seemed to fix the issue in my case.

    # imports needed for this snippet (normally at the top of the file)
    import cv2
    import numpy as np
    import torch

    # rgb_vid, audio, info = read_video(video_path, pts_unit='sec')

    print("Video reading started")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    rgb_stack = []

    while cap.isOpened():
        frame_exists, rgb = cap.read()

        if frame_exists:
            # OpenCV decodes frames as BGR; convert to RGB to match read_video
            rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
            rgb_stack.append(rgb)
        else:
            # end of the video: release the capture and stop reading
            cap.release()
            break

    # stack all frames into a single uint8 tensor of shape (num_frames, H, W, 3)
    rgb1 = torch.tensor(np.array(rgb_stack))

GunjanDhanuka commented:

I am using it to extract features from the XD-Violence dataset. I compared the numpy arrays (using np.array_equal) after getting the features from both cv2 and read_video, and the result was True.

v-iashin (Owner) commented Feb 15, 2024

> I compared the numpy arrays (using np.array_equal) after getting the features from both cv2 and read_video and the result was True.

Ok, that's great to know.

However, I think the suggested code won't work if you have thousands of frames.

The code above needs to be updated to process frames in chunks and release each chunk once its features have been extracted, to free up memory.

It should be similar to how it is done for I3D:

    batch_feats_dict = self.run_on_a_stack(rgb_stack, stack_counter, padder)
    for stream in self.streams:
        feats_dict[stream].extend(batch_feats_dict[stream].tolist())
    # leaving the elements if step_size < stack_size so they will not be loaded again
    # if step_size == stack_size one element is left because the flow between the last element
    # in the prev list and the first element in the current list
    rgb_stack = rgb_stack[self.step_size:]

If read_video and cv2 output comparable features, one could use cv2 frame-by-frame reading as it is done for i3d.
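
A rough sketch of what that could look like (the names here, e.g. run_s3d_on_stack, are illustrative rather than the repo's actual API; the stack and step sizes are placeholders):

    import cv2
    import numpy as np
    import torch

    def extract_in_chunks(video_path, run_s3d_on_stack, stack_size=64, step_size=64):
        # read frames one by one with OpenCV, run the model whenever a chunk is full,
        # then drop the processed frames so RAM usage stays bounded
        cap = cv2.VideoCapture(video_path)
        rgb_stack, feats = [], []
        while cap.isOpened():
            frame_exists, frame = cap.read()
            if not frame_exists:
                break
            rgb_stack.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if len(rgb_stack) >= stack_size:
                chunk = torch.from_numpy(np.array(rgb_stack[:stack_size]))
                feats.append(run_s3d_on_stack(chunk))
                # keep the trailing frames (if step_size < stack_size) so they are not re-read
                rgb_stack = rgb_stack[step_size:]
        cap.release()
        return feats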
