
A neural network to detect a title card within a video file and a tool for it #362

Open
4 tasks done
KOLANICH opened this issue Jun 12, 2023 · 2 comments
Labels
Advanced — requires a high level of understanding of the topics specified, or of programming in general. AI/ML — Artificial Intelligence and Machine Learning; including, but not limited to, creating Skynet. Futuristic tech/Unique ideas — sometimes the ideas are so cutting-edge that they're hard to describe. Much work — ETA several weeks+.

Comments


KOLANICH commented Jun 12, 2023

Project description

Imagine that there is a bunch of movie files, none of which have embedded thumbnails. Your task is to generate nice thumbnails for them.

Video summarization is a pretty hard task, not only for AI but also for people, because the problem is severely ill-posed and there are many valid choices.

Some video files contain effects added with video-editing software, such as title cards. A title card is a frame showing the name/logo of the clip. The font of the name is often very stylized and large, and can be recognized by its style alone, even without reading the text.

Thumbnails are reduced-size images that make it easier for people to recognize and select the files they want without reading the fine print of a filename or waiting for it to scroll into view.

These properties should make title cards nice thumbnails. So, the following program is needed:

  1. The video stream is scanned and keyframes are extracted.
  2. The images are downscaled to the point where neural-network inference is fast enough.
  3. The downscaled images are passed through a neural-network-based one-shot object detector that predicts the probability of a frame being a title card; the score is thresholded.
  4. A machine-readable list of frames and their title-card probabilities is produced.
  5. Semantic segmentation and boundary detection are run on the full-scale candidate frames.
  6. The frames are cropped to the rectangle enclosing the titles/logos.
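The scoring and thresholding steps (3–4) could be sketched roughly like this in Python. This is only an illustration: the detector here is an injected callable standing in for a real ONNX model, and all names (`FrameScore`, `score_keyframes`) are made up for this sketch, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class FrameScore:
    timestamp: float     # seconds into the video
    probability: float   # detector's title-card score

def score_keyframes(
    keyframes: Sequence[Tuple[float, object]],  # (timestamp, image) pairs
    detector: Callable[[object], float],        # e.g. a wrapper around an ONNX Runtime session
    threshold: float = 0.5,
) -> List[FrameScore]:
    """Run the detector on each keyframe and keep the frames whose
    title-card probability clears the threshold (steps 3-4)."""
    scored = [FrameScore(ts, detector(img)) for ts, img in keyframes]
    return [s for s in scored if s.probability >= threshold]

# Toy usage with a stand-in detector (images replaced by dummy brightness values):
frames = [(0.0, 10), (4.2, 200), (9.7, 250)]
detector = lambda img: img / 255.0
candidates = score_keyframes(frames, detector, threshold=0.7)
```

The machine-readable output of step 4 would then just be a serialization of the surviving `FrameScore` records.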

Relevant Technology

  • C++
  • FFMpeg
  • ONNX
  • Neural network frameworks, such as PyTorch, TensorFlow, and tinygrad
  • Python

Complexity and required time

Complexity

  • Advanced - The project requires the user to have a good understanding of all components of the project to contribute

Required time (ETA)

  • Much work - The project will take more than a couple of weeks and serious planning is required

Categories

  • AI/ML
  • Futuristic Tech/Something Unique
@mihastele

@KOLANICH I could also look into this. Do you have any sample videos I could try working on?

Have a great day!
All the best from Slovenia.

@KOLANICH (Author)

I have no sample videos and no dataset. You need not a dataset of videos, but a dataset of title-card frames from them. I have no such dataset and don't know where to get one. A good heuristic is the presence of stylized text within frames, which can probably be detected by another neural network. Anyway, annotation using GPT-4 and other near-AGI models should be helpful. If you have a video collection, it should contain quite a few videos with title cards. Quite a few videos from YouTube should contain them as well.

I guess one can start by detecting the title screens of presentations. They are usually the first slide of a presentation, and presentations can be harvested from the internet using their filename extension. The title screens can be augmented with style-transfer neural networks to make them more stylized and less text-like.

After a model recognizing presentation title screens is trained, one can try to recognize title screens in real YouTube videos with it. To get the title screens you don't need the whole videos: title screens are usually within the first few minutes, and for presentations within the first few seconds. There are quite a few videos containing a presentation's title slide, often overlaid with other objects such as standing presenters or webcam overlays. After annotation with text+image AGI models using prompts like "does this slide look like a title?", this dataset can be used to train the next-generation model.
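Since only the opening section of each video is needed, harvesting can stay cheap. One way to do it is to decode only keyframes from the first couple of minutes with FFmpeg; here is a minimal sketch that just builds the argument list (the paths and duration are placeholders, and the function name is invented for this sketch):

```python
from typing import List

def keyframe_extraction_cmd(video: str, out_pattern: str, seconds: int = 120) -> List[str]:
    """Build an ffmpeg command that decodes only keyframes from the
    first `seconds` of `video` and dumps them as numbered images."""
    return [
        "ffmpeg",
        "-skip_frame", "nokey",   # decoder option: decode keyframes only
        "-t", str(seconds),       # stop reading after the opening section
        "-i", video,
        "-vsync", "vfr",          # one output image per decoded frame
        out_pattern,              # e.g. "frames/%04d.png"
    ]

cmd = keyframe_extraction_cmd("talk.mp4", "frames/%04d.png", seconds=90)
# pass `cmd` to subprocess.run(cmd, check=True) when ffmpeg is installed
```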

After that, the new model is applied to videos (everyone knows where to get them) containing very stylized title cards, and the results are again verified using AGI. Certain kinds of videos have title cards at exactly the same timings; this is very widespread, so it may make sense to add detection of this case.

I guess that is the way to get a dataset: bootstrap and improve evolutionarily, rather than trying to make the perfect model from the very first dataset obtained (that would require a dataset that is infeasible to create).
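The bootstrapping idea described above can be written as a small self-training loop. In this sketch `train`, `harvest`, and `verify` are placeholders for the real training run, candidate extraction from new videos, and AGI/manual verification; the stand-ins in the toy run below are purely illustrative.

```python
from typing import Callable, Iterable, List, Tuple

def bootstrap(
    seed: Iterable,                     # initial labelled title-card frames
    train: Callable[[List], object],    # fit a detector on the current data
    harvest: Callable[[object], List],  # run it on new videos, collect candidates
    verify: Callable[[object], bool],   # e.g. ask a text+image model "is this a title?"
    rounds: int = 3,
) -> Tuple[object, List]:
    """Alternate training and verified harvesting, growing the dataset each round."""
    data = list(seed)
    model = None
    for _ in range(rounds):
        model = train(data)
        data += [c for c in harvest(model) if verify(c)]
    return model, data

# Toy run with stand-ins: each round harvests two candidates, one passes verification.
model, data = bootstrap(
    seed=["slide-0"],
    train=lambda d: len(d),  # "model" is just the dataset size here
    harvest=lambda m: [f"cand-{m}-good", f"cand-{m}-bad"],
    verify=lambda c: c.endswith("good"),
)
```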

@FredrikAugust added the labels Much work, Advanced, AI/ML, and Futuristic tech/Unique ideas on Jul 14, 2023.