
Multimedia Question Answering

There is an increasing trend in the research community toward video processing using artificial intelligence. Trending tasks include:

  • Video classification.
  • Video content description.
  • Video question answering (VQA).

Main Idea

The main idea of the project is to find the partition of a video that is most relevant to a given query (question).
Instead of watching the complete video to find the interval you want, you give our model the video and a query describing the part you are looking for, and the model returns the intervals sorted by relevance to that query.
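
A hypothetical sketch of this retrieval step is shown below, assuming PyTorch. `rank_intervals`, the encoders that produce the embeddings, and the candidate intervals themselves are placeholders for illustration, not the repository's actual API.

```python
# Score each candidate interval of the video against the query and return
# the intervals sorted by relevance. The embeddings are assumed to come from
# query and video encoders that are not shown here.
import torch
import torch.nn.functional as F

def rank_intervals(query_embedding, interval_embeddings, intervals):
    """intervals: list of (start_sec, end_sec); embeddings: tensors.

    query_embedding:     (dim,)
    interval_embeddings: (num_intervals, dim)
    """
    scores = F.cosine_similarity(interval_embeddings,
                                 query_embedding.unsqueeze(0), dim=1)
    order = torch.argsort(scores, descending=True)
    return [(intervals[i], scores[i].item()) for i in order.tolist()]
```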

Examples

Watch the video

Dataset

We use the Microsoft Research Video to Text (MSR-VTT) dataset.
An example from the dataset is shown below.

Extracted Visual Features

We extracted visual features from the dataset using three different models: ResNet, NASNet, and Inception-ResNet-v2.
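
A minimal sketch of per-frame feature extraction with a pretrained ResNet is shown below, assuming PyTorch/torchvision; the other extractors (NASNet, Inception-ResNet-v2) follow the same pattern. The frame sampling and preprocessing details are assumptions, not necessarily what this repository uses.

```python
# Extract a pooled visual feature vector for each sampled video frame using
# a pretrained ResNet-152 with its classification head removed.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()  # keep the 2048-d pooled features
resnet.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frame_paths):
    """Return a (num_frames, 2048) tensor of per-frame visual features."""
    frames = torch.stack([preprocess(Image.open(p).convert("RGB"))
                          for p in frame_paths])
    return resnet(frames)
```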

Architecture

Here is the base architecture, which follows the one used in the paper here.
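
For orientation, below is a minimal, generic sketch of this kind of query-video matching model: one LSTM encodes the question, another LSTM encodes the per-frame visual features, and the two encodings are compared to produce a relevance score. Layer sizes and the dot-product scoring are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class QueryVideoMatcher(nn.Module):
    def __init__(self, vocab_size, word_dim=300, visual_dim=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.q_lstm = nn.LSTM(word_dim, hidden, batch_first=True)
        self.v_lstm = nn.LSTM(visual_dim, hidden, batch_first=True)

    def forward(self, question_tokens, frame_features):
        # question_tokens: (batch, q_len) word indices
        # frame_features:  (batch, n_frames, visual_dim) extracted features
        _, (q_h, _) = self.q_lstm(self.embed(question_tokens))
        _, (v_h, _) = self.v_lstm(frame_features)
        # Relevance score: dot product of the final hidden states.
        return (q_h[-1] * v_h[-1]).sum(dim=1)
```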

Checkpoints

We trained the model using different visual feature extractors and with small changes to the model architecture.

  • Using the ResNet visual feature extractor (as in the paper): gdrive link

  • Using the NASNet visual feature extractor: gdrive link

  • Using the Inception-ResNet-v2 visual feature extractor: gdrive link

  • Using the Squeeze and Excitation technique with Inception-ResNet-v2 (see the SE block sketch after this list): gdrive link

  • Using the Dropout technique: gdrive link

  • Using Squeeze and Excitation along with Dropout: gdrive link

  • Using the Squeeze and Excitation technique and increasing the hidden dimension of the LSTMs: gdrive link
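
Below is a minimal sketch of a Squeeze-and-Excitation block (Hu et al.) applied to a flat feature vector, assuming a PyTorch implementation; where exactly the block sits in this model (e.g. on the extracted visual features before the LSTM) is an assumption.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel-wise gating: each feature dimension is rescaled by a learned
    gate in [0, 1] produced from a bottleneck of the feature vector itself."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze to a bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # excite back to full size
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels) feature vector, e.g. a pooled frame feature.
        return x * self.fc(x)
```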

Results

Across these experiments, the best results were obtained using Inception-ResNet-v2 as the visual feature extractor.
Our model outperforms the original paper's model on all of the metrics used, as shown in the following table:

These results were obtained on the test set, which contains 2,990 videos.

You can see the comparison between all models in the following figure:

Authors

Contribute

Contributions are always welcome!

Please read the contribution guidelines first.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.