Given a set of new images and a library of existing images, find and retrieve the most similar images.
The overall goal of this project is to train a set of models that, when provided a new image (in this case a frame from one of the MCU movies), output the images most similar to the queried image/scene. In other words, given an image and a library of images, retrieve the K images closest to the queried image. This objective is achieved by:
- Leveraging pre-trained models to extract image features and index the nearest neighbors
- Retrieving similar images by computing similarity distances between the queried image and the images in the database.
This project was developed using the [anaconda] package environment. Some SDKs are only available on PyPI, so we install them with pip as part of the conda software install as well.
Refer here for additional steps on setting up the anaconda Python environment.
- Clone the repository:

  `git clone https://github.com/naivelogic/Image_Similarity_in_Marvel_MCU.git`

- Create a virtual environment via conda called `marvel` using `./environment.yml`. This can be achieved by running the command: `conda create --name marvel --file environment.yml`
- Run the notebook `ssim_modeling.ipynb`
- TODO: update the ENVIRONMENT file to run the fastai notebooks.
For all Marvel Cinematic Universe (MCU) fans, and more broadly for applied Machine Learning practitioners, this project uses unsupervised learning and pretrained models to retrieve a list of similar images based on the queried image. In short, we are building an unsupervised deep learning Image Similarity Recommendation system. Moreover, as described in the notebooks, the project can be tuned to whatever image repository you have available.
To summarize the processes involved in this image similarity and retrieval system: at the highest level, we use a pre-trained deep learning model to extract features from a provided image library (in this case Marvel Cinematic Universe movies) into a list of numbers (a feature vector) describing each image. We then experiment with various distance functions to calculate the similarities between a queried image and all the other feature vectors in the image library, determining which images are most similar.
Below are the key steps used in this project:
- Collect data and establish the image library (in this project we scraped various YouTube videos)
- Normalize, resize, and preprocess the images
- Index the image library and append metadata
- Compute image similarity scores (similarity measures and algorithms are described in the section below)
- Extract image features with a pretrained model (e.g., a CNN such as VGG or ResNet50)
- Save the compressed feature matrix from the compiled model
- Display the predicted similar images from the image library based on a new image
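The indexing and retrieval steps above can be sketched with a minimal numpy example. Here the "feature vectors" are placeholders standing in for real CNN embeddings; the function names (`build_index`, `retrieve_top_k`) are illustrative, not part of the project's code:

```python
import numpy as np

def build_index(library_vecs):
    """L2-normalize library feature vectors once, so cosine
    similarity at query time reduces to a dot product."""
    norms = np.linalg.norm(library_vecs, axis=1, keepdims=True)
    return library_vecs / norms

def retrieve_top_k(query_vec, index, k=5):
    """Return indices of the k library images most similar to
    the query, best match first (cosine similarity)."""
    q = query_vec / np.linalg.norm(query_vec)
    sims = index @ q
    return np.argsort(-sims)[:k]
```

In the actual pipeline, `library_vecs` would come from running each library image through the pretrained model and flattening the output; the index can then be saved as a compressed matrix and reloaded at query time.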
For the image processing, I utilized Marvel's trailers posted on YouTube. Refer to the image_processing file in the image folder for downloading and capturing images, such as:
- Experiment 1:
- [YouTube] MCU Complete Recap
- [YouTube] The Entire MCU Timeline Explained
- [YouTube] Marvel Cinematic Universe 10 Year Recap
- Experiment 2:
- Went to the [YouTube] Marvel Entertainment channel and, for each movie, scraped scene images as jpg files into a folder named after that movie title, such as:

  ./images/ - antman1 - antman2 - avengers1 - avengers2
To perform the similar-image search, below are some of the algorithms used to determine which K images in the database are most similar to the queried image.
- Mean Squared Error (MSE) - calculates the average squared difference (viz. error) between images. The closer the MSE is to 0, the more similar the images.
- Structural Similarity (SSIM) Index
- Locality Sensitive Hashing (LSH) - creates image feature hash table that computes the similarity probability and returns a relevance rank of images indexed from the image library
- Various distance measures on CNN-extracted features, such as: Euclidean, cosine, city block (Manhattan), and L2-normalized distance
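The pixel-level and feature-level measures above can be sketched in a few lines of numpy. These are standard textbook definitions, not the project's exact implementations (SSIM is omitted here since it requires windowed statistics; see the referenced Wang et al. paper):

```python
import numpy as np

def mse(img_a, img_b):
    """Average squared pixel difference; 0 means identical images."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    return np.mean((a - b) ** 2)

def euclidean(a, b):
    """Straight-line (L2) distance between two feature vectors."""
    return np.linalg.norm(a - b)

def manhattan(a, b):
    """City-block (L1) distance between two feature vectors."""
    return np.abs(a - b).sum()

def cosine_distance(a, b):
    """1 minus cosine similarity; 0 for vectors pointing the same way."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

For retrieval, each of these is computed between the queried image's feature vector and every vector in the library, and the K smallest distances win.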
We experimented with various pre-trained deep learning architectures (e.g., VGG and ResNet models pre-trained on ImageNet) to generate features from the images, along with different similarity metrics.
Below is an example output of the image similarity retrieval system:
Having scraped various trailer images from the MCU movies, we randomly selected an MCU movie image on Google, in this case from Guardians of the Galaxy (2014), and retrieved similar images of that scene from the image library. The queried image itself is not in the image library; to detect such cases, we can use the [Duplicate Hash] notebook, which applies the Locality Sensitive Hashing (LSH) function discussed above to identify duplicate images.
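The duplicate-detection idea can be illustrated with a minimal random-projection LSH sketch in numpy. This is a generic version of the technique, not the notebook's code; the function names and the choice of 8 hyperplanes are assumptions for illustration:

```python
import numpy as np
from collections import defaultdict

def lsh_signature(vec, hyperplanes):
    """Binary signature: each bit records which side of a random
    hyperplane the feature vector falls on. Similar vectors tend
    to share many bits, so they hash into the same bucket."""
    return tuple((hyperplanes @ vec > 0).astype(int))

def bucket_images(feature_vecs, n_planes=8, seed=0):
    """Group image indices by LSH signature; images sharing a
    bucket are candidate duplicates worth comparing exactly."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, feature_vecs.shape[1]))
    buckets = defaultdict(list)
    for i, vec in enumerate(feature_vecs):
        buckets[lsh_signature(vec, planes)].append(i)
    return buckets
```

Because only images within the same bucket are compared, this scales far better than computing pairwise distances across the whole library.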
Model | Input Size | Loss | Accuracy | Top-5 Accuracy | Date |
---|---|---|---|---|---|
ResNet34 | 224x224 | 1.011 | 0.723 | 0.935 | 10/17/19 |
ResNet50 | 224x224 | 1.018 | 0.737 | 0.936 | 10/22/19 |
ResNet18 | 224x224 | x.xxx | x.xxx | x.xxx | |
SqueezeNet | | | | | |
Distance Measure | mAP@5 | mAP@10 | mAP@20 | Notes |
---|---|---|---|---|
Euclidean | | | | |
Cosine | | | | |
Manhattan | | | | |
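The mAP@k columns above can be computed with a short helper. This is the standard definition of mean average precision at k, sketched here for reference; the function names are illustrative:

```python
def average_precision_at_k(retrieved, relevant, k):
    """AP@k for one query: precision accumulated at each rank
    where a relevant image appears, normalized by min(|relevant|, k)."""
    hits, score = 0, 0.0
    for rank, image_id in enumerate(retrieved[:k], start=1):
        if image_id in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

def mean_average_precision_at_k(all_retrieved, all_relevant, k):
    """mAP@k: average the per-query AP@k over all queries."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_retrieved, all_relevant)]
    return sum(aps) / len(aps)
```

Running this over a held-out set of queries for each distance measure would fill in the table above.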
- For those new to Python libraries, follow this guide to install Keras + TensorFlow, as both are needed if training a new model.
If starting fresh:

- Download Anaconda and Python 3 - refer here for setup assistance
- `conda install -c conda-forge tensorflow`
- `pip install keras`
- Clone the repository
- For the image processing, I utilized Marvel's trailers posted on YouTube. Refer to the image_processing file in the image folder for downloading and capturing images. To do this you'll need to download `pytube` (see the pytube documentation) by running the following command in the terminal:

  `$ pip install pytube`
- TODO: Create and test the environment.yml file (for tf and fast.ai)
- TODO: Model and Similarity Metric Evaluations
- Min-hash signature matrix + threshold; code to reference: santhoshhari github
- details on Detecting near-duplicates
- Incorporate scores, threshold and filenames in final output for each k retrieved image
- Formalize experimental notebooks and scripts
- Feature: Web crawl using Azure Bing Image Search to find similar images.
- [1] Math Works - SSIM
- [2] Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600–612.
- [3] Locality Sensitive Hashing - Application of Locality Sensitive Hashing to Audio Fingerprinting