Overview of Image Similarities

Most humans can look at two photos and quickly determine whether the images are similar or dissimilar in nature. Computers can be programmed to perform a similar task, but the results can vary because of multiple factors (e.g., lighting conditions, perspectives) that humans instinctively compensate for. Humans have little difficulty seeing the differences between photos of two individuals with similar characteristics, but a computer will have some issues.

There are numerous use cases for image similarity technologies. These use cases range from duplicate image detection to domain-specific image clustering. Identifying duplicate images in Apple Photos is a common use case for many of us dealing with a large digital image library. Some of us have likely used Google's Reverse Image Search to look for a specific photo that we want to know more about. Google will scour its massive database for images similar to the one used in your query.

Primary objective of this repository

This repository examines several of the methods used to ascertain whether two or more images are similar or dissimilar. The set of images used in these image similarity tests are publicly available photographs of well-known female actresses. The dataset has 12 images of these actresses wearing earrings. There are multiple photos of Jennifer Aniston within this dataset.

Another objective of this repository is to determine the capabilities and limitations of the Python libraries used to perform these image similarity tests.

Image Similarity Experiments

Python Imaging Library:

This experiment used the Python module Pillow, which is a fork of PIL, the Python Imaging Library. The Pillow function used in this experiment was PIL.ImageChops. The ImageChops module contains a number of arithmetical image operations, called channel operations ("chops"). These can be used for various purposes, including special effects, image compositions, algorithmic painting, and more. The function used was difference, which returns the absolute value of the pixel-by-pixel difference between two images. Here is how the function is called:

PIL.ImageChops.difference(base_image, comparison_image)

The base_image in this experiment was one of Jennifer Aniston. The comparison_image dataset used in this experiment was the one that contained 12 images of actresses wearing earrings. There are 3 images of Jennifer Aniston in this dataset, but only one of these images is an absolute match.
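
A minimal sketch of how such a pixel-level check can be wired up, using Pillow's getbbox() on the difference result (which returns None when the two images are pixel-identical); the filenames are taken from the dataset above:

from PIL import Image, ImageChops

base = Image.open('jennifer_aniston.jpeg')
comparison = Image.open('jennifer_aniston_earrings.jpeg')

# difference() requires both images to share the same mode and size
diff = ImageChops.difference(base, comparison)

# getbbox() returns None when every pixel difference is zero
print('identical' if diff.getbbox() is None else 'dissimilar')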

| Base Image Name | Comparison Image Name | Pixel Similarity |
|---|---|---|
| jennifer_aniston.jpeg | jennifer_aniston_earrings.jpeg | identical |
| jennifer_aniston.jpeg | natalie_portman_earrings.jpeg | dissimilar |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_03.jpeg | dissimilar |
| jennifer_aniston.jpeg | poppy_delevingne_earrings.jpeg | dissimilar |
| jennifer_aniston.jpeg | taylor_swift_earrings.jpeg | dissimilar |
| jennifer_aniston.jpeg | hilary_swank_earrings.jpeg | dissimilar |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_02.jpeg | dissimilar |
| jennifer_aniston.jpeg | maggie_gyllenhaal_earrings.jpg | dissimilar |
| jennifer_aniston.jpeg | nicole_kidman_earrings.jpeg | dissimilar |
| jennifer_aniston.jpeg | julia_roberts_earrings.jpeg | dissimilar |
| jennifer_aniston.jpeg | jennifer_garner_earrings.jpeg | dissimilar |
| jennifer_aniston.jpeg | elizabeth hurley_earrings.jpeg | dissimilar |

This pixel-by-pixel comparison is useful for finding exact duplicates, but it will not match images that have been even slightly altered, because any alteration results in different pixel values.

ImageHash Library:

This experiment used the Python module ImageHash, which was developed by Johannes Bucher. This module has four hashing methods:

  1. aHash: average hash; for each pixel, output 1 if the pixel is greater than or equal to the average and 0 otherwise.

  2. pHash: perceptive hash; does the same as aHash, but first applies a Discrete Cosine Transform (DCT).

  3. dHash: difference hash; computes the difference between adjacent pixels and derives each bit from the direction of that gradient.

  4. wavelet: wavelet hashing; works in the frequency domain like pHash, but uses a Discrete Wavelet Transform (DWT) instead of a DCT.

aHash algorithm

The average hash algorithm is designed to scale the input image down to 8×8 pixels and convert this smaller image to grayscale. This reduces the 64 pixels (64 red, 64 green, and 64 blue values) to 64 total colors. Next the algorithm averages these 64 colors to compute a mean value. Each pixel of the rescaled image is evaluated against this mean, and the corresponding bit is set depending on whether the pixel falls above or below this value. These measurements are used to construct a hash, which will not change even if the image is scaled or the aspect ratio changes. The hash value will not dramatically change even when someone increases or decreases the brightness or contrast of the image or alters its colors.
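
A minimal sketch of those steps, assuming Pillow and NumPy; imagehash.average_hash performs the equivalent work internally:

import numpy as np
from PIL import Image

# scale down to 8x8 and convert to grayscale, as described above
image = Image.open('jennifer_aniston.jpeg').convert('L').resize((8, 8))
pixels = np.asarray(image)

# one bit per pixel: 1 if the pixel is at or above the mean, 0 otherwise
bits = pixels >= pixels.mean()
hash_value = ''.join('1' if bit else '0' for bit in bits.flatten())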

An image hash is used to determine the Hamming distance between two hashes. The Hamming distance between two strings (hashes) of equal length is the number of positions at which these strings differ. In more technical terms, it is a measure of the minimum number of substitutions required to turn one string into the other.

A Hamming distance of 0 means that two images are identical, whereas a distance of 5 or less indicates that two images are probably similar. If the Hamming distance is greater than 10, then the images are most likely different.
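
As a quick illustration of the idea on two short bit strings (the values here are made up for illustration only):

hash_a = '10110100'
hash_b = '10011100'

# count the positions at which the two strings differ
distance = sum(a != b for a, b in zip(hash_a, hash_b))
print(distance)  # 2, within the "probably similar" threshold of 5 or less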

The basic usage of the average hash algorithm within ImageHash is:

import imagehash
from PIL import Image

hash0 = imagehash.average_hash(Image.open(base_image))
hash1 = imagehash.average_hash(Image.open(comparison_image))
computational_score = (hash0 - hash1)  # Hamming distance between the two hashes

The average hash algorithm correctly matched the Jennifer Aniston base image to the same Jennifer Aniston comparison image within the dataset. The algorithm did not find any similarities between the Jennifer Aniston base image and the other Jennifer Aniston comparison images within the dataset.

aHash results

| Base Image Name | Comparison Image Name | Similarity Score |
|---|---|---|
| jennifer_aniston.jpeg | jennifer_aniston_earrings.jpeg | 0 |
| jennifer_aniston.jpeg | natalie_portman_earrings.jpeg | 24 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_03.jpeg | 25 |
| jennifer_aniston.jpeg | poppy_delevingne_earrings.jpeg | 27 |
| jennifer_aniston.jpeg | taylor_swift_earrings.jpeg | 27 |
| jennifer_aniston.jpeg | hilary_swank_earrings.jpeg | 28 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_02.jpeg | 31 |
| jennifer_aniston.jpeg | maggie_gyllenhaal_earrings.jpg | 31 |
| jennifer_aniston.jpeg | nicole_kidman_earrings.jpeg | 34 |
| jennifer_aniston.jpeg | julia_roberts_earrings.jpeg | 34 |
| jennifer_aniston.jpeg | jennifer_garner_earrings.jpeg | 35 |
| jennifer_aniston.jpeg | elizabeth hurley_earrings.jpeg | 41 |

The average hash algorithm was able to correctly match 3 of the 6 variations of the Jennifer Aniston comparison image within the modified dataset to the base image of Jennifer Aniston. All the Hamming distance values for these modified images fell in a range between 2 and 5, which is within the threshold range for potentially similar images. The average hash algorithm was not able to identify a mirror image of the base image within the modified dataset.

pHash algorithm

The core difference between the average hash algorithm and the perceptive hash algorithm is how the latter handles gamma correction or color histogram modifications applied to an image. The average hash algorithm will generate false misses when slight color variations have been applied to a comparison image. The perceptive hash algorithm handles these variations by using a discrete cosine transform (DCT), which expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.

The perceptive hash algorithm is designed to scale the input image down to 32×32 pixels and convert this smaller image to grayscale. Next the algorithm uses the DCT to separate the image into a collection of frequencies and scalars. After this is done the algorithm extracts the top-left 8×8 block, which represents the lowest frequencies in the image. The 64 bits of this 8×8 block are each set to 0 or 1 depending on whether the corresponding value is above or below the average value. The resulting hash value will not dramatically change even if a comparison image has had gamma or color histogram adjustments.
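
A simplified sketch of that pipeline, assuming NumPy and SciPy's fftpack for the DCT; the ImageHash implementation differs in small details (for instance, it thresholds against the median rather than the mean):

import numpy as np
import scipy.fftpack
from PIL import Image

# scale down to 32x32 and convert to grayscale
image = Image.open('jennifer_aniston.jpeg').convert('L').resize((32, 32))
pixels = np.asarray(image, dtype=np.float64)

# two-dimensional DCT: transform the rows, then the columns
dct = scipy.fftpack.dct(scipy.fftpack.dct(pixels, axis=0), axis=1)

# keep the top-left 8x8 block of lowest frequencies and threshold it
low_freq = dct[:8, :8]
bits = low_freq > low_freq.mean()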

A Hamming distance value of 0 means that two images are identical, whereas a distance of 10 or less indicates that two images are potentially similar, and a value greater than 10 suggests that the images are most likely different.

The basic usage of the perceptive hash algorithm within ImageHash is:

hash0 = imagehash.phash(Image.open(base_image))
hash1 = imagehash.phash(Image.open(comparison_image))
computational_score = (hash0 - hash1)

The perceptive hash algorithm correctly matched the Jennifer Aniston base image to the same Jennifer Aniston comparison image within the dataset. The algorithm did not find any similarities between the Jennifer Aniston base image and the other 2 Jennifer Aniston comparison images within the dataset.

pHash results

| Base Image Name | Comparison Image Name | Similarity Score |
|---|---|---|
| jennifer_aniston.jpeg | jennifer_aniston_earrings.jpeg | 0 |
| jennifer_aniston.jpeg | elizabeth hurley_earrings.jpeg | 24 |
| jennifer_aniston.jpeg | poppy_delevingne_earrings.jpeg | 26 |
| jennifer_aniston.jpeg | hilary_swank_earrings.jpeg | 26 |
| jennifer_aniston.jpeg | natalie_portman_earrings.jpeg | 26 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_02.jpeg | 28 |
| jennifer_aniston.jpeg | maggie_gyllenhaal_earrings.jpg | 28 |
| jennifer_aniston.jpeg | julia_roberts_earrings.jpeg | 28 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_03.jpeg | 30 |
| jennifer_aniston.jpeg | jennifer_garner_earrings.jpeg | 30 |
| jennifer_aniston.jpeg | taylor_swift_earrings.jpeg | 34 |
| jennifer_aniston.jpeg | nicole_kidman_earrings.jpeg | 38 |

The discrete cosine transform approach was able to correctly match 5 of the 6 variations of the Jennifer Aniston comparison image within the modified dataset to the base image of Jennifer Aniston. All the Hamming distance values for these modified images fell in a range between 2 and 8, which is within the threshold range for potentially similar images. The perceptive hash algorithm was not able to identify a mirror image of the base image within the modified dataset.

dHash algorithm

The difference hash algorithm is nearly identical to the average hash algorithm. dHash is designed to track gradients, while aHash focuses on average values and pHash evaluates frequency patterns. dHash scales the input image down to an odd aspect ratio of 9×8. This aspect ratio has 72 pixels, which is slightly more than aHash's 64 pixels. dHash converts this smaller image to grayscale, which reduces the 72 pixels (72 red, 72 green, and 72 blue values) to 72 total colors.

After this conversion the dHash algorithm measures the differences between adjacent pixels, identifying the relative gradient direction of each pixel on a row-by-row basis. The 8 rows of 8 differences then become 64 bits, each of which is set based on whether the left pixel is brighter than the right pixel. These measurements will match any similar image regardless of its aspect ratio prior to dHash shrinking the image.
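
A minimal sketch of that gradient step, following the description above and assuming Pillow and NumPy; imagehash.dhash performs the equivalent work internally:

import numpy as np
from PIL import Image

# scale down to 9 pixels wide by 8 pixels high and convert to grayscale
image = Image.open('jennifer_aniston.jpeg').convert('L').resize((9, 8))
pixels = np.asarray(image, dtype=int)

# compare each pixel with its right-hand neighbor: 8 rows x 8 columns = 64 bits,
# each set to 1 where the left pixel is brighter than the right
bits = pixels[:, :-1] > pixels[:, 1:]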

A Hamming distance value of 0 means that two images are identical, whereas a distance of 10 or less indicates that two images are potentially similar, and a value greater than 10 suggests that the images are most likely different.

The basic usage of the difference hash algorithm within ImageHash is:

hash0 = imagehash.dhash(Image.open(base_image))
hash1 = imagehash.dhash(Image.open(comparison_image))
computational_score = (hash0 - hash1)

The difference hash algorithm correctly matched the Jennifer Aniston base image to the same Jennifer Aniston comparison image within the dataset. The algorithm did not find any similarities between the Jennifer Aniston base image and the other 2 Jennifer Aniston comparison images within the dataset.

dHash results

| Base Image Name | Comparison Image Name | Similarity Score |
|---|---|---|
| jennifer_aniston.jpeg | jennifer_aniston_earrings.jpeg | 0 |
| jennifer_aniston.jpeg | hilary_swank_earrings.jpeg | 18 |
| jennifer_aniston.jpeg | poppy_delevingne_earrings.jpeg | 21 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_02.jpeg | 22 |
| jennifer_aniston.jpeg | taylor_swift_earrings.jpeg | 25 |
| jennifer_aniston.jpeg | julia_roberts_earrings.jpeg | 26 |
| jennifer_aniston.jpeg | natalie_portman_earrings.jpeg | 27 |
| jennifer_aniston.jpeg | elizabeth hurley_earrings.jpeg | 28 |
| jennifer_aniston.jpeg | maggie_gyllenhaal_earrings.jpg | 28 |
| jennifer_aniston.jpeg | nicole_kidman_earrings.jpeg | 32 |
| jennifer_aniston.jpeg | jennifer_garner_earrings.jpeg | 32 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_03.jpeg | 35 |

The difference hash algorithm was able to correctly match 4 of the 6 variations of the Jennifer Aniston comparison image within the modified dataset to the base image of Jennifer Aniston. The image with the red border had a similarity score of 0, which is considered an identical match. The other 3 images had similarity scores in a range between 1 and 5, which is within the threshold range for potentially similar images. The difference hash algorithm was not able to identify a mirror image of the base image within the modified dataset.

wavelet algorithm

The wavelet hash algorithm is similar to the perceptive hash algorithm, because it also operates within the frequency domain. The main difference is that the wavelet hash algorithm uses a discrete wavelet transform (DWT) instead of the discrete cosine transform used by the perceptive hash algorithm. In numerical analysis and functional analysis, a discrete wavelet transform is any wavelet transform for which the wavelets are discretely sampled. The wavelet hash algorithm uses the Haar wavelet by default, which is a sequence of rescaled "square-shaped" functions that together form a wavelet family.
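
A small illustration of the decomposition step, assuming the pywavelets module; imagehash.whash performs a similar multilevel decomposition internally (the 64×64 size and 3 levels here are arbitrary choices for the sketch):

import numpy as np
import pywt
from PIL import Image

# scale down, convert to grayscale, and decompose with the Haar wavelet
image = Image.open('jennifer_aniston.jpeg').convert('L').resize((64, 64))
pixels = np.asarray(image, dtype=np.float64)

coeffs = pywt.wavedec2(pixels, 'haar', level=3)
approximation = coeffs[0]  # low-frequency approximation used to derive the hash bits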

The basic usage of the wavelet hash algorithm within ImageHash is:

hash0 = imagehash.whash(Image.open(base_image))
hash1 = imagehash.whash(Image.open(comparison_image))
computational_score = (hash0 - hash1)

The wavelet hash algorithm has a mode parameter, which allows the wavelet family to be changed.

imagehash.whash(Image.open(image), hash_size=8, image_scale=None, mode='haar', remove_max_haar_ll=True)

The wavelet family can be changed after installing the Python module pywavelets:

import pywt

print(pywt.families())
# ['haar', 'db', 'sym', 'coif', 'bior', 'rbio', 'dmey', 'gaus', 'mexh', 'morl', 'cgau', 'shan', 'fbsp', 'cmor']

# db3 is a member of the Daubechies (db) family
w = pywt.Wavelet('db3')

hash0 = imagehash.whash(Image.open(base_image), mode=w)

The wavelet hash algorithm correctly matched the Jennifer Aniston base image to the same Jennifer Aniston comparison image within the dataset. With the computational score threshold set to less than 15, the other Jennifer Aniston images within the dataset were not considered similar images.

wavelet results

| Base Image Name | Comparison Image Name | Similarity Score |
|---|---|---|
| jennifer_aniston.jpeg | jennifer_aniston_earrings.jpeg | 0 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_03.jpeg | 24 |
| jennifer_aniston.jpeg | poppy_delevingne_earrings.jpeg | 26 |
| jennifer_aniston.jpeg | taylor_swift_earrings.jpeg | 26 |
| jennifer_aniston.jpeg | natalie_portman_earrings.jpeg | 26 |
| jennifer_aniston.jpeg | hilary_swank_earrings.jpeg | 28 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_02.jpeg | 30 |
| jennifer_aniston.jpeg | nicole_kidman_earrings.jpeg | 34 |
| jennifer_aniston.jpeg | julia_roberts_earrings.jpeg | 34 |
| jennifer_aniston.jpeg | jennifer_garner_earrings.jpeg | 36 |
| jennifer_aniston.jpeg | maggie_gyllenhaal_earrings.jpg | 38 |
| jennifer_aniston.jpeg | elizabeth hurley_earrings.jpeg | 40 |

The discrete wavelet transform approach was able to correctly match the 6 variations of the Jennifer Aniston comparison image within the modified dataset to the base image of Jennifer Aniston. All the computational values for these modified images fell in a range between 2 and 12, which is within the threshold range for potentially similar images. The wavelet hash algorithm also came close to identifying a mirror image of the base image, but the computational score was 16, which is slightly outside the threshold of 15 or less.

NumPy and Math Library:

Structural Similarity Index (SSIM)

Image similarity can also be measured using the Python modules NumPy and math. These modules can be used to determine the Structural Similarity Index (SSIM), which is a perceptual metric for measuring the similarity between two images. These measurements are most useful when determining the graphical similarity between two nearly identical images whose differences are subtle to the human eye. The latter comparison will be examined in another repository. The current SSIM examinations will only focus on the similarity or dissimilarity between two images, as in the previous tests using Pillow or ImageHash.
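
A minimal sketch of a global (non-windowed) SSIM computation is shown below; the function name, the grayscale/resize preprocessing, and the percentage scaling are illustrative assumptions rather than the repository's exact implementation:

import numpy as np
from PIL import Image

def structural_similarity(base_image, comparison_image, size=(256, 256)):
    # load both images as grayscale arrays of identical shape
    img1 = np.asarray(Image.open(base_image).convert('L').resize(size), dtype=np.float64)
    img2 = np.asarray(Image.open(comparison_image).convert('L').resize(size), dtype=np.float64)

    # stabilizing constants from the SSIM paper, with L = 255 for 8-bit images
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2

    mu1, mu2 = img1.mean(), img2.mean()
    var1, var2 = img1.var(), img2.var()
    covariance = ((img1 - mu1) * (img2 - mu2)).mean()

    ssim = ((2 * mu1 * mu2 + c1) * (2 * covariance + c2)) / \
           ((mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))
    return ssim * 100  # expressed as a percentage, matching the table below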

Structural Similarity Index results

| Base Image Name | Comparison Image Name | Similarity Score (%) |
|---|---|---|
| jennifer_aniston.jpeg | jennifer_aniston_earrings.jpeg | 100.00 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_02.jpeg | 30.14 |
| jennifer_aniston.jpeg | taylor_swift_earrings.jpeg | 25.84 |
| jennifer_aniston.jpeg | julia_roberts_earrings.jpeg | 22.23 |
| jennifer_aniston.jpeg | poppy_delevingne_earrings.jpeg | 21.22 |
| jennifer_aniston.jpeg | maggie_gyllenhaal_earrings.jpg | 20.58 |
| jennifer_aniston.jpeg | elizabeth hurley_earrings.jpeg | 19.69 |
| jennifer_aniston.jpeg | jennifer_garner_earrings.jpeg | 19.25 |
| jennifer_aniston.jpeg | hilary_swank_earrings.jpeg | 17.16 |
| jennifer_aniston.jpeg | natalie_portman_earrings.jpeg | 13.78 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_03.jpeg | 13.50 |
| jennifer_aniston.jpeg | nicole_kidman_earrings.jpeg | 13.27 |

Hamming Distance

The Python modules NumPy and math can also be used to determine the Hamming distance between two images. Hamming distance was discussed earlier in the ImageHash experiments.
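
A minimal sketch, assuming each image is reduced to a grayscale array, binarized against its own mean, and compared position by position, with similarity expressed as the percentage of matching positions; these preprocessing choices are assumptions for illustration, not the repository's exact code:

import numpy as np
from PIL import Image

def hamming_similarity(base_image, comparison_image, size=(64, 64)):
    img1 = np.asarray(Image.open(base_image).convert('L').resize(size))
    img2 = np.asarray(Image.open(comparison_image).convert('L').resize(size))

    # binarize each image against its own mean brightness
    bits1 = img1 > img1.mean()
    bits2 = img2 > img2.mean()

    # percentage of positions that agree; 100.0 means the bit patterns are identical
    return np.count_nonzero(bits1 == bits2) / bits1.size * 100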

Hamming distance results

| Base Image Name | Comparison Image Name | Similarity Score (%) |
|---|---|---|
| jennifer_aniston.jpeg | jennifer_aniston_earrings.jpeg | 100.00 |
| jennifer_aniston.jpeg | poppy_delevingne_earrings.jpeg | 62.54 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_02.jpeg | 61.81 |
| jennifer_aniston.jpeg | taylor_swift_earrings.jpeg | 59.37 |
| jennifer_aniston.jpeg | julia_roberts_earrings.jpeg | 59.22 |
| jennifer_aniston.jpeg | jennifer_garner_earrings.jpeg | 58.89 |
| jennifer_aniston.jpeg | elizabeth hurley_earrings.jpeg | 58.53 |
| jennifer_aniston.jpeg | maggie_gyllenhaal_earrings.jpg | 57.21 |
| jennifer_aniston.jpeg | jennifer_aniston_earrings_03.jpeg | 57.12 |
| jennifer_aniston.jpeg | hilary_swank_earrings.jpeg | 56.42 |
| jennifer_aniston.jpeg | natalie_portman_earrings.jpeg | 55.81 |
| jennifer_aniston.jpeg | nicole_kidman_earrings.jpeg | 55.75 |

Both methods were able to successfully identify the target image of Jennifer Aniston from the control set of images. As in the previous experiments, the SSIM and Hamming measurements generated computational scores based on similarity and distance. Establishing a computational score threshold using these two measurement methods was highly problematic, because it produced a considerable number of false positives.

Conclusions:

Each of the methods used in this repository was able to correctly identify the target image of Jennifer Aniston from the control set of images. Each method had various strengths and weaknesses in generating computational scores based on similarity and distance, especially for those images that were modified (e.g., increased brightness). Depending on the use case each method has its own particular merits, but ImageHash seemed to be the most robust and accurate of all the modules evaluated.

Notes:

The code within this repository is not production ready. It was designed strictly for experimental testing purposes.