yash-srivastava19/Maxwell

Maxwell

A perceptual hashing program for media (image, audio, video).

Improvements on ImageHash

Maxwell is an upgraded version of ImageHash, one of my own repositories. Maxwell uses pHash to create fingerprints of the media that is presented to it.

What's new in Maxwell?

Maxwell is a twisted take on One Shot Frequency Dominant Neighborhood Structure (https://www.sciencedirect.com/science/article/pii/S0262885621001505). The scheme from the paper is slightly modified to generate a fingerprint for an image. Two fingerprints are constructed: a long one and a short one. I have used a combination of the Discrete Cosine Transform (DCT), Scale Invariant Feature Transform (SIFT), Multi-task Cascaded Convolutional Networks (MTCNN), and the Mersenne Twister PRNG, among others, throughout the image processing and manipulation pipeline.

Maxwell Scheme

First, the face in the image is detected and cropped. The cropped image is then resized to 120x120 pixels using bilinear interpolation, converted to grayscale, and smoothed with a Gaussian filter.
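The preprocessing step can be sketched in plain Python as follows. This is only an illustration, not Maxwell's actual code: the function names, the kernel radius, and `sigma` are assumptions, the input is assumed to be an already cropped face given as rows of `(r, g, b)` tuples, and face detection (MTCNN) is left out entirely.

```python
import math

def to_grayscale(rgb):
    """Rows of (r, g, b) tuples -> rows of luma floats (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D list of floats with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the centre of the output pixel back into input coordinates.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def gaussian_smooth(img, radius=2, sigma=1.0):
    """Separable Gaussian blur; edge pixels are repeated at the borders."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(weights)
    kernel = [w / total for w in weights]
    h, w = len(img), len(img[0])

    def clamp(v, hi):
        return min(max(v, 0), hi)

    # Horizontal pass, then vertical pass.
    tmp = [[sum(kernel[k + radius] * row[clamp(x + k, w - 1)] for k in offsets)
            for x in range(w)] for row in img]
    return [[sum(kernel[k + radius] * tmp[clamp(y + k, h - 1)][x] for k in offsets)
             for x in range(w)] for y in range(h)]

def preprocess(rgb, size=120):
    """Cropped face -> smoothed size x size grayscale image.

    Grayscale conversion is done before resizing to keep the code short;
    both steps are linear, so the order does not change the result.
    """
    gray = to_grayscale(rgb)
    small = resize_bilinear(gray, size, size)
    return gaussian_smooth(small)
```

In practice an image library would handle all three steps; the pure-Python version is here only to make each operation visible.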

This image is then split into 4 segments of 60x60 pixels, and the Discrete Cosine Transform is applied to each segment individually. Up to this point, the scheme follows the ONF-DNS (One Shot Frequency Dominant Neighborhood Structure) paper.
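A minimal sketch of the split-and-transform step, assuming the image is a 120x120 list of rows of floats. The function names are hypothetical, and the orthonormal DCT-II used here is one common choice; the normalisation Maxwell actually uses is not stated.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)
    # basis[k][i] is the k-th DCT-II basis vector sampled at position i.
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    # Transform every row, then every column of the intermediate result.
    rows = [[sum(b[i] * r[i] for i in range(n)) for b in basis] for r in block]
    return [[sum(basis[k][y] * rows[y][x] for y in range(n)) for x in range(n)]
            for k in range(n)]

def split_quadrants(img):
    """Split an image into four equal blocks: top-left, top-right, bottom-left, bottom-right."""
    half_h, half_w = len(img) // 2, len(img[0]) // 2
    return [[row[x:x + half_w] for row in img[y:y + half_h]]
            for y in (0, half_h) for x in (0, half_w)]

def block_dcts(img):
    """Return the 2-D DCT of each of the four quadrants."""
    return [dct_2d(block) for block in split_quadrants(img)]
```

For a 120x120 input this yields four 60x60 coefficient blocks, with the lowest frequencies in the top-left corner of each block.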

Instead of applying DNS to each frequency block, I use the Scale Invariant Feature Transform (SIFT) to detect keypoints in each frequency block. The keypoints are then converted to their coordinates, and a fingerprint is made for each block by reducing its keypoint array with a left bit-shift operation on the coordinates.

These fingerprints are then shuffled and appended to one another. The resulting array is the combined fingerprint of the image, and can be used as a private key or fingerprint for a particular person. This is how the long fingerprint of the image is generated.

The shorter fingerprint is generated by using the long fingerprint as a seed for MT19937 (the Mersenne Twister pseudo-random number generator). A finite number of values (say, 10) is then drawn from the generator and reduced to a single number by XOR-ing them together. The resulting number is the shorter fingerprint for the given image. It can also be used as a private key, but of shorter length.
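The short-fingerprint step could look roughly like this. It relies on CPython's `random.Random`, which is an MT19937 generator. The function name, the 32-bit draw width, and the way the long fingerprint is packed into a single integer seed are assumptions made for this sketch, not details taken from Maxwell.

```python
import random
from functools import reduce
from operator import xor

def short_fingerprint(long_fingerprint, count=10, bits=32):
    """Seed MT19937 with the long fingerprint and XOR-reduce `count` draws.

    `long_fingerprint` is assumed to be a sequence of non-negative integers
    below 2**64. The packing scheme below is an illustrative choice.
    """
    # Pack the values into one big integer; the leading 0x01 byte keeps
    # leading zero values significant.
    packed = b"\x01" + b"".join(v.to_bytes(8, "big") for v in long_fingerprint)
    seed = int.from_bytes(packed, "big")
    rng = random.Random(seed)  # Mersenne Twister (MT19937) in CPython
    draws = [rng.getrandbits(bits) for _ in range(count)]
    return reduce(xor, draws)
```

Because the generator is deterministic for a given seed, the same long fingerprint always produces the same short fingerprint.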

Why did ImageHash need improvement?

I loved the implementation of ImageHash; it went through several iterations of development before being released. The problem, however, was one of my assumptions. I wanted to recreate the avalanche effect of cryptographic hash functions for images. That seemed to make complete sense: even a small change should produce a different fingerprint. So what was missing? There is nothing wrong with the implementation of ImageHash, but I think it is time to move on to a new, better approach.

A quick review of the implementation made me realize its problems. I also came across perceptual hashing, which I was, at least in a naive way, trying to implement. Maxwell does not reinvent the wheel, and makes use of existing libraries for its approach.

What is Perceptual Hashing?

Perceptual hashing plays an important role in many fields of computer vision, such as image authentication, image description, and image copy detection. Perceptual hashing algorithms make use of the DCT (Discrete Cosine Transform), DWT (Discrete Wavelet Transform), or DFT (Discrete Fourier Transform). Ideally, the hash should be robust against certain types of attacks: noise addition, scaling, rotation, watermarking, etc.
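To make the idea concrete, here is a minimal sketch of a generic DCT-based perceptual hash, compared using Hamming distance. It is not Maxwell's scheme. It assumes a small square grayscale image (for example 32x32) given as a list of rows, and the function names and the 8x8 low-frequency window are illustrative choices.

```python
import math

def dct_2d(block):
    """Unnormalised 2-D DCT-II of a square block (list of rows)."""
    n = len(block)
    basis = [[math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)] for k in range(n)]
    rows = [[sum(b[i] * r[i] for i in range(n)) for b in basis] for r in block]
    return [[sum(basis[k][y] * rows[y][x] for y in range(n)) for x in range(n)]
            for k in range(n)]

def phash(gray, hash_size=8):
    """Hash = one bit per low-frequency coefficient: 1 if above the median."""
    coeffs = dct_2d(gray)
    low = [coeffs[u][v] for u in range(hash_size) for v in range(hash_size)]
    median = sorted(low)[len(low) // 2]
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Unlike a cryptographic hash, two visually similar images are expected to give hashes with a small Hamming distance. Uniformly brightening an image, for example, scales every coefficient and the median by the same factor, so the bits do not change.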
