nikvoronin/we-aint-same

We Ain't Same

Find duplicate images using perceptual hashing algorithms.

  • Calculate perceptual hashes using the ImageHash library.
  • Store and restore precalculated hashes.
  • Search for image files recursively.
  • Detect duplicates.
  • Organize images into groups of duplicates.
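
To make the duplicate detection step concrete, here is a minimal Python sketch of how two 64-bit perceptual hashes could be compared. It assumes similarity is the percentage of matching bits (64 minus the Hamming distance). That assumption agrees with the sample log below, where 90,625% corresponds to 58 of 64 matching bits, but it is not taken from the project's source. The function names are hypothetical.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # hashes are treated as unsigned 64-bit values


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two 64-bit hashes."""
    return bin((a ^ b) & MASK_64).count("1")


def similarity(a: int, b: int, bits: int = 64) -> float:
    """Return the share of matching bits as a percentage (assumed metric)."""
    return 100.0 * (bits - hamming_distance(a, b)) / bits


if __name__ == "__main__":
    # Two hashes that differ in 6 of 64 bits.
    print(similarity(0b000000, 0b111111))
```

Under this metric, identical hashes score 100%, and each differing bit lowers the score by 1.5625 percentage points.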

Example

Output log:

+++ Computing hashes...
+ C:\Users\Pictures\Samples\Pictures\bing20221129.jpg
+ C:\Users\Pictures\Samples\Pictures\fireworks.jpg
+ C:\Users\Pictures\Samples\Pictures\mars.jpg
+ C:\Users\Pictures\Samples\Pictures\mount-copy.jpg
+ C:\Users\Pictures\Samples\Pictures\mount-rotated-2degree.jpg
+ C:\Users\Pictures\Samples\Pictures\mount-small.jpg
+ C:\Users\Pictures\Samples\Pictures\mountains.jpg

+++ Chasing duplicates...
....>>> mount-copy.jpg
        dup: 90,625% <> mount-rotated-2degree.jpg
>>> mount-copy.jpg
        dup: 100% == mount-small.jpg
>>> mount-copy.jpg
        dup: 100% == mountains.jpg
...
+++ Similarity: max= 100% / min= 40,625%

+++ Duplicate Groups (1):
Group #1
        mount-copy.jpg
        mount-rotated-2degree.jpg
        mount-small.jpg
        mountains.jpg

+++ TOTAL: 00:00:00.7362885
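
The log reports duplicate pairs first and then merges them into a group. One way such merging could work is sketched below with a union-find structure. This is an illustration, not the project's confirmed method. The 90% threshold, the bit-matching similarity metric, and the name `group_duplicates` are all assumptions.

```python
from itertools import combinations


def group_duplicates(hashes: dict, threshold: float = 90.0) -> list:
    """Group file names whose 64-bit hashes are at least `threshold` % similar.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    parent = {name: name for name in hashes}

    def find(x: str) -> str:
        # Walk up to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(hashes, 2):
        differing = bin(hashes[a] ^ hashes[b]).count("1")
        if 100.0 * (64 - differing) / 64 >= threshold:
            parent[find(b)] = find(a)  # merge the two groups

    groups = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member contain duplicates.
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

Because groups are merged transitively, two images can land in the same group without being directly similar to each other, as long as a chain of similar pairs connects them.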

Precalculated hashes are stored in a JSON file:

[
    {
        "Path": "C:\\Users\\Pictures\\Samples\\Pictures\\bing20221129.jpg",
        "Hash": 11695141823225099355
    },
    {
        "Path": "C:\\Users\\Pictures\\Samples\\Pictures\\fireworks.jpg",
        "Hash": 10721035060630703339
    },
    // ...
]
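
A short Python sketch of writing and reading a file in this shape follows. The `Path` and `Hash` field names come from the sample above. The function names are hypothetical, and the project's own serialization code is not shown here. The `// ...` line in the sample marks omitted entries. Standard JSON does not allow comments, so a real file would not contain it.

```python
import json
from pathlib import Path


def save_hashes(hashes: dict, file: Path) -> None:
    """Write {path: hash} pairs as a list of {"Path", "Hash"} records."""
    records = [{"Path": path, "Hash": value} for path, value in hashes.items()]
    file.write_text(json.dumps(records, indent=4), encoding="utf-8")


def load_hashes(file: Path) -> dict:
    """Read the records back into a {path: hash} dictionary."""
    records = json.loads(file.read_text(encoding="utf-8"))
    return {record["Path"]: int(record["Hash"]) for record in records}
```

The sample values exceed the signed 64-bit range, so they should be read as unsigned 64-bit integers. Python integers have arbitrary precision and handle this without extra work, but other JSON parsers may need care.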

Links