Repository for code which detects duplicate/similar/reused images in the ECCO collection data

AlluSu/image-similarity-detection

 
 

Digital Humanities Hackathon 2023, the Early Modern Group

This repository contains code related to the Digital Humanities Hackathon course at the University of Helsinki. More information about the course and the projects can be found here.

Instructions for running the code

The script version similarities.py

  • Make sure you have Python with pip installed
  • Clone the repository
  • Run pip install -r requirements.txt
  • Go to folder /code
  • Usage:
    similarities.py [-h] --inputpath INPUTPATH [--outputpath OUTPUTPATH] [--method METHOD] [--cutoff CUTOFF] [--amount AMOUNT]

Analyze how similar images are to each other and write the results to a .csv file
options:
-h, --help show this help message and exit
--inputpath INPUTPATH Relative path in quotes ("") to the folder of the images, required
--outputpath OUTPUTPATH Relative path in quotes ("") to where the results will be stored, default is the same directory
--method METHOD "GPU" or "CPU" for computing, default is "CPU"
--cutoff CUTOFF How similar images must be to be stored, between 0 and 1, where a larger number indicates more similar images. Default is 0.9
--amount AMOUNT How many similar images will be stored, default is 5

For example:
python3 similarities.py --inputpath="../test-images/math-small" --outputpath="data/results" --method="cpu" --cutoff=0.93 --amount=5
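
As a rough illustration of the interface documented above, the options could be declared with Python's standard argparse module along these lines. This is a sketch reconstructed from the help text only, not the actual contents of similarities.py: the function name build_parser, the use of "." for "the same directory", and the upper-casing of --method (so that "cpu" and "CPU" are treated alike) are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options of similarities.py."""
    parser = argparse.ArgumentParser(
        prog="similarities.py",
        description="Analyze how similar images are and write the results to a .csv file",
    )
    # The only mandatory option: folder containing the images to compare
    parser.add_argument("--inputpath", required=True,
                        help="Relative path to the folder of the images")
    # Assumption: "the same directory" is represented here as "."
    parser.add_argument("--outputpath", default=".",
                        help="Relative path to where the results will be stored")
    # Assumption: the value is upper-cased so "cpu" and "CPU" are equivalent
    parser.add_argument("--method", type=str.upper, default="CPU",
                        help='"GPU" or "CPU" for computing')
    parser.add_argument("--cutoff", type=float, default=0.9,
                        help="Similarity threshold between 0 and 1")
    parser.add_argument("--amount", type=int, default=5,
                        help="How many similar images will be stored")
    return parser


# Parse the same arguments as the example command above
args = build_parser().parse_args([
    "--inputpath=../test-images/math-small",
    "--outputpath=data/results",
    "--method=cpu",
    "--cutoff=0.93",
    "--amount=5",
])
```

Passing the argument list explicitly, as done here, makes it possible to try the parser without running the script from a shell.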

The Jupyter notebook version

  • Make sure you have Python with pip installed
  • Clone the repository
  • Run pip install -r requirements.txt
  • Go to folder /code
  • Run jupyter-notebook and the Jupyter environment should open automatically in your local browser

Data

Test data containing scientific botanical illustrations from 18th-century books can be found here. We suggest using Git LFS to clone it. Extract the .zip file to a location of your choice and set the paths in the code accordingly for it to work.

Image Similarity Detection techniques

You can read more about the research and techniques related to detecting similar images in this report.
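
To give a feel for how a similarity cutoff and a limit on the number of matches (like the --cutoff and --amount options of the script) can be applied, here is a minimal sketch in plain Python. It assumes each image has already been reduced to a numeric feature vector and uses cosine similarity as the measure. Both are assumptions made for illustration: this README does not state which features or similarity measure the repository's code uses, and the function names and file names below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_similar(features, cutoff=0.9, amount=5):
    """For every image, keep at most `amount` other images with similarity >= cutoff."""
    results = {}
    for name, vec in features.items():
        # Score every other image against the current one
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in features.items()
            if other != name
        ]
        # Drop matches below the cutoff, then keep the best `amount`
        kept = [(other, score) for other, score in scored if score >= cutoff]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        results[name] = kept[:amount]
    return results


# Hypothetical feature vectors for three images
features = {
    "plate_a.png": [1.0, 0.0],
    "plate_b.png": [0.99, 0.1],   # nearly the same direction as plate_a
    "plate_c.png": [0.0, 1.0],    # orthogonal to plate_a
}
matches = top_similar(features, cutoff=0.9, amount=5)
```

With these toy vectors, plate_a.png and plate_b.png match each other, while plate_c.png has no match above the cutoff. Raising the cutoff keeps only closer matches, and lowering the amount shortens each image's list.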
