anastasia/minhash

MinHash

For an explanation of MinHash, see chapter 3 of Mining of Massive Datasets: http://infolab.stanford.edu/~ullman/mmds/book.pdf (also archived here: https://perma.cc/K9B4-QTX3). For a simpler take, see: https://moz.com/devblog/near-duplicate-detection/

This implementation borrows from Chris McCormick's MinHash tutorial: https://github.com/chrisjmccormick/MinHash
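For readers who want a feel for the technique before following the links, here is a minimal sketch of MinHash similarity estimation. It is not the code in this repository. The function names (`shingles`, `jaccard`, `make_hash_params`, `signature`, `estimate_similarity`), the 3-word shingle size, the 128 hash functions, and the use of CRC32 for shingle IDs are all assumptions made for illustration.

```python
import random
import zlib

# A prime larger than the largest possible CRC32 value (2**32 - 1).
PRIME = 4294967311


def shingles(text, k=3):
    """Return the set of k-word shingles in text (k=3 is an assumed default)."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_params(num_hashes=128, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def signature(shingle_set, params):
    """MinHash signature: for each hash function, the minimum hash over the set."""
    if not shingle_set:
        raise ValueError("cannot compute a signature for an empty set")
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"

params = make_hash_params()
set_a, set_b = shingles(doc_a), shingles(doc_b)
exact = jaccard(set_a, set_b)
estimate = estimate_similarity(signature(set_a, params), signature(set_b, params))
print("exact Jaccard:", exact)
print("MinHash estimate:", estimate)
```

The two example documents share 6 of their 8 distinct shingles, so the exact Jaccard similarity is 0.75. The MinHash estimate should land near that value, but it varies with the seed and the number of hash functions. The point of the signature is that it has a fixed length, so comparing two long documents costs the same as comparing two short ones.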

To install (for now):

pip install -e "git+https://github.com/anastasia/minhash.git@master#egg=minhash"

To run in CLI:

python minhash.py doc1 doc2

To run in python:

import minhash
minhash.calculate(string_a, string_b)
