MinHashLSH

Java implementation of MinHash and LSH for finding near-duplicate documents, as measured by Jaccard similarity.
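
For reference, the quantity that MinHash approximates is the exact Jaccard similarity |A ∩ B| / |A ∪ B| of two sets. Below is a minimal sketch that turns each document into a set of overlapping character 3-shingles and computes that value directly. Character shingling is an assumed preprocessing step, and the class and method names are hypothetical, not this repository's API.

```java
import java.util.HashSet;
import java.util.Set;

class JaccardExample {

    // Hypothetical helper: splits a document into its set of overlapping character k-shingles.
    static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= text.length(); i++) {
            result.add(text.substring(i, i + k));
        }
        return result;
    }

    // Exact Jaccard similarity: |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // treat two empty sets as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> a = shingles("the quick brown fox", 3);
        Set<String> b = shingles("the quick brown dog", 3);
        // 14 shared shingles out of 20 distinct ones; prints "Jaccard = 0.7"
        System.out.println("Jaccard = " + jaccard(a, b));
    }
}
```

Computing this exactly for every pair of documents is what becomes expensive on large collections, which is the motivation for approximating it.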

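Implementation of MinHash for approximating the Jaccard similarity between text documents.
Also includes an implementation of locality-sensitive hashing (LSH), a fast way to find approximate nearest neighbors (here, candidate pairs of similar documents) without comparing every pair of documents.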
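
A minimal sketch of both techniques together, under stated assumptions: documents are character 3-shingles; the hash family is a hypothetical h_i(x) = (a_i * x + b_i) mod (2^31 - 1) applied to each shingle's `String.hashCode()`; and 100 hash functions are split into 20 bands of 5 rows for LSH. None of these names or parameters come from this repository; it illustrates MinHash signatures and LSH banding, not the project's actual API.

```java
import java.util.*;

class MinHashLshSketch {

    static final long PRIME = 2147483647L; // 2^31 - 1, a Mersenne prime

    // Assumed hash family: h_i(x) = (a[i] * x + b[i]) mod PRIME.
    final long[] a;
    final long[] b;

    MinHashLshSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // a in [1, PRIME - 1]
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // b in [0, PRIME - 1]
        }
    }

    // Hypothetical helper: set of overlapping character k-shingles.
    static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= text.length(); i++) result.add(text.substring(i, i + k));
        return result;
    }

    // MinHash signature: for each hash function, the minimum hash value over the shingle set.
    long[] signature(Set<String> shingles) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String s : shingles) {
            long x = s.hashCode() & 0x7fffffffL; // non-negative id in [0, 2^31 - 1]
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // a[i] * x < 2^62, so no overflow
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    // The fraction of agreeing signature positions estimates the Jaccard similarity.
    static double estimate(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) same++;
        return (double) same / s1.length;
    }

    // LSH banding: two documents become a candidate pair if their signatures
    // agree on every row of at least one band.
    static Set<List<Integer>> candidatePairs(List<long[]> sigs, int bands, int rows) {
        Set<List<Integer>> pairs = new HashSet<>();
        for (int band = 0; band < bands; band++) {
            Map<String, List<Integer>> buckets = new HashMap<>();
            for (int doc = 0; doc < sigs.size(); doc++) {
                long[] slice = Arrays.copyOfRange(sigs.get(doc), band * rows, (band + 1) * rows);
                buckets.computeIfAbsent(Arrays.toString(slice), k -> new ArrayList<>()).add(doc);
            }
            for (List<Integer> bucket : buckets.values()) {
                for (int i = 0; i < bucket.size(); i++)
                    for (int j = i + 1; j < bucket.size(); j++)
                        pairs.add(List.of(bucket.get(i), bucket.get(j)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "the quick brown fox jumps over the lazy dog",
                "the quick brown fox jumped over the lazy dog",
                "an entirely different sentence about hashing");
        MinHashLshSketch mh = new MinHashLshSketch(100, 42L);
        List<long[]> sigs = new ArrayList<>();
        for (String d : docs) sigs.add(mh.signature(shingles(d, 3)));
        System.out.println("estimated J(doc0, doc1) = " + estimate(sigs.get(0), sigs.get(1)));
        // 100 hash functions = 20 bands x 5 rows
        System.out.println("candidate pairs: " + candidatePairs(sigs, 20, 5));
    }
}
```

With b bands of r rows, two documents with Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b, so increasing r makes the filter stricter and increasing b makes it more permissive. Candidate pairs can then be checked with the signature estimate or the exact Jaccard similarity.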