Skip to content
This repository has been archived by the owner on Jun 20, 2019. It is now read-only.

kasperisager/hemingway

Repository files navigation

Hemingway

Implementations of the classic and covering locality-sensitive hashing schemes for Hamming space

Build Status

Hemingway is a library for performing nearest-neighbour searches in datasets consisting of high-dimensional bit vectors using locality-sensitive hashing. Two flavors of LSH are currently supported: Classic LSH, originally described by Piotr Indyk and Rajeev Motwani, and Covering LSH, a newer construction by Rasmus Pagh. You can read our paper on the library here.

As for the name? Say the words "locality-sensitive hashing" and "Hamming space" fast enough a bunch of times and you'll eventually arrive at "Hemingway". If not, you probably didn't pick appropriate parameters.

Contents

Installation

Hemingway can be built using any compiler that supports C++11, but ships with a CMake setup for ease of use. To get started, make sure that CMake is installed and then do:

cmake . && make

This will compile the implementation files to src/libhemingway.a. Grab this along with the header files and you'll be all set! You can check out the next section for a guide on how to actually use the library.

Usage

The design of Hemingway is fairly simple in that it exposes only two classes for all your LSH needs: lsh::vector and lsh::table. The lsh::vector class is used for representing bit vectors of arbitrary dimensionality and is constructed by supplying a std::vector<bool> containing the components of the vector:

lsh::vector v({1, 0, 0, 0, 1, 1, 0, 1});

The lsh::table class is used for representing a lookup table containing partitions of vector buckets. Two LSH schemes are currently supported in lookup tables:

Classic: In this scheme, vectors are hashed into buckets using random bit masks associated with each partition. When constructing this table, 3 parameters are specified: The dimensionality of input vectors, the number of bits to sample from vectors, and the number of partitions to use:

lsh::table t({.dimensions = 8, .samples = 3, .partitions = 4);

Covering: In this scheme, vectors are hashed into buckets using carefully constructed bit masks that ensure that the hashes of vectors within a given radius from each other will collide. Only two parameters are specified when constructing this table: The dimensionality of input vectors, and the radius that should be covered:

lsh::table t({.dimensions = 8, .radius = 2);

Once you've constructed your table, go ahead and add your vectors:

t.insert(v);

Once you've added all your vectors you can perform queries against the lookup table:

lsh::vector r = t.query(lsh::vector({1, 0, 1, 0, 1, 1, 0, 0}));

Authors

This library came about as a result of the Advanced Algorithms seminar held at the IT University of Copenhagen. We would like to give thanks to our supervisors for not only their help but also their immense patience during the seminar.

Kasper Kronborg Isager Radosław Niemczyk

License

Copyright © 2016 Kasper Kronborg Isager and Radosław Niemczyk. Licensed under the terms of the MIT license.

About

Implementations of the classic and covering locality-sensitive hashing schemes for Hamming space

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published