Skip to content

gsamaras/Dolphinn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 

Repository files navigation

DOLPHINN

DOLPHINN is a C++ header-only library for: Dimension reductiOn and LookuPs on a Hypercube for effIcient Near Neighbor.

How to use DOLPHINN?

Just include DOLPHINN's header file. src/main.cpp contains a representative example.

Note: If you are interested in Nearest Neighbor, use DolphinnPy.

DOLPHINN is generic yet fast!

On data sets with more than 1 million points in around 128 dimensions, DOLPHINN typically requires only some milliseconds per query.


DOLPHINN provides with a simple, yet efficient method for the problem of computing an (approximate) nearest neighbor in high dimensions. The algorithm is based on our paper: Practical linear-space Approximate Near Neighbors in high dimension[Avarikioti, Prof. Emiris, Psarros (original idea) and Samaras], where we show linear space and sublinear query for a specific setting of parameters. Part of the Data Science Master Thesis of George Samaras, National Kapodistrian University of Athens, 2016.

First, N points are randomly mapped to keys in {0,1}^K, for K<=logN, by making use of the Hypeplane LSH family. Then, for a given query, candidate nearest neighbors are the ones within a small hamming radius with respect to their keys. Our approach resembles the multi-probe LSH approach but it differs on how the list of candidates is computed.

About

High Dimensional Approximate Near(est) Neighbor

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published