DEG - Dynamic Exploration Graphs for efficient approximate nearest neighbor search

Header-only C++ DEG implementation with SIFT1M experiment.

Introduction

This repository contains the necessary code to build a Dynamic Exploration Graph (DEG) and use it for efficient approximate nearest neighbor search (ANNS) tasks. Compared to other algorithms in the litature, DEG changes over time by swapping edges to yield better search results in the future. In case of a static dataset like the SIFT1M the graph gets slowly extended and improved step by step until all the data is stored inside the graph. Afterwards it can be tested with the query and ground truth information of the dataset. The results of those tests show DEG is on-par with the other state-of-the-art algorithms. But its real strength comes to shine in dynamic environments. In use-cases where the data changes over time (like products of a shopping website) DEG can always provide the best search results without recalculating the entire graph.

Highlights

Lightweight, header-only, right now only minor dependencies (all header-only) and compliend to C++ 20.
Has full support for incremental graph construction and improvement.
Can work with custom user defined distance metricies (C++).
Has a comparison branch with the HNSW library

SIFT1M benchmark reproduction

The code for the benchmark is in the deglib_build_bench.cpp and hnswlib_bench.cpp file. The dataset must be downloaded before hand. Both benchmark files contain a DATA_PATH variable which can be changed by renaming cmake-variants.sample.yaml file to cmake-variants.yaml and add the path to the dataset in the last line.

The DEG builder benchmark has a lot of parameters. Generally we recommend changing only the k and eps parameter and keep it the same for all variants (extend, improve, improve_extended). To quickly reproduce the results in the line chart below the following settings are needed: edges_per_node = 24; extend_k = 20; extend_eps = 0.2; improve_k = 20; improve_eps = 0.02; improve_extended_k = 12; improve_extended_eps = 0.02; max_path_length = 10; swap_tries = 3; additional_swap_tries = 3;

For the HNSW graph it will be the following settings: efConstruction = 500; M = 24;

Roadmap

There is no specific order. All the functionality listed here will be added when the associated paper comes out.

Support for element deletions will be added in the future
Benchmark for dynamic datasets
Python bindings

References

Coming soon

Name		Name	Last commit message	Last commit date
Latest commit History 324 Commits
benchmark		benchmark
cmake_modules		cmake_modules
deglib		deglib
external		external
hnswlib		hnswlib
results		results
.clang-format		.clang-format
.editorconfig		.editorconfig
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
cmake-variants.sample.yaml		cmake-variants.sample.yaml

License

Neiko2002/deglib

Folders and files

Latest commit

History

Repository files navigation

DEG - Dynamic Exploration Graphs for efficient approximate nearest neighbor search

Introduction

Highlights

SIFT1M benchmark reproduction

Roadmap

References

About

Resources

License

Stars

Watchers

Forks

Languages