
LearnedMetricIndex/LearnedMetricIndex


Introduction

Learned Metric Index (LMI) is an index for approximate nearest neighbor search on complex data using machine learning and probability-based navigation.
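To make "probability-based navigation" more concrete, here is a toy sketch of the general idea under simplifying assumptions. It is not the repository's API. The data is first split into leaf nodes (buckets) with a few iterations of k-means. A model then assigns each query a probability for every leaf, and the search visits leaves in descending probability order until a stop condition is met (here, a fixed number of visited leaves). Finally, it ranks the collected candidates by exact distance. To keep the example dependency-light, the neural network used by LMI is swapped for a softmax over negative distances to the leaf centroids. All function names are hypothetical.

```python
import numpy as np


def kmeans(data: np.ndarray, n_leaves: int, n_iter: int = 10, seed: int = 0):
    """Plain Lloyd's k-means; returns (centroids, leaf label for each object)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_leaves, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every object to every centroid.
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for leaf in range(n_leaves):
            members = data[labels == leaf]
            if len(members) > 0:  # keep the old centroid if a leaf ends up empty
                centroids[leaf] = members.mean(axis=0)
    # Final assignment so the labels agree with the returned centroids.
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)


def leaf_probabilities(query: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Stand-in for the learned model: softmax over negative centroid distances."""
    scores = -((centroids - query) ** 2).sum(axis=1)
    scores -= scores.max()  # numerical stability before exponentiating
    probs = np.exp(scores)
    return probs / probs.sum()


def search(query, data, centroids, labels, k=10, stop_leaves=4):
    """Visit leaves in descending probability order, then rank candidates exactly."""
    order = np.argsort(-leaf_probabilities(query, centroids))
    visited = order[:stop_leaves]  # stop condition: number of visited leaves
    candidates = np.flatnonzero(np.isin(labels, visited))
    dists = ((data[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(dists)[:k]]


rng = np.random.default_rng(42)
data = rng.normal(size=(2000, 16)).astype(np.float32)
centroids, labels = kmeans(data, n_leaves=20)
query = data[0]
print(search(query, data, centroids, labels, k=5))
```

Here `n_leaves` and `stop_leaves` loosely correspond to the leaf-node count and stop condition listed under Performance. However, the real framework's model, training, and stop-condition semantics may differ.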

Getting started

See the 01_Introduction.ipynb notebook for examples of how to build an index over a dataset and search it.

Installation

See also .github/workflows/ci.yml for the installation steps used in CI.

Using conda

conda create -n env python=3.8
conda activate env
conda install matplotlib pandas scikit-learn jupyterlab
pip install h5py flake8 setuptools tqdm faiss-cpu
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install --editable .
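After installation, a small check like the following can confirm that the dependencies above are importable. It is a hypothetical helper, not part of the repository. Some import names differ from their package names: scikit-learn imports as `sklearn`, and faiss-cpu imports as `faiss`. The editable install of the project itself is not checked, because its import name is not stated here.

```python
import importlib.util
from typing import Dict, List

# pip/conda package name -> Python import name (they differ for some packages)
REQUIRED: Dict[str, str] = {
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "jupyterlab": "jupyterlab",
    "h5py": "h5py",
    "flake8": "flake8",
    "setuptools": "setuptools",
    "tqdm": "tqdm",
    "faiss-cpu": "faiss",
    "torch": "torch",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the package names whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

The helper uses `find_spec` rather than importing each module, so heavy packages such as torch are not loaded just to check that they are present.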

Running

jupyter-lab
# and open 01_Introduction.ipynb

# or
python3 search/search.py

Evaluation

python3 eval/eval.py
python3 eval/plot.py res.csv
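Recall, the metric reported under Performance, is commonly computed for approximate k-NN search as follows: take the fraction of the true k nearest neighbors that appear in the returned answer, then average over all queries. The sketch below shows that standard definition. Whether eval/eval.py uses exactly this formula, and which k it uses, is an assumption, so check the script before comparing numbers.

```python
from typing import Sequence


def recall_at_k(approx: Sequence[Sequence[int]],
                exact: Sequence[Sequence[int]], k: int) -> float:
    """Mean over queries of |top-k found intersected with true top-k| / k (rows are neighbor IDs)."""
    if not approx or len(approx) != len(exact):
        raise ValueError("approx and exact must cover the same, non-empty set of queries")
    total = 0.0
    for found, truth in zip(approx, exact):
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(approx)


# Two queries, k = 3: the first answer finds 2 of the 3 true neighbors, the second all 3.
approx = [[4, 7, 1], [2, 9, 5]]
exact = [[4, 1, 8], [5, 2, 9]]
print(recall_at_k(approx, exact, k=3))
```

Order within the top k does not matter under this definition. Only set membership is compared.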

Performance

LMI consisting of a single ML model

  • Recall: 91.421%
  • Search runtime (10k queries): ~220 s
  • Build time: 20,828 s (~5.8 h)
  • Dataset: LAION1B, 10M subset
  • Hardware used:
    • CPU: Intel Xeon Gold 6130
    • 42 GB RAM
    • 1 CPU core
  • Hyperparameters:
    • 120 leaf nodes
    • 200 training epochs
    • 1 hidden layer with 512 neurons
    • Learning rate: 0.01
    • Stop condition: 4 leaf nodes
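Simple arithmetic turns the headline numbers above into per-query and per-hour figures. The snippet below only restates the reported measurements: 10k queries in ~220 s and a 20,828 s build. It assumes the queries were processed sequentially on the single listed core, which the README suggests but does not state explicitly.

```python
QUERIES = 10_000
SEARCH_SECONDS = 220      # approximate, as reported
BUILD_SECONDS = 20_828

ms_per_query = SEARCH_SECONDS / QUERIES * 1000   # average latency in milliseconds
queries_per_second = QUERIES / SEARCH_SECONDS    # average throughput
build_hours = BUILD_SECONDS / 3600

print(f"{ms_per_query:.1f} ms/query, {queries_per_second:.1f} queries/s, "
      f"build {build_hours:.2f} h")
```

The build time of roughly 5.8 h is consistent with the ~6 h runtime listed under Hardware requirements.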

Hardware requirements

For the 10M subset:

  • 42 GB RAM
  • 1 CPU core
  • ~6 h of runtime (varies depending on the hardware)
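To judge whether a subset will fit in memory on other hardware, it can help to estimate the raw size of the vectors alone. The sketch below assumes 768-dimensional float32 embeddings. That is an assumption, since the README does not state the dimensionality or precision of the LAION1B vectors used. The estimate also ignores the model, index structures, and Python overhead, so actual usage is expected to be higher.

```python
def vector_memory_gb(n_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw size of an n_vectors x dim matrix in gigabytes (10**9 bytes)."""
    return n_vectors * dim * bytes_per_value / 1e9


# Assumed: 768-dimensional float32 vectors (not stated in the README).
print(f"{vector_memory_gb(10_000_000, 768):.2f} GB")
# Same assumption with float16 storage halves the estimate.
print(f"{vector_memory_gb(10_000_000, 768, bytes_per_value=2):.2f} GB")
```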


Publications

"LMI Proposition" (2021):

M. Antol, J. Ol'ha, T. Slanináková, V. Dohnal: Learned Metric Index—Proposition of learned indexing for unstructured data. Information Systems, Elsevier (2021)

"Data-driven LMI" (2021):

T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal: Data-Driven Learned Metric Index: An Unsupervised Approach. SISAP 2021 - Similarity Search and Applications, pp. 81-94 (2021)

"LMI in Proteins" (2022):

J. Ol'ha, T. Slanináková, M. Gendiar, M. Antol, V. Dohnal: Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques. SISAP 2022 - Similarity Search and Applications, pp. 274-282 (2022). Extended version: Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques.

"Reproducible LMI" (2023):

T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal, S. Ladra, M. A. Martinez-Prieto: Reproducible experiments with Learned Metric Index Framework. Information Systems, Volume 118, September 2023, 102255 (2023)


