Learned Metric Index (LMI) is an index for approximate nearest neighbor search on complex data using machine learning and probability-based navigation.
See examples of how to index and search in a dataset in: 01_Introduction.ipynb notebook.
See also .github/workflows/ci.yml
conda create -n env python=3.8
conda activate env
conda install matplotlib pandas scikit-learn jupyterlab
pip install h5py flake8 setuptools tqdm faiss-cpu
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install --editable .
jupyter-lab
# and open 01_Introduction.ipynb
# or
python3 search/search.py
python3 eval/eval.py
python3 eval/plot.py res.csv
LMI comprised of 1 ML model
- Recall: 91.421%
- Search runtime (for 10k queries): ~220s
- Build time: 20828s
- Dataset: LAION1B, 10M subset
- Hardware used:
- CPU Intel Xeon Gold 6130
- 42gb RAM
- 1 CPU core
- Hyperparameters:
- 120 leaf nodes
- 200 epochs
- 1 hidden layer with 512 neurons
- 0.01 learning rate
- 4 leaf nodes stop condition
10M:
- 42gb RAM
- 1 CPU core
- ~6h of runtime (waries depending on the hardware)
"LMI Proposition" (2021):
M. Antol, J. Ol'ha, T. Slanináková, V. Dohnal: Learned Metric Index—Proposition of learned indexing for unstructured data. Information Systems, 2021 - Elsevier (2021)
"Data-driven LMI" (2021):
T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal: Learned Metric Index—Proposition of learned indexing for unstructured data. SISAP 2021 - Similarity Search and Applications pp 81-94 (2021)
"LMI in Proteins" (2022):
J. Ol'ha, T. Slanináková, M. Gendiar, M. Antol, V. Dohnal: Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques, and Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques SISAP 2022 - Similarity Search and Applications pp 274-282 (2022)
"Reproducible LMI" (2023):
T. Slanináková, M. Antol, J. Ol'ha, V. Kaňa, V. Dohnal, S. Ladra, M. A. Martinez-Prieto: Reproducible experiments with Learned Metric Index Framework. Information Systems, Volume 118, September 2023, 102255 (2023)
- Terézia Slanináková, Masaryk University
- David Procházka, Masaryk University
- Jaroslav Oľha, Masaryk University
- Matej Antol, Masaryk University
- Vlastislav Dohnal, Masaryk University