
Feature / Idea: pip install for reuse/standardisation #411

Open · Jeadie opened this issue May 9, 2023 · 2 comments

Jeadie (Contributor) commented May 9, 2023

Overview

erikbern/ann-benchmarks is not only an invaluable open-source comparison of popular ANN methods (and, I guess, now ANN databases), but it could also provide a solid framework for performance testing/reporting.

Motivation

I am looking to build performance testing/benchmarks into pgvector (see Issue #16), which currently reports results to ann-benchmarks. It seems superfluous to reinvent the wheel (e.g. dataset processing, test execution, metric computation, etc.) when pgvector will always want to report to ann-benchmarks anyway.

Scope

The most important functionality for ann-benchmarks to be useful as an import is in ann_benchmarks/*.py and the base algorithm implementation class.

pip install ann-benchmarks

# For algorithms, all or specific
pip install ann-benchmarks[algorithms]
pip install ann-benchmarks[annoy]

# For datasets (may be preferable to just download and cache)
pip install ann-benchmarks[DEEP1B]
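
For illustration, a rough sketch of how those extras could be declared, as a hypothetical setup.py; the package layout, dependency list, and extra names below are assumptions, not the repository's actual packaging:

# Hypothetical setup.py sketch for the extras proposed above.
# Dependency names and layout are assumptions for illustration only.
from setuptools import setup, find_packages

setup(
    name="ann-benchmarks",
    packages=find_packages(include=["ann_benchmarks", "ann_benchmarks.*"]),
    install_requires=[
        "numpy",
        "h5py",          # dataset loading
        "scikit-learn",  # ground-truth computation / metrics
    ],
    extras_require={
        # individual pip-installable algorithm wrappers
        "annoy": ["annoy"],
        # umbrella extra pulling in every pip-installable algorithm
        "algorithms": ["annoy", "faiss-cpu", "hnswlib"],
    },
)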

This may work even better with the code restructuring mentioned in #383.

erikbern (Owner) commented May 9, 2023

Most/all of these algorithms require custom binary stuff though, so that's why we use Docker. Using pip for particular algorithms would just get us 30% of the way there!

That being said, it probably makes sense to turn ann-benchmarks into a bit more of a library. There's a lot of refactoring worth doing in general. I'm hoping to find a bit more time, but random stuff keeps coming in between (starting a startup, etc.). But I definitely want to clean up a bunch of stuff, and I'll see if we can try to turn it into a library slowly.

Jeadie (Contributor, Author) commented May 9, 2023

Good point; I couldn't think of a use case for pip install ann-benchmarks[algorithms] off the top of my head. Maybe even pip install ann-benchmarks[annoy] is excessive. I think the library user could be responsible for both 1) starting the local binary/docker container, and then 2) starting their ann-benchmarks run pointing (either via docker id or URI) to their algorithm.
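
Roughly, that flow might look like the following minimal sketch. The PGVectorANN adapter and the run_benchmark entry point are hypothetical names, not the current ann-benchmarks API, though the fit/query shape mirrors the base algorithm class:

# Hypothetical sketch: ann-benchmarks used as a library against an already-running
# service that the user started themselves (docker container or local binary).
# PGVectorANN and run_benchmark are illustrative assumptions, not existing code.
import numpy as np

class PGVectorANN:
    """Adapter around a running pgvector instance, addressed by URI."""

    def __init__(self, uri: str, dimensions: int):
        self.uri = uri              # e.g. postgresql://localhost:5432/bench
        self.dimensions = dimensions

    def fit(self, X: np.ndarray) -> None:
        # Bulk-load the train vectors and build the index inside Postgres.
        ...

    def query(self, q: np.ndarray, n: int) -> list[int]:
        # Return the indices of the n approximate nearest neighbours of q.
        ...

# 1) the user starts the docker container / local binary themselves
# 2) the user hands the adapter to the benchmark runner
algo = PGVectorANN("postgresql://localhost:5432/bench", dimensions=128)
# run_benchmark(algo, dataset="glove-100-angular", k=10)  # hypothetical entry point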

I'm happy to help out with some of the refactoring. If I understand correctly, #383 is the main issue discussing future refactors?
