
Feature / Idea: pip install for reuse/standardisation #411

Open · Jeadie opened this issue May 9, 2023 · 2 comments

Jeadie (Contributor) commented May 9, 2023

Overview

erikbern/ann-benchmarks is not only an invaluable open-source comparison of popular ANN methods (and, I guess, now ANN databases), but it could also provide a solid framework for performance testing/reporting.

Motivation

I am looking to build performance testing/benchmarks into pgvector (see Issue #16), which currently reports results to ann-benchmarks. It seems superfluous to reinvent the wheel (e.g. dataset processing, test execution, metric computation, etc.) when pgvector will always want to report to ann-benchmarks anyway.

Scope

The most important functionality for ann-benchmarks to be useful as an import is in ann_benchmarks/*.py and the base algorithm implementation class.

pip install ann-benchmarks

# For algorithms, all or specific
pip install ann-benchmarks[algorithms]
pip install ann-benchmarks[annoy]

# For datasets (may be preferable to just download and cache)
pip install ann-benchmarks[DEEP1B]
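
For illustration, a rough sketch of how those extras could be declared, as a hypothetical setup.py; the package layout, dependency list, and extra names below are assumptions, not the repository's actual packaging:

# Hypothetical setup.py sketch for the extras proposed above.
# Dependency names and layout are assumptions for illustration only.
from setuptools import setup, find_packages

setup(
    name="ann-benchmarks",
    packages=find_packages(include=["ann_benchmarks", "ann_benchmarks.*"]),
    install_requires=[
        "numpy",
        "h5py",          # dataset loading
        "scikit-learn",  # ground-truth computation / metrics
    ],
    extras_require={
        # individual pip-installable algorithm wrappers
        "annoy": ["annoy"],
        # umbrella extra pulling in every pip-installable algorithm
        "algorithms": ["annoy", "faiss-cpu", "hnswlib"],
    },
)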

This may work even better with the code restructuring mentioned in #383.

erikbern (Owner) commented May 9, 2023

Most/all of these algorithms require custom binary stuff though, so that's why we use Docker. Using pip for particular algorithms would just get us 30% of the way there!

That being said, it probably makes sense to turn ann-benchmarks into a bit more of a library. There's a lot of refactoring worth doing in general. I'm hoping to find a bit more time, but random stuff keeps coming in between (starting a startup, etc.). But I definitely want to clean up a bunch of stuff, and I'll see if we can try to turn it into a library slowly.

Jeadie (Contributor, Author) commented May 9, 2023

Good point; I couldn't think of a use case for pip install ann-benchmarks[algorithms] off the top of my head. Maybe even pip install ann-benchmarks[annoy] is excessive. I think the library user could be responsible for both 1) starting the local binary/docker container, and then 2) starting their ann-benchmarks run pointing (either via docker id or URI) to their algorithm.
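
Roughly, that flow might look like the following minimal sketch. The PGVectorANN adapter and the run_benchmark entry point are hypothetical names, not the current ann-benchmarks API, though the fit/query shape mirrors the base algorithm class:

# Hypothetical sketch: ann-benchmarks used as a library against an already-running
# service that the user started themselves (docker container or local binary).
# PGVectorANN and run_benchmark are illustrative assumptions, not existing code.
import numpy as np

class PGVectorANN:
    """Adapter around a running pgvector instance, addressed by URI."""

    def __init__(self, uri: str, dimensions: int):
        self.uri = uri              # e.g. postgresql://localhost:5432/bench
        self.dimensions = dimensions

    def fit(self, X: np.ndarray) -> None:
        # Bulk-load the train vectors and build the index inside Postgres.
        ...

    def query(self, q: np.ndarray, n: int) -> list[int]:
        # Return the indices of the n approximate nearest neighbours of q.
        ...

# 1) the user starts the docker container / local binary themselves
# 2) the user hands the adapter to the benchmark runner
algo = PGVectorANN("postgresql://localhost:5432/bench", dimensions=128)
# run_benchmark(algo, dataset="glove-100-angular", k=10)  # hypothetical entry point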

I'm happy to help out with some of the refactoring. If I understand correctly, #383 is the main issue discussing future refactors?
