Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thinking about restructuring the code a bit #383

Open
erikbern opened this issue Apr 14, 2023 · 5 comments
Open

Thinking about restructuring the code a bit #383

erikbern opened this issue Apr 14, 2023 · 5 comments

Comments

@erikbern
Copy link
Owner

erikbern commented Apr 14, 2023

I think it would be nice to keep each algorithm together. I'm thinking about doing something like this

ann_benchmarks/algorithms/xyz/module.py – the python code for each one
ann_benchmarks/algorithms/xyz/Dockerfile – installation
ann_benchmarks/algorithms/xyz/config.yml – what's currently in algos.yml, but broken up by algorithm

This would replace install/ and algos.yml that we have today

lmk if you have any strong objections to this? I think it would simplify the structure a bit.

@alexklibisz
Copy link
Contributor

I like it. This would also let some algos simplify their Dockerfiles. Some algos currently create config files "in-line" in the dockerfile, via cat, echo, etc., which is pretty brittle, e.g., https://github.com/erikbern/ann-benchmarks/blob/main/install/Dockerfile.opensearchknn#L30-L37 Much cleaner to just keep that config as a file in the directory and ADD it into the docker image.

@erikbern
Copy link
Owner Author

yeah good point, that would work too

@thomasahle
Copy link
Contributor

Great idea. Took me a while, the first time, to figure out all the files I needed to edit.

@Jeadie
Copy link
Contributor

Jeadie commented May 13, 2023

I did a quick speed run of this refactor. A couple of thoughts I had:

  1. Many algorithms use the same Dockerfile. ann_benchmarks/algorithms/xyz/Dockerfile does not seem intuitive in these cases. These cases do not account for the majority, and could be satisfied with the following options:
    1. No dockerfile in ann_benchmarks/algorithms/xyz, but have xyz's config.yml point to abc's dockerfile (however we want to remove this dependency between docker tag and config.yml).
    2. A simple one liner Dockerfile, FROM ann-benchmarks-abc. I haven't entirely thought through the subprocess building implications of this (i.e. if FROM ann-benchmarks-abc is not local, it will have a failed pull).
    3. Dockerfile duplication, will just required contributors to remember to update each, but I don't think this is a problem (a contributor changing a common dockerfile should exist as a standalone PR, anyhow).
  2. Having a separate config.yml per algorithm (instead of a single algos.yml) requires N yaml file reads when parsing which algorithms to use.

In sum, however, I think the refactor still makes sense.

I can have a PR up soon for perusal.

@erikbern
Copy link
Owner Author

I think it's probably fine if most Dockerfile files end up just being something like

FROM ann-benchmarks
pip install xyz

erikbern added a commit that referenced this issue Jun 9, 2023
[DRAFT] Restructuring the code #383
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants