
Publish 1M embeddings benchmark #464

Open
KShivendu opened this issue Sep 15, 2023 · 5 comments

KShivendu (Contributor) commented Sep 15, 2023

Hello again @erikbern, I hope you're doing well.

I was curious about the progress on publishing the benchmarks for the 1M OpenAI embeddings dataset that I created in #434.

We can use this issue to track that work. Let me know if I can help in any way :)

KShivendu (Contributor, Author) commented Sep 18, 2023

#460 will also fix this, so I'm closing this issue. Thanks :)

KShivendu (Contributor, Author) commented Oct 25, 2023

@erikbern @maumueller Is there anything I can do to help publish the 1M benchmark sooner? It would really help my friends to see results on a larger dataset. I'm also happy to run the benchmarks myself and handle any errors that come up.

KShivendu reopened this Oct 25, 2023

erikbern (Owner) commented
Hi – planning to rerun all benchmarks at some point soon.

That being said, is the OpenAI dataset significantly different from the previous datasets? I'm somewhat hesitant to add too many similar datasets, since we already have a few of a similar size.

KShivendu (Contributor, Author) commented
It's different in a few specific ways (a quick shape check is sketched after this list):

  1. The embeddings are 1536-dimensional, whereas most other datasets are 384, 512, or 768 dimensions.
  2. At 1M records, it is among the largest datasets in the suite.
  3. OpenAI embeddings are among the most popular at the moment, so benchmarks for them will make ann-benchmarks.com even more useful for the community.
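
For anyone wanting to double-check those numbers locally, here is a minimal sketch, assuming the dataset follows the usual ann-benchmarks HDF5 layout with `train` and `test` arrays; the filename is only a placeholder, not the published dataset name:

```python
# Quick sanity check of the dataset's shape, assuming the usual
# ann-benchmarks HDF5 layout ("train" and "test" float arrays).
# "openai-1m-angular.hdf5" is a placeholder filename, not the actual name.
import h5py

with h5py.File("openai-1m-angular.hdf5", "r") as f:
    print(f["train"].shape)  # expected: (1_000_000, 1536)
    print(f["test"].shape)   # expected: (n_queries, 1536)
```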

KShivendu (Contributor, Author) commented
Hi @erikbern, I hope you're doing well.

I noticed that ann-benchmarks.com was last updated in Dec 2021, over two years ago. A lot has changed since then, and I'm pretty sure an updated website would be very valuable for the community. I'm happy to spend some time running the benchmarks for you. Let me know if I can help :)
