
Publish 1M embeddings benchmark #464

Open
KShivendu opened this issue Sep 15, 2023 · 5 comments

KShivendu (Contributor) commented Sep 15, 2023

Hello again @erikbern, I hope you're doing well.

I was curious about the progress on publishing the benchmarks for the 1M OpenAI embeddings dataset that I created in #434.

We can use this issue to track that work. Let me know if I can help in any way :)

KShivendu (Contributor, Author) commented Sep 18, 2023

#460 will also fix this, so I'm closing this issue. Thanks :)

KShivendu (Contributor, Author) commented Oct 25, 2023

@erikbern @maumueller Is there anything I can do to help publish the 1M benchmark sooner? It would really help my friends to see results on a larger dataset. I'm also happy to run the benchmarks myself and handle any errors that come up.

KShivendu reopened this Oct 25, 2023

erikbern (Owner) commented
Hi – planning to rerun all benchmarks at some point soon.

That being said, is the OpenAI dataset significantly different from the previous datasets? I'm somewhat hesitant to add too many similar datasets, since we already have a few of a similar size.

KShivendu (Contributor, Author) commented
It's different in a few specific ways (a quick shape check is sketched after this list):

  1. The embeddings are 1536-dimensional, whereas most other datasets are 384, 512, or 768 dimensions.
  2. At 1M records, it is among the largest datasets in the suite.
  3. OpenAI embeddings are among the most popular at the moment, so benchmarks for them will make ann-benchmarks.com even more useful for the community.
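
For anyone wanting to double-check those numbers locally, here is a minimal sketch, assuming the dataset follows the usual ann-benchmarks HDF5 layout with `train` and `test` arrays; the filename is only a placeholder, not the published dataset name:

```python
# Quick sanity check of the dataset's shape, assuming the usual
# ann-benchmarks HDF5 layout ("train" and "test" float arrays).
# "openai-1m-angular.hdf5" is a placeholder filename, not the actual name.
import h5py

with h5py.File("openai-1m-angular.hdf5", "r") as f:
    print(f["train"].shape)  # expected: (1_000_000, 1536)
    print(f["test"].shape)   # expected: (n_queries, 1536)
```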

KShivendu (Contributor, Author) commented
Hi @erikbern, I hope you're doing well.

I noticed that ann-benchmarks.com was last updated in Dec 2021, over two years ago. A lot has changed since then, and I'm pretty sure an updated website would be very valuable for the community. I'm happy to spend some time running the benchmarks for you. Let me know if I can help :)
