
Any additional suggestion about performance? #6

Open
DrEight opened this issue Jan 5, 2024 · 6 comments

@DrEight commented Jan 5, 2024

Do you have any additional suggestions on how to optimize performance apart from what you wrote in the article?

@ADefWebserver (Contributor)

What performance issues are you having?

@DrEight (Author) commented Jan 5, 2024

I have an Azure DTU-based database with 200 DTUs. This is my typical scenario. The query takes: CPU time = 2235 ms, elapsed time = 1133 ms.
I tried:

  1. Converting the float to real. The storage is smaller, but the performance is unchanged.
  2. 'Quantizing' the float to an int16. Same results as #1.
  3. Calculating the vectors out of the JSON using only the table 'wikipedia_articles_embeddings', but the performance is horrible.
  4. Calculating the vector stored as a concatenated string, splitting it with STRING_SPLIT. Same as #3.
  5. Increasing the compression of the columnstore with DATA_COMPRESSION = COLUMNSTORE_ARCHIVE. No change in performance. (A sketch of attempts 1 and 5 follows this list.)
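
A minimal T-SQL sketch of attempts 1 and 5, assuming the row-per-dimension layout (article_id, vector_value_id, vector_value) used by the wikipedia_articles_embeddings sample; the table and column names here are illustrative, not copied from this thread:

```sql
-- Attempt 1: store each vector component as REAL (4 bytes) instead of FLOAT (8 bytes).
CREATE TABLE dbo.wikipedia_articles_embeddings_real
(
    article_id      INT  NOT NULL,
    vector_value_id INT  NOT NULL,  -- dimension index of the embedding
    vector_value    REAL NOT NULL   -- was FLOAT: halves storage, same query time
);

-- Attempt 5: archive-level compression on the clustered columnstore index.
CREATE CLUSTERED COLUMNSTORE INDEX cci_embeddings
    ON dbo.wikipedia_articles_embeddings_real
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```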

@ADefWebserver (Contributor)

So 1133 ms is unacceptable performance? That is probably what I am getting too, but in my applications it has not been an issue. I always have 'other processes', like my calls to the LLM, going on that have their own delays before returning a result to the end user.

@DrEight (Author) commented Jan 5, 2024

I would like to have at least double the performance. My concern is related to the storage and the number of users connected to the database. 25,000 articles is a good amount of data for my use case, but I can't exclude that it will grow, and I'm assuming that performance is inversely proportional to the number of articles.

@hitec-lbmc

I am considering expanding the use of SQL Server for my production application, which is a RAG platform with about 80,000 pages of data. Our application's data needs are growing, and we're exploring options to increase our storage capacity significantly.

Given that SQL Server is our favorite DB, I'm interested in exploring it as the layer that stores the vector embeddings from our data.

Did you have success in optimizing performance?
Any insights or lessons learned from your experience would be greatly appreciated, especially if you've navigated similar challenges.

If you decided not to use SQL Server for storing vectors, I would love to learn why.

Thank you.

@Luigi-Ottoboni

Yes, I'm still using SQL Server, and I've done a lot of testing and made several improvements.
First, as I mentioned above, converting from float to real gave me a 3.2x performance improvement.
Then I noticed that Kernel Memory adds too many layers and abstractions, so I extracted just the code needed for my use case. I can't tell you exact numbers, but this was the biggest improvement in both performance and code maintenance.
Another rule: do not mix 'row queries' with 'column queries': when you query a table with a columnstore index, it must be a pure table scan (sketched below).
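
For illustration, a minimal sketch of a pure-columnstore dot-product query, assuming the row-per-dimension layout above and a @query table variable already loaded with the query embedding (all names are assumptions, not code from this repo):

```sql
-- Dot-product similarity as a single batch-mode columnstore scan + aggregation.
DECLARE @query TABLE (vector_value_id INT, vector_value REAL);
-- ...populate @query with the embedding of the search text...

SELECT TOP (10)
    e.article_id,
    SUM(e.vector_value * q.vector_value) AS dot_product  -- cosine similarity if both vectors are unit length
FROM dbo.wikipedia_articles_embeddings_real AS e
INNER JOIN @query AS q
    ON q.vector_value_id = e.vector_value_id
GROUP BY e.article_id
ORDER BY dot_product DESC;
```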
Then optimize the embeddings. I strongly suggest taking a look at the new text-embedding-3-large/text-embedding-3-small models. They support 'shortening', meaning you can reduce the size of the vector to find the best balance between performance and accuracy (see the sketch below).
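
On the SQL side, a hedged sketch of what shortening could look like: keep only the first 256 dimensions and renormalize, so the dot product still approximates cosine similarity (the cutoff, the 0-based dimension index, and all names are assumptions):

```sql
-- Truncate each embedding to its first 256 components and renormalize
-- the truncated vector to unit length.
SELECT
    e.article_id,
    e.vector_value_id,
    e.vector_value
        / SQRT(SUM(e.vector_value * e.vector_value)
               OVER (PARTITION BY e.article_id)) AS vector_value
INTO dbo.wikipedia_articles_embeddings_256
FROM dbo.wikipedia_articles_embeddings_real AS e
WHERE e.vector_value_id < 256;  -- assumes vector_value_id starts at 0
```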
Feel free to contact me privately if you need more info.
