Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Redis Memory Connector Performance Issues when calling Memory.Search #6088

Open
engt21 opened this issue May 1, 2024 · 1 comment
Labels
enhancement .NET Issue or Pull requests regarding .NET code python Pull requests for the Python Semantic Kernel

Comments

@engt21
Copy link

engt21 commented May 1, 2024

There are several performance issues with the Python Redis Memory Connector. These issues were discovered by developing an application in C# that worked properly, then duplicating the application in Python and noticing significant differences.

As they are related and occurred in the same application we tested within our enterprise and are interconnected, and the fixes are partially connected, I am listing the issues here.

Within our enterprise, I have also developed a fix, and can raise a PR with the changes that worked for us. I have only included the problems below and not our functioning solution.

Describe the bug vs. Expected behavior

Issue No. 1: Incorrect relevancy score calculation
The correct behavior in C# is displayed in the below screenshots.

image
image

It should A) take the similarity score, and then B) as shown in the GetSimilarity helper, return 1 - vectorScore, which is the cosine distance, not the similarity, due to how redis reports the score/creates the index.

However, if you look at what Python does, it only gets the vector_score as is. This means that all the results will be returning the inverse of what it should be.
image

NB: When we ran it with our fix, we found the exact same performance as C#, which was the expected behavior.

Issue No. 2: Incorrect Results Order

After fixing Issue 1 above, we found that we were still getting poor performance. This is because the results from the redis query are default returning in descending order, while C# was not. Thus, we found that the lowest results were being returned at the top and only those results were coming back.
image

This is a quick fix with the return settings.

Issue No. 3: Error thrown while parsing documents
When retrieving attributes of the document - for example, in getting the vector_score, but also in the utils deserialize_document_to_record file, we get "Document object is not subscriptable" where the call to redis_key = doc["id"] happens.

The solution we developed worked while also allowing the previous functionality to stand.

As an aside, while we did not check this theory out, but we loaded our data initially to redis via C# SK, and then retrieved with Python SK. Not sure if this would cause the issue; however, our logic was that clients should be language agnostic and should not have issues retrieving documents regardless of the method of storing them in redis.

Similar issues would possibly come up with the call to split_redis_key, as it assumes there is only one instance of ":" in the key with its implementation. While this is fine if the user stored data with the SK redis connector, for some enterprises or applications that have separate data pipelines to store then use SK to retrieve, this would lead to parsing errors.

To Reproduce/Screenshots

Steps to reproduce the behavior:
You will need an existing redis instance. You should then create an instance of the Redis Memory Connector.
We used embeddings ada2.
You should then store some data in redis - you can use SK in Python or C#.
You should then do a memory.search on a test query to see similarity to your data.

NB: As this application is developed within my enterprise, and the code samples we ran, along with displayed output, has firm specific data, I cannot paste any screenshots or provide details on showing examples of how to reproduce.

While this was not the example our enterprise used, one would expect similar issues.

Storing book descriptions and their titles in redis, then providing sample user query about type of book they're interested and returning top books based off similarity of topics.

Platform

@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code python Pull requests for the Python Semantic Kernel triage labels May 1, 2024
@github-actions github-actions bot changed the title Python Redis Memory Connector Get Nearest Matches Performance Issues .Net: Python Redis Memory Connector Get Nearest Matches Performance Issues May 1, 2024
@engt21 engt21 changed the title .Net: Python Redis Memory Connector Get Nearest Matches Performance Issues Python: Redis Memory Connector Get Nearest Matches Performance Issues May 1, 2024
@engt21 engt21 changed the title Python: Redis Memory Connector Get Nearest Matches Performance Issues Python: Redis Memory Connector Performance Issues when calling Memory.Search May 1, 2024
@engt21
Copy link
Author

engt21 commented May 1, 2024

Please let me know if/when/what next steps are before I can raise a PR. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement .NET Issue or Pull requests regarding .NET code python Pull requests for the Python Semantic Kernel
Projects
None yet
Development

No branches or pull requests

3 participants