Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net: Text Search Service ADR #6012

Open
wants to merge 18 commits into
base: feature-text-search-service
Choose a base branch
from

Conversation

markwallace-microsoft
Copy link
Member

@markwallace-microsoft markwallace-microsoft commented Apr 26, 2024

Motivation and Context

This is a POC for the new text search service abstraction. Sharing very early for team collaboration.

Description

Contribution Checklist

@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code python Pull requests for the Python Semantic Kernel kernel Issues or pull requests impacting the core kernel kernel.core documentation memory labels Apr 26, 2024
@github-actions github-actions bot changed the title Text Search Service ADR Python: Text Search Service ADR Apr 26, 2024
@github-actions github-actions bot changed the title Python: Text Search Service ADR .Net: Text Search Service ADR Apr 26, 2024
@markwallace-microsoft markwallace-microsoft changed the title .Net: Text Search Service ADR .Net: Text Search Service ADR (Work in Progress) Apr 26, 2024
@markwallace-microsoft markwallace-microsoft removed the python Pull requests for the Python Semantic Kernel label May 2, 2024
@markwallace-microsoft markwallace-microsoft force-pushed the users/markwallace/text-search-service-adr branch from 13b1f6f to 2bae9e8 Compare May 2, 2024 11:32
@markwallace-microsoft markwallace-microsoft changed the title .Net: Text Search Service ADR (Work in Progress) .Net: Text Search Service ADR May 9, 2024
@markwallace-microsoft markwallace-microsoft marked this pull request as ready for review May 9, 2024 09:40
@markwallace-microsoft markwallace-microsoft requested a review from a team as a code owner May 9, 2024 09:40
## Context and Problem Statement

Semantic Kernel has support for searching using popular Vector databases e.g. Azure AI Search, Chroma, Milvus and also Web search engines e.g. Bing, Google.
There are two sets of abstractions and plugins depending on whether the developer wants to perform search against a Vector database of a Web search engine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be "a Vector database or a Web search engine."?

Comment on lines +26 to +40
var stringPlugin = new TextSearchPlugin<string>(searchService);
kernel.ImportPluginFromObject(stringPlugin, "StringSearch");
var pagePlugin = new TextSearchPlugin<BingWebPage>(searchService);
kernel.ImportPluginFromObject(pagePlugin, "PageSearch");

// Invoke the plugin to perform a text search and return string values
var question = "What is the Semantic Kernel?";
var function = kernel.Plugins["StringSearch"]["Search"];
var result = await kernel.InvokeAsync(function, new() { ["query"] = question });

Console.WriteLine(result);

// Invoke the plugin to perform a text search and return BingWebPage values
function = kernel.Plugins["PageSearch"]["Search"];
result = await kernel.InvokeAsync(function, new() { ["query"] = question, ["count"] = 2 });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SearchAsync method is generic, and each implementation has these checks: typeof(T) == typeof(string) or typeof(T) == typeof(TextSearchResult). TextSearchPlugin class is also generic, and for each specific type you need to create separate instance (e.g. TextSearchPlugin<string> and TextSearchPlugin<BingWebPage>).

My point during initial discussion was - with generics and type checking it's already a sacrifice, but at the end of the day we still can get only string from kernel.InvokeAsync, and if it's object, it will be serialized. I think it will be really useful to get the actual search result type like result.GetValue<BingWebPage> after InvokeAsync method. Otherwise, we can probably avoid using generics, work just with object, and the result of this sample will still be the same.

Just some ideas what we can do in theory:

  1. Make ITextSearchService<T> generic.
  2. For specific service (e.g. BingTextSearchService), implement interface with all possible types that can be produced by this service: BingTextSearchService: ITextSearchService<string>, ITextSearchService<BingWebPage> - this should allow to avoid typeof(T) == typeof(string) check in SearchAsync implementation.
  3. Based on that, we don't need TextSearchPlugin class, because we can mark all SearchAsync methods in BingTextSearchService as KernelFunction and name them as you did in line 27 and 29 - StringSearch and PageSearch. We will end up with BingPlugin-StringSearch and BingPlugin-PageSearch in kernel. If we share it with LLM, it will be able to decide which function to use based on user prompt. After specific function is invoked, it will return specific type, which will be possible to get with result.GetValue<T>.

Another way for point 3, we can keep TextSearchPlugin as non-generic, which will accept ITextSearchService<string> and/or ITextSearchService<WebPage> in constructor. Then, it will be possible to inject any service that implements ITextSearchService<string> and/or ITextSearchService<WebPage> (e.g. Bing or Google). In this case, you won't need to initialize TextSearchPlugin multiple times, DI can resolve all dependencies for you, there will be no need in type checking in search methods and complex type should be available as result of InvokeAsync method.

Let me know what you think about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation kernel.core kernel Issues or pull requests impacting the core kernel memory .NET Issue or Pull requests regarding .NET code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants