
Searching for exact ID is not reliable #727

Open
filiptronicek opened this issue Apr 9, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@filiptronicek
Member

When you search for the Jupyter[1] extension on Open VSX [direct search link], the first result is CodeStream.codeStream. I believe this is because we treat extension namespaces and extension names separately, and the dot in the middle of the ID prevents better search results.

Maybe we can add the extension ID (namespace.extension) to the search criteria, or try resolving ID-looking search queries directly.
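One way to resolve ID-looking queries directly is to detect queries of the form namespace.name up front, so the search could look the extension up by those two parts before falling back to fuzzy matching. The following is a minimal, stdlib-only sketch of that detection step. The class name, the record, and the accepted character set (letters, digits, underscores, hyphens) are hypothetical assumptions, not Open VSX code. It also accepts / as a separator, since ms-toolsai/jupyter was reported to work later in this thread.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ExtensionIdQuery {
    // Hypothetical ID shape: <namespace><sep><name>, where each part is letters, digits,
    // underscores or hyphens, and <sep> is "." (the ID form) or "/" (reported to work in the UI).
    private static final Pattern ID_PATTERN =
            Pattern.compile("\\s*([\\w-]+)[./]([\\w-]+)\\s*");

    /** Namespace and name parsed from an ID-looking query (hypothetical type). */
    public record ExtensionId(String namespace, String name) {}

    private ExtensionIdQuery() {}

    /** Returns the parsed ID if the whole query looks like namespace.name, otherwise empty. */
    public static Optional<ExtensionId> parse(String query) {
        if (query == null) {
            return Optional.empty();
        }
        Matcher m = ID_PATTERN.matcher(query);
        if (!m.matches()) { // matches() requires the entire query to fit the pattern
            return Optional.empty();
        }
        // Lower-case both parts so "MS-ToolsAI.Jupyter" resolves like "ms-toolsai.jupyter".
        return Optional.of(new ExtensionId(
                m.group(1).toLowerCase(Locale.ROOT),
                m.group(2).toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(parse("ms-toolsai.jupyter")); // ID-looking query: parsed
        System.out.println(parse("jupyter notebooks"));  // free text: empty
    }
}
```

A caller could use a present result to query the namespace and name exactly and rank that extension first, and otherwise run the existing fuzzy query unchanged. How that lookup would be wired into the search service is left open here.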


Footnotes

  1. ms-toolsai.jupyter

@filiptronicek filiptronicek added the bug Something isn't working label Apr 9, 2023
@amvanbaren
Contributor

> Maybe we can add the extension id (namespace.extension) to the search criteria

extensionId is part of the search criteria and has the highest boost.

    // Exact, case-insensitive match on the full extension ID (namespace.name).
    // boost() is called on the clause itself: BoolQueryBuilder.should() returns the
    // bool query, so chaining .boost() after should() would boost the whole bool query.
    boolQuery.should(QueryBuilders.termQuery("extensionId.keyword", options.queryString)
            .caseInsensitive(true)
            .boost(10));

    // Fuzzy matching of search query in multiple fields (per-field boosts via field(name, boost))
    var multiMatchQuery = QueryBuilders.multiMatchQuery(options.queryString)
            .field("name", 5)
            .field("displayName", 5)
            .field("tags", 3)
            .field("namespace", 2)
            .field("description")
            .fuzziness(Fuzziness.AUTO)
            .prefixLength(2);
    boolQuery.should(multiMatchQuery.boost(5));

    // Prefix matching of search query in display name and namespace
    var prefixString = options.queryString.trim().toLowerCase();
    var namePrefixQuery = QueryBuilders.prefixQuery("displayName", prefixString);
    boolQuery.should(namePrefixQuery.boost(2));
    var namespacePrefixQuery = QueryBuilders.prefixQuery("namespace", prefixString);
    boolQuery.should(namespacePrefixQuery);

Using #684 as a starting point, I think this happens because ms-toolsai.jupyter was updated less recently (2023-03-10T04:05:53.638673Z) than codestream.codestream (2023-03-24T15:36:43.527142Z), which may make it less relevant.

    var relevance = ratingRelevance * limit(ratingValue) + downloadsRelevance * limit(downloadsValue)
            + timestampRelevance * limit(timestampValue);
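To make the trade-off in this formula concrete, here is a minimal Java sketch of the weighted sum. It assumes limit() clamps each value to [0, 1] and that the rating, downloads and timestamp values are already normalized; neither detail is shown in this thread, so the helper names and the input numbers are hypothetical. The weights in main are the production values from the application.yml quoted further down.

```java
public final class RelevanceSketch {
    private RelevanceSketch() {}

    // Hypothetical clamp: assumes limit() keeps each normalized signal within [0, 1].
    public static double limit(double value) {
        return Math.max(0.0, Math.min(1.0, value));
    }

    /**
     * Weighted sum of the three signals, mirroring the quoted formula.
     * The *Value arguments are assumed to be pre-normalized; how Open VSX
     * normalizes them is not shown in this thread.
     */
    public static double relevance(double ratingRelevance, double downloadsRelevance,
                                   double timestampRelevance, double ratingValue,
                                   double downloadsValue, double timestampValue) {
        return ratingRelevance * limit(ratingValue)
                + downloadsRelevance * limit(downloadsValue)
                + timestampRelevance * limit(timestampValue);
    }

    public static void main(String[] args) {
        // Weights from the production application.yml quoted in this thread.
        double rating = 0.2, downloads = 1.0, timestamp = 3.0;
        // Hypothetical normalized inputs, for illustration only:
        // entry A has far more downloads, entry B was updated more recently.
        double a = relevance(rating, downloads, timestamp, 0.5, 0.9, 0.6);
        double b = relevance(rating, downloads, timestamp, 0.5, 0.1, 0.9);
        System.out.printf("A=%.2f B=%.2f%n", a, b);
    }
}
```

With these made-up inputs, entry A scores 2.8 and entry B scores 2.9, so the fresher entry ranks higher despite a much lower downloads value.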

@filiptronicek Do you want me to check whether this affects exact ID searches in general?

@filiptronicek
Member Author

That's really interesting. CodeStream has about 5K downloads, while Jupyter has about 800K. Maybe the download count could be taken into account as well, since people are more likely to search for popular extensions.

I also found that searching for ms-toolsai/jupyter (note the / instead of the .) returns the correct result. Maybe CodeStream just has unusual metadata. I think we can keep this issue open in case we run into other examples.

@amvanbaren
Contributor

It is taken into account, but freshness (timestamp) is prioritized over downloads. From https://github.com/EclipseFdn/open-vsx.org/blob/production/configuration/application.yml:

    relevance:
      rating: 0.2
      downloads: 1.0
      timestamp: 3.0
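Because the score is a plain weighted sum, comparing two entries A and B comes down to the weighted differences of their limited values. Writing Δr, Δd and Δt for A's value minus B's value for rating, downloads and timestamp, and using the weights above:

```latex
\mathrm{rel}(A) - \mathrm{rel}(B) = 0.2\,\Delta r + 1.0\,\Delta d + 3.0\,\Delta t
```

With equal ratings (Δr = 0), B outranks A whenever 3.0 (t_B - t_A) > d_A - d_B, so one unit of normalized freshness outweighs up to three units of normalized downloads. If limit() clamps values to [0, 1] (an assumption, since its definition isn't shown here), then d_A - d_B is at most 1, and a freshness lead of more than 1/3 beats any downloads lead.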

@Mdnou

Mdnou commented May 26, 2023

> It is taken into account, but freshness (timestamp) is prioritized over downloads. From https://github.com/EclipseFdn/open-vsx.org/blob/production/configuration/application.yml:
>
>     relevance:
>       rating: 0.2
>       downloads: 1.0
>       timestamp: 3.0

dgileadi/vscode-java-decompiler#17 (comment)

Projects
Status: Todo
Development

No branches or pull requests

3 participants