
Improve search filtering and sorting #1050

Open
dvic opened this issue Aug 5, 2021 · 6 comments

@dvic

dvic commented Aug 5, 2021

Hi!

Is there any interest in a PR that improves the filtering and sorting of the search?

For example, when I currently search for OpenTelemetry Ecto, I get this:

[screenshot of the search results]

What I expect here is for:

  • opentelemetry_ecto to be the first hit
  • the packages that start with opentelemetry to be on top

My suggestion is to make the following changes:

  • change the filtering to try an exact (or "starts with") match against the words of the search phrase, lowercased and with spaces converted to underscores (this might require an index on a Postgres expression)
  • change the sorting of the filtered results to favor packages that start with the first word of the search phrase
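The two proposed changes can be sketched in Python over an in-memory list of package names. This is a minimal sketch, not hexpm's actual Ecto/Postgres query: the function names `normalize_phrase` and `search`, the fallback "every word appears in the name" filter (a hypothetical stand-in for the existing matching), and the package names other than `opentelemetry_ecto` and `opentelemetry_api` are all assumptions made for illustration.

```python
import re


def normalize_phrase(phrase: str) -> str:
    """Lowercase the phrase and turn whitespace runs into underscores,
    e.g. "OpenTelemetry Ecto" -> "opentelemetry_ecto"."""
    return re.sub(r"\s+", "_", phrase.strip().lower())


def search(packages: list[str], phrase: str) -> list[str]:
    normalized = normalize_phrase(phrase)
    words = phrase.lower().split()
    first_word = words[0] if words else ""

    def matches(name: str) -> bool:
        # Proposed filter: exact or "starts with" match on the normalized phrase.
        # Hypothetical fallback (stand-in for the existing matching): every word
        # of the phrase occurs somewhere in the package name (naive substring test).
        return name.startswith(normalized) or all(w in name for w in words)

    def sort_key(name: str) -> tuple[bool, bool, str]:
        # False sorts before True: exact match first, then names starting with
        # the first search word, then alphabetical order as a tie-breaker.
        return (name != normalized, not name.startswith(first_word), name)

    return sorted((p for p in packages if matches(p)), key=sort_key)


# "ecto_opentelemetry_helper" is a made-up package name for illustration.
print(search(["ecto_opentelemetry_helper", "opentelemetry_api", "opentelemetry_ecto"],
             "OpenTelemetry Ecto"))
# ['opentelemetry_ecto', 'ecto_opentelemetry_helper']
```

In hexpm the same filter and ordering would live in the SQL query, which is where the expression index mentioned above would come in.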
@ericmj
Member

ericmj commented Aug 5, 2021

We know the search needs work, so it's definitely an area we want to improve.

@inoas has already started experimenting with ways to improve it and has made some progress, I believe. Can you share where you are with this, @inoas?

@inoas

inoas commented Aug 5, 2021

I did indeed work on this, though I got a bit lost because I didn't know which of the existing code needed to stay and, more generally, how the execution paths are designed.

  1. I have written raw SQL around ts_query, including some ranking based on several factors.
  2. I have written an input sanitizer and a NimbleParsec-based parser to allow boolean text search. It still lacks prefixes/labels (like Google's site:).

Both work standalone, but neither is ready for a PR yet.
Edit: Aside from the integration work in the existing hexpm code, what's required is mapping the boolean search tree returned by the NimbleParsec-based parser to Ecto queries.
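To illustrate mapping a boolean search tree onto a database query, here is a small hand-written recursive-descent parser in Python. It is a hypothetical stand-in for the NimbleParsec parser, whose actual grammar and tree shape are not shown in this thread. It assumes a Google-like syntax (whitespace means AND, the keyword `OR`, a leading `-` for NOT, parentheses for grouping), and instead of Ecto it renders the tree as a Postgres `tsquery` string using the `&`, `|` and `!` operators. All names (`Parser`, `to_tsquery`, `build_tsquery`, the tuple tree shape) are assumptions.

```python
import re
from typing import Optional, Tuple

# Hypothetical tree shape: ("term", word) | ("not", node) | ("and"|"or", left, right)
Node = Tuple

# Parentheses, a "-" that starts a word (NOT), or a run of word characters.
# Characters matching none of these (':', '*', '&', '|', '!', ...) are dropped,
# which doubles as a crude sanitizer for tsquery metacharacters.
TOKEN_RE = re.compile(r"\(|\)|(?:^|(?<=[\s(]))-|\w+")


def combine(op: str, left: Optional[Node], right: Optional[Node]) -> Optional[Node]:
    """Join two subtrees, skipping empty ones (e.g. from '()' or a dangling '-')."""
    if left is None:
        return right
    if right is None:
        return left
    return (op, left, right)


class Parser:
    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self) -> Optional[Node]:
        node = None
        while self.peek() is not None:
            if self.peek() == ")":  # drop unbalanced closing parens
                self.advance()
                continue
            node = combine("and", node, self.parse_or())
        return node

    def parse_or(self) -> Optional[Node]:
        # Lowest precedence: a OR b OR c
        node = self.parse_and()
        while self.peek() == "OR":
            self.advance()
            node = combine("or", node, self.parse_and())
        return node

    def parse_and(self) -> Optional[Node]:
        # Implicit AND between adjacent terms, until OR, ')' or end of input.
        node = None
        while self.peek() not in (None, "OR", ")"):
            node = combine("and", node, self.parse_not())
        return node

    def parse_not(self) -> Optional[Node]:
        if self.peek() == "-":
            self.advance()
            if self.peek() in (None, "OR", ")"):
                return None  # a dangling '-' is ignored
            operand = self.parse_not()
            return ("not", operand) if operand is not None else None
        return self.parse_atom()

    def parse_atom(self) -> Optional[Node]:
        tok = self.advance()
        if tok == "(":
            node = self.parse_or()
            if self.peek() == ")":
                self.advance()
            return node
        return ("term", tok.lower())


def to_tsquery(node: Node) -> str:
    """Render the tree with Postgres tsquery operators: & (AND), | (OR), ! (NOT)."""
    kind = node[0]
    if kind == "term":
        return node[1]
    if kind == "not":
        return "!" + to_tsquery(node[1])
    op = " & " if kind == "and" else " | "
    return "(" + to_tsquery(node[1]) + op + to_tsquery(node[2]) + ")"


def build_tsquery(query: str) -> Optional[str]:
    tree = Parser(TOKEN_RE.findall(query)).parse()
    return to_tsquery(tree) if tree is not None else None
```

For example, `build_tsquery("ecto OR -phoenix")` yields `(ecto | !phoenix)`. The resulting string could then be passed as a bound parameter to Postgres's `to_tsquery('simple', ...)` and ranked with `ts_rank`; because unmatched characters are dropped during tokenizing, raw tsquery syntax typed by users should not reach Postgres.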

@dvic, are you on the Elixir Slack or IRC by any chance? I am often on Slack and would get on IRC if you prefer that. Maybe we can team up.

@dvic
Author

dvic commented Aug 5, 2021

Nice! I was thinking of starting much simpler but if you have this already started I'm sure we can build on top of this :)

Yes, I'm @dvic on the Elixir Slack; I just joined #hex.

@adamwight
Contributor

A minor tweak that normalizes exact matching by converting spaces to underscores: #1089 (still needs some template adjustments).

@Ch4s3
Contributor

Ch4s3 commented Jul 4, 2022

@inoas are you still working on this? If so I'd be interested in pitching in.

@inoas

inoas commented Jul 5, 2022

@Ch4s3, feel free to go ahead. If you get in touch on Slack or Discord, I can share my SQL queries; they were quite promising, but I got lost integrating them into the existing app.
