Allow flagging bad results to capture data for refinement #97

Open

xdg opened this issue Aug 3, 2023 · 1 comment


xdg commented Aug 3, 2023

I shared the site with some friends and one reported that "space opera" came back with Gaston Leroux's The Phantom of the Opera, which is a pretty big miss. Perhaps you could add a thumbs down or other method to capture data about bad recommendations to improve the training.

veekaybee (Owner) commented

Thanks @xdg, this is good feedback and a great idea! It goes hand in hand with something I've been thinking about to improve model performance.

In general, semantic search and clustering are hard problems to solve, especially when combined with query understanding. The problem is that, without guiding the search results, the model matches purely on embedding similarity, which is what your friend saw here; a first pass is very likely to return weird results, as shown here:

[screenshot: example of loosely related first-pass search results]
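
As a rough illustration of pure similarity matching (not the project's actual code; the model name and titles here are placeholders), you can inspect cosine scores directly with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical model choice -- any pretrained sentence-transformers model works here
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "space opera"
titles = ["The Phantom of the Opera", "Dune", "Leviathan Wakes"]

# Embed the query and the candidate titles, then rank titles by cosine similarity
query_emb = model.encode(query, convert_to_tensor=True)
title_embs = model.encode(titles, convert_to_tensor=True)
scores = util.cos_sim(query_emb, title_embs)[0]

for title, score in sorted(zip(titles, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {title}")
```

Depending on the model, the shared word "opera" alone can pull an unrelated title surprisingly high, which is exactly the failure mode reported above.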

There are a number of different approaches we can take to tune the model; each might be more or less successful on its own, but we'll likely need a combination of them to get better results:

  • The way the model currently works, there is no training that happens at all; I use a pretrained sentence-transformers model. One approach might be to fine-tune that model on logged results for better query understanding (see the fine-tuning sketch after this list)
  • Another might be to tune the hyperparameters of the model itself, e.g. raise the cosine similarity threshold or change the edges
  • A third might be to simply hand-filter some bad results
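
For the fine-tuning idea, a minimal sketch with the sentence-transformers training API might look like the following (the model name and the labeled pairs are assumptions; in practice the labels would come from logged feedback):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical base model; swap in whatever pretrained model the site actually uses
model = SentenceTransformer("all-MiniLM-L6-v2")

# Labeled pairs built from logged feedback: 1.0 = good match, 0.0 = reported miss
train_examples = [
    InputExample(texts=["space opera", "Leviathan Wakes"], label=1.0),
    InputExample(texts=["space opera", "The Phantom of the Opera"], label=0.0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# A single short training pass, just to show the API shape
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```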

In all of these cases, logging like you mentioned is important to see how many misses are actually happening in production. Stay tuned for the development of thumbs up/thumbs down-style feedback. I'll need to:

  • Create the UX elements
  • Wire up the feedback clicks to logging (see the endpoint sketch after this list)
  • Start systematically collecting logs and analyzing them
  • Make sure I have enough space to collect feedback logs
  • Implement a system to look at these logs quickly (Kibana would be nice but it's a lot of overhead for this project atm)
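
For the logging step, a minimal sketch of a feedback endpoint might look like this (assuming a Flask app and an append-only JSONL log file; the route name and payload shape are placeholders, not the project's actual API):

```python
import json
import time
from pathlib import Path

from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical log location; one JSON object per line keeps it easy to grep and analyze
FEEDBACK_LOG = Path("logs/feedback.jsonl")
FEEDBACK_LOG.parent.mkdir(parents=True, exist_ok=True)

@app.route("/feedback", methods=["POST"])
def feedback():
    # Assumed payload shape: {"query": ..., "result": ..., "vote": "up" | "down"}
    payload = request.get_json(force=True)
    record = {
        "ts": time.time(),
        "query": payload.get("query"),
        "result": payload.get("result"),
        "vote": payload.get("vote"),
    }
    # Append the feedback record to the JSONL log
    with FEEDBACK_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return jsonify({"status": "ok"})
```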

Keeping this open for now and letting you know I'm thinking about it, just might take a while to implement :)
