Allow flagging bad results to capture data for refinement #97

Open

xdg opened this issue Aug 3, 2023 · 1 comment


xdg commented Aug 3, 2023

I shared the site with some friends and one reported that "space opera" came back with Gaston Leroux's The Phantom of the Opera, which is a pretty big miss. Perhaps you could add a thumbs down or other method to capture data about bad recommendations to improve the training.

veekaybee (Owner) commented

Thanks @xdg, this is good feedback and a great idea! It goes hand in hand with something I've been thinking about to improve model performance.

In general, semantic search and clustering are hard problems to solve, especially when combined with query understanding. The problem is that, without guiding the search results, the model matches purely on embedding similarity, which is what your friend saw here; a first pass is very likely to return weird results, as shown here:

[screenshot: example of loosely related first-pass search results]
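
As a rough illustration of pure similarity matching (not the project's actual code; the model name and titles here are placeholders), you can inspect cosine scores directly with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical model choice -- any pretrained sentence-transformers model works here
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "space opera"
titles = ["The Phantom of the Opera", "Dune", "Leviathan Wakes"]

# Embed the query and the candidate titles, then rank titles by cosine similarity
query_emb = model.encode(query, convert_to_tensor=True)
title_embs = model.encode(titles, convert_to_tensor=True)
scores = util.cos_sim(query_emb, title_embs)[0]

for title, score in sorted(zip(titles, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {title}")
```

Depending on the model, the shared word "opera" alone can pull an unrelated title surprisingly high, which is exactly the failure mode reported above.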

There are a number of different approaches we can take to tune the model; each might be more or less successful on its own, but we'll likely need a combination of them to get better results:

  • The way the model currently works, there is no training that happens at all; I use a pretrained sentence-transformers model. One approach might be to fine-tune that model on logged results for better query understanding (see the fine-tuning sketch after this list)
  • Another might be to tune the hyperparameters of the model itself, e.g. raise the cosine similarity threshold or change the edges
  • A third might be to simply hand-filter some bad results
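
For the fine-tuning idea, a minimal sketch with the sentence-transformers training API might look like the following (the model name and the labeled pairs are assumptions; in practice the labels would come from logged feedback):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical base model; swap in whatever pretrained model the site actually uses
model = SentenceTransformer("all-MiniLM-L6-v2")

# Labeled pairs built from logged feedback: 1.0 = good match, 0.0 = reported miss
train_examples = [
    InputExample(texts=["space opera", "Leviathan Wakes"], label=1.0),
    InputExample(texts=["space opera", "The Phantom of the Opera"], label=0.0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# A single short training pass, just to show the API shape
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```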

In all of these cases, logging like you mentioned is important to see how many misses are actually happening in production. Stay tuned for the development of thumbs up/thumbs down-style feedback. I'll need to:

  • Create the UX elements
  • Wire up the feedback clicks to logging (see the endpoint sketch after this list)
  • Start systematically collecting logs and analyzing them
  • Make sure I have enough space to collect feedback logs
  • Implement a system to look at these logs quickly (Kibana would be nice but it's a lot of overhead for this project atm)
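
For the logging step, a minimal sketch of a feedback endpoint might look like this (assuming a Flask app and an append-only JSONL log file; the route name and payload shape are placeholders, not the project's actual API):

```python
import json
import time
from pathlib import Path

from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical log location; one JSON object per line keeps it easy to grep and analyze
FEEDBACK_LOG = Path("logs/feedback.jsonl")
FEEDBACK_LOG.parent.mkdir(parents=True, exist_ok=True)

@app.route("/feedback", methods=["POST"])
def feedback():
    # Assumed payload shape: {"query": ..., "result": ..., "vote": "up" | "down"}
    payload = request.get_json(force=True)
    record = {
        "ts": time.time(),
        "query": payload.get("query"),
        "result": payload.get("result"),
        "vote": payload.get("vote"),
    }
    # Append the feedback record to the JSONL log
    with FEEDBACK_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return jsonify({"status": "ok"})
```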

Keeping this open for now and letting you know I'm thinking about it, just might take a while to implement :)
