score results for relevance #130

thatbudakguy · 2020-11-11T22:36:13Z

running against a large corpus, especially with some settings, can result in a huge volume of results. many of them are "low-quality" in that the matching portion consists of superficially similar elements that don't carry much semantic weight.

adjusting the match length can help, but there might be other heuristics we can use to improve relevance. one possibility is TF-IDF.
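to make the TF-IDF idea concrete, here is a minimal sketch, not the project's implementation. it assumes documents and matches are available as plain lists of tokens; `inverse_document_frequency` and `score_match` are hypothetical names, and the smoothed IDF formula is just one common choice.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """Map each token to a smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token at most once per document
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def score_match(match_tokens, idf):
    """Score a match as the mean IDF of its tokens.

    Summing IDF over every token occurrence and dividing by the match
    length equals summing (normalized term frequency * IDF) over the
    distinct tokens. Tokens missing from the IDF table contribute 0.
    """
    if not match_tokens:
        return 0.0
    return sum(idf.get(tok, 0.0) for tok in match_tokens) / len(match_tokens)
```

matches made up of tokens that appear in nearly every document score close to the minimum, so sorting or thresholding on this score would push the "low-quality" results down. the threshold itself would need tuning per corpus.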

thatbudakguy · 2021-02-05T21:40:49Z

one possible quick n' dirty way to do this is to implement something like passim's --max-series, which for us would translate to dropping seed groups from the index if there are too many entries in the group (indicating a super common seed).
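a rough sketch of what dropping oversized seed groups could look like. it assumes the index maps each seed (an n-gram of tokens) to a list of `(doc_id, position)` entries; `build_seed_index`, `prune_common_seeds`, and `max_entries` are hypothetical names for illustration and do not mirror passim's or this project's actual code.

```python
from collections import defaultdict


def build_seed_index(docs, seed_len=2):
    """Group (doc_id, position) entries by the n-gram seed found there."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos in range(len(tokens) - seed_len + 1):
            seed = tuple(tokens[pos:pos + seed_len])
            index[seed].append((doc_id, pos))
    return index


def prune_common_seeds(index, max_entries):
    """Keep only the seed groups with at most max_entries locations."""
    return {
        seed: locs
        for seed, locs in index.items()
        if len(locs) <= max_entries
    }
```

for example, `prune_common_seeds(build_seed_index(docs), max_entries=100)` would discard any seed that occurs more than 100 times before matches are extended, at the cost of possibly missing matches that are anchored only by very common seeds.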

if we do TF-IDF, we can also implement that at the seed level to prune the graph early.

thatbudakguy added the enhancement (New feature or request) label Nov 11, 2020

thatbudakguy added this to the v3.0 milestone Feb 19, 2021
