fuzzy: Match substrings fuzzily #2043
Comments
What do you mean by "full text search"? It sounds like you might be looking for the fuzzy plugin. |
I mean something like what ElasticSearch does, or SQL For my particular use case, it would let me search for "Michael Jackson - Black or white (original version) 1999" directly in beet and still get a result, whereas the search currently returns nothing because it cannot match everything. I think it would also be resilient to typos. |
That certainly sounds like what the fuzzy plugin does:
You can adjust the threshold (i.e. sensitivity) in your configuration as well. |
Indeed, I missed it, my bad. Still, it seems to look only at the track title rather than querying every available field as regular |
I believe the plugin should query the standard set of fields, unless you tell it not to (as with ordinary queries). |
Hmm, I have in my config:
(I set a higher threshold to get fewer false positives.) Then, Note that the standard search Lowering the threshold greatly increases the number of false positives, but does not seem to return any Michael Jackson song either. |
I think the similarity is proportional to the entire field, not just a substring. Have you tried something like |
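To illustrate the point about whole-field similarity, here is a minimal sketch. It assumes the similarity is computed with Python's `difflib.SequenceMatcher.ratio()`, which the thread does not show; the `similarity` helper and the sample strings are hypothetical and are not the plugin's actual code.

```python
import difflib


def similarity(pattern: str, value: str) -> float:
    """Whole-field similarity in [0, 1], compared case-insensitively.

    Hypothetical helper: assumes a difflib-style ratio, 2 * M / T, where M is
    the number of matched characters and T is the combined length of both
    strings.
    """
    return difflib.SequenceMatcher(None, pattern.lower(), value.lower()).ratio()


title = "Black or White"
noisy = "Michael Jackson - Black or white (original version) 1999"

# Same length, same text: a perfect score.
exact = similarity("black or white", title)

# The pattern is fully contained in the longer string, yet the score is capped
# at 2 * 14 / (14 + 56) = 0.4, because the ratio is taken over the whole field.
short_vs_long = similarity("black or white", noisy)
```

With a threshold of 0.7 or higher, the second comparison can never match, even though the query is an exact substring of the field.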
Indeed, it is the case. And using What I am looking for is the same thing as SQL
I am not sure whether this already exists in beets, or whether it would be of use to anyone other than me. |
Sure, it might make sense to extend the fuzzy plugin for this purpose. Substring fuzzy matching is probably more intuitive anyway. Would that make sense for you? |
Yes, I think that would do it. |
Cool! I've updated the title to reflect that idea. |
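For reference, one way substring fuzzy matching could work is to slide a window of the pattern's length over the field and keep the best score. This is only a sketch of that idea, not something proposed in the thread; it assumes a `difflib`-based ratio, and `best_window_ratio` is a hypothetical name.

```python
import difflib


def best_window_ratio(pattern: str, value: str) -> float:
    """Best similarity between the pattern and any same-length window of value.

    Hypothetical helper, quadratic in the worst case; a sketch only.
    """
    pattern, value = pattern.lower(), value.lower()
    if not pattern:
        return 0.0
    n = len(pattern)
    if len(value) <= n:
        # Field is no longer than the pattern: compare the whole field.
        return difflib.SequenceMatcher(None, pattern, value).ratio()

    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(pattern)  # difflib caches its analysis of the second sequence
    best = 0.0
    for start in range(len(value) - n + 1):
        matcher.set_seq1(value[start:start + n])
        best = max(best, matcher.ratio())
    return best


noisy = "Michael Jackson - Black or white (original version) 1999"
exact_substring = best_window_ratio("black or white", noisy)
with_typo = best_window_ratio("blakc or white", noisy)
unrelated = best_window_ratio("zzzz", noisy)
```

The trade-off is cost: every window is scored separately, so long fields and large libraries would make this noticeably slower than a single whole-field comparison.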
Sorry for the necro, but I just wanted to say this would be immensely helpful. I don't think it would be too difficult to implement, either. Off the top of my head, I think the way to do this would be to change how the ratio threshold is calculated. The current implementation uses Lines 26 to 34 in dae5257
Looking at What I propose is that the threshold calculation be changed to:
    threshold = config["fuzzy"]["threshold"].as_number()
    if len(pattern) < len(val):
        max_possible_ratio = len(pattern) / (len(pattern) + len(val))
        threshold *= max_possible_ratio
This should not impact performance at all and should solve this issue. Happy to put up a PR! |
Code snippet I provided was slightly wrong, should be I also noticed that using |
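As a rough, self-contained sketch of the threshold-scaling idea: if the ratio comes from `difflib.SequenceMatcher.ratio()` (an assumption here), it equals 2 * M / T, so the best score a shorter pattern can reach against a longer field is `2 * len(pattern) / (len(pattern) + len(val))`. The factor of 2 follows from difflib's definition and may or may not be the correction meant above; `fuzzy_match` and the default threshold of 0.7 are hypothetical stand-ins, not the plugin's code.

```python
import difflib


def fuzzy_match(pattern: str, val: str, threshold: float = 0.7) -> bool:
    """Hypothetical matcher that scales the threshold for short patterns."""
    pattern, val = pattern.lower(), val.lower()
    if not pattern or not val:
        return False
    if len(pattern) < len(val):
        # Upper bound of a difflib-style ratio when every pattern character
        # matches: 2 * len(pattern) / (len(pattern) + len(val)).
        max_possible_ratio = 2 * len(pattern) / (len(pattern) + len(val))
        threshold *= max_possible_ratio
    return difflib.SequenceMatcher(None, pattern, val).ratio() >= threshold


noisy = "Michael Jackson - Black or white (original version) 1999"
hit_exact = fuzzy_match("black or white", noisy)
hit_typo = fuzzy_match("blakc or white", noisy)
miss = fuzzy_match("zzzz", noisy)
same_length = fuzzy_match("black or white", "Black or White")
```

Note that scaling the threshold down also loosens matching for short patterns against long fields, so the false-positive rate would need checking on a real library.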
Hi,
For a project of mine, I need to match YouTube video titles against my own music collection managed by beets. The problem is that YouTube videos have widely varying titles, often with extra noise like "(official video !!!)", which prevents me from using beet ls directly. I came up with some heuristics to sanitize them, but this is still not really reliable.
I am not sure if anyone already did it (but I could not find it) or if it might be interesting either for beet or anyone here to have a full text search in beet?
In case it might be interesting, either to be merged or as a plugin, I am open to any feedback or advice. I was planning on using whoosh for my particular prototyping case.
Thanks