Feature: Convert edit distance to ratio/similarity #28
Comments
It is worth noting that
but returns the normalized InDel distance (similar to the Levenshtein distance, but without substitutions), which is normalized as:
I provide a similar metric in my library RapidFuzz: https://maxbachmann.github.io/RapidFuzz/string_metric.html#normalized-levenshtein
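To make the InDel idea concrete, here is a minimal pure-Python sketch. It assumes the normalization `1 - indel / (len(a) + len(b))`, which is a common choice but is not quoted from either library. The names `indel_distance` and `indel_similarity` are hypothetical, and the code uses no library API.

```python
def indel_distance(a: str, b: str) -> int:
    """Number of insertions and deletions (no substitutions) to turn a into b."""
    # Length of the longest common subsequence, computed one DP row at a time.
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    lcs = prev[len(b)]
    # Every character outside the LCS is either deleted or inserted.
    return len(a) + len(b) - 2 * lcs


def indel_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed normalization by the summed lengths."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - indel_distance(a, b) / total
```

For `"kitten"` and `"sitting"` the InDel distance is 5 (the longest common subsequence has length 4), so this sketch gives 8/13, whereas the plain Levenshtein distance for the same pair is 3.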
[...]
I think this is also not correct, though. The denominator should be the length of the alignment path (which can be slightly longer than the longest sequence). See this comment
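The difference between the two denominators can be illustrated with a small sketch. It assumes unit-cost Levenshtein and breaks ties between equally cheap alignments in favour of the shorter path. The function names and the example strings are hypothetical and not taken from any library or from this thread.

```python
from typing import Tuple


def levenshtein_with_path_length(a: str, b: str) -> Tuple[int, int]:
    """Return (distance, alignment path length) for unit-cost Levenshtein."""
    # prev[j] holds (cost, path_length) for aligning the processed prefix of a with b[:j].
    prev = [(j, j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [(i, i)]
        for j, cb in enumerate(b, start=1):
            sub_cost = 0 if ca == cb else 1
            candidates = (
                (prev[j - 1][0] + sub_cost, prev[j - 1][1] + 1),  # match or substitution
                (prev[j][0] + 1, prev[j][1] + 1),                 # deletion
                (curr[j - 1][0] + 1, curr[j - 1][1] + 1),         # insertion
            )
            # Tuple comparison: lowest cost first, then shortest path.
            curr.append(min(candidates))
        prev = curr
    return prev[len(b)]


def normalized_by_max_length(a: str, b: str) -> float:
    dist, _ = levenshtein_with_path_length(a, b)
    denom = max(len(a), len(b))
    return dist / denom if denom else 0.0


def normalized_by_path_length(a: str, b: str) -> float:
    dist, path = levenshtein_with_path_length(a, b)
    return dist / path if path else 0.0
```

With `"abc"` and `"bcd"`, the cheapest alignment deletes `a`, matches `b` and `c`, and inserts `d`: distance 2 over a path of length 4. Dividing by the maximum length gives 2/3, while dividing by the path length gives 0.5.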
Looks promising, will have a look.
Not sure about the right reference (most papers do not even address this detail), but I found this: https://www.yorku.ca/mack/nordichi2002-shortpaper.html
Yes, exactly. Same for CER. Unfortunately, even fairly standard tools (used a lot for published results) make that mistake (see here). Others get it right, though.
Yes, using the maximum distance is already much better than using the reference string length. My point was that even max-dist is biased: the actual alignment path can be longer. (For example, aligning
I would like a way to get the similarity of two strings instead of just the distance.
For example with python-Levenshtein I can do: