forked from tdebatty/java-string-similarity
update #2
Open
alicanozer wants to merge 118 commits into alicanozer:master from tdebatty:master
MetricLCS and NormalizedLevenshtein both divide by the max string length to produce distances. If two empty strings are used then a division by zero occurs and NaN is returned.
Prevent divide by zero errors.
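The guard described in the two commit messages above can be sketched as follows. This is a minimal illustration, not the library's code: the class name `NormalizedDistanceSketch` and both method names are hypothetical, and returning 0.0 for two empty strings is an assumption (they are treated as identical).

```java
final class NormalizedDistanceSketch {
    private NormalizedDistanceSketch() {}

    /** Plain Levenshtein distance using two rolling rows. */
    static int levenshtein(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s1
        }
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the latest row
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }

    /** Distance divided by the longer length, guarded against 0/0. */
    static double normalizedDistance(String s1, String s2) {
        int maxLen = Math.max(s1.length(), s2.length());
        if (maxLen == 0) {
            // Assumption: two empty strings are identical, so distance 0.
            // Without this guard, 0.0 / 0 evaluates to NaN.
            return 0.0;
        }
        return (double) levenshtein(s1, s2) / maxLen;
    }
}
```

With the guard in place, `NormalizedDistanceSketch.normalizedDistance("", "")` returns 0.0 rather than NaN, and every other input still goes through the usual division.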
DL Optimal String Alignment implementation
Return ngram distance instead of similarity
Performance: avoid spurious containsKey calls
Add a limit parameter to the distance methods of Levenshtein and WeightedLevenshtein. The calculation exits early once the limit is reached, so a caller who only cares about strings with a small distance avoids the full computation when the strings turn out to be very different.
Add a limit parameter to the {Weighted,}Levenshtein distance.
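One way the early exit could work is sketched below. It is an assumption-laden illustration, not the code in this pull request: the class and method names are hypothetical, and the contract chosen here (return `min(distance, limit)`) is an assumption. The sketch relies on the fact that the smallest value in any row of the dynamic-programming table is a lower bound on the final distance, so once that minimum reaches the limit the result cannot be smaller.

```java
final class LimitedLevenshteinSketch {
    private LimitedLevenshteinSketch() {}

    /**
     * Levenshtein distance capped at {@code limit}.
     * Hypothetical contract: returns min(distance, limit).
     */
    static int distance(String s1, String s2, int limit) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin >= limit) {
                // Every path to the final cell crosses this row, so the
                // true distance is at least rowMin: stop early.
                return limit;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return Math.min(prev[s2.length()], limit);
    }
}
```

A caller filtering for near-duplicates might pass a small limit such as 2 and treat any result equal to the limit as "too different", without paying for the remaining rows.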
Fix issue #53. Many thanks to @paulirwin for the thorough issue analysis!
Added Ratcliff-Obershelp implementation, ported from .NET code by Ligi (https://github.com/dxpux)
Clean up the code and have it pass the check style.
Unit test for Ratcliff-Obershelp algorithm
Added test data from various sources.
Fixed diamond operator to comply with Java 1.6
Cosmetic edit
Implementation of Ratcliff-Obershelp algorithm
Add regular (non-null/empty) Cosine Test Cases.
The previous README made it seem like Jaro-Winkler was the ideal typo detector, when in actuality it is really only suited to typos caused by unsynchronized high-speed typing between both hands. It does not account for actual miskey errors, such as hitting the wrong key altogether or inadvertently pressing two keys instead of one. This is because Jaro-Winkler operates only on transpositions and does not favorably treat a string consisting strictly of additions or permutations with letters not already part of the word's alphabet as a "similar" change.
Update Jaro-Winkler description in README, thanks @mqudsi
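To make the distinction in the commit message above concrete, here is a textbook-style Jaro-Winkler sketch. It is not the library's implementation: the class name is hypothetical, the prefix scale of 0.1 and the maximum prefix of 4 are the commonly cited defaults, and the boost threshold that some implementations apply is omitted.

```java
final class JaroWinklerSketch {
    private JaroWinklerSketch() {}

    /** Textbook Jaro similarity in [0, 1]. */
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Characters match only if they are equal and close enough in position.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Count matched characters that appear in a different order.
        int k = 0;
        int outOfOrder = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[k]) {
                k++;
            }
            if (s1.charAt(i) != s2.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0;
    }

    /** Jaro similarity boosted by the length of the common prefix (up to 4). */
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j);
    }
}
```

In this sketch a swapped pair of letters ("MARTHA" vs "MARHTA") scores higher than a single wrong key ("MARTHA" vs "MARXHA"), which is in line with the point about transpositions made in the commit message.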
No description provided.