Release ERRANT v2.3.0 · chrisjbryant/errant

v2.3.0 (15-07-2021)

Added new rules that assign more specific error types to many edits that were previously classified as OTHER. There are now ~40% fewer 1:1 OTHER edits and ~15% fewer n:n OTHER edits overall (tested on the FCE and W&I training sets combined). The changes are as follows:
- A possessive suffix at the start of a merge sequence is now always split:

  | Example | people life -> people 's lives |
  |---|---|
  | Old | life -> 's lives (R:OTHER) |
  | New | ε -> 's (M:NOUN:POSS), life -> lives (R:NOUN:NUM) |

- NUM <-> DET edits are now classified as R:DET; e.g. one (cat) -> a (cat). Thanks to @katkorre!
- Changed the string similarity score in the classifier from the Levenshtein ratio to a normalised Levenshtein score: 1 minus the Levenshtein distance divided by the length of the longer input string. We made this change because some ratio scores were unintuitive. For example, smt -> something has a ratio score of 0.5 despite the insertion of 6 characters, whereas the new normalised score is 1 - 6/9 ≈ 0.33.
- The non-word spelling error rules were updated slightly to take the new normalised Levenshtein score into account. Additionally, dissimilar strings are now classified based on the POS tag of the correction rather than as OTHER; e.g. amougnht -> number (R:NOUN).
- The new normalised Levenshtein score is also used to classify many of the remaining 1:1 replacement edits that were previously classified as OTHER. Many of these are real-word spelling errors (e.g. their <-> there), but there are also some morphological errors (e.g. health -> healthy) and POS-based errors (e.g. transport -> travel). Note that these rules are somewhat complex and depend on both the similarity score and the lengths of the original and corrected strings. For example, form -> from (R:SPELL) and eventually -> finally (R:ADV) both have a similarity score of 0.5, yet they receive different error types because of their string lengths.
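The sketch below is a minimal illustration of the difference between the old and new similarity scores. It assumes unit-cost insertions, deletions and substitutions for the new score. For the old ratio, it assumes a substitution cost of 2, which is the convention used by python-Levenshtein's `ratio`. The function names are hypothetical. This is not ERRANT's own code, and it does not reproduce the classifier's thresholds or length rules.

```python
def levenshtein(a: str, b: str, sub_cost: int = 1) -> int:
    """Edit distance with unit insertions/deletions and a configurable substitution cost."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                                  # delete ca
                curr[j - 1] + 1,                              # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Old-style ratio (assumed): (total length - weighted distance) / total length, substitutions cost 2."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - levenshtein(a, b, sub_cost=2)) / total


def normalised_similarity(a: str, b: str) -> float:
    """New-style score: 1 - Levenshtein distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for orig, cor in [("smt", "something"), ("form", "from"), ("eventually", "finally")]:
        print(f"{orig} -> {cor}: ratio={levenshtein_ratio(orig, cor):.2f}, "
              f"normalised={normalised_similarity(orig, cor):.2f}")
```

Running it prints a ratio of 0.50 and a normalised score of 0.33 for smt -> something. It also shows that form -> from and eventually -> finally both score 0.50 under the normalised measure, with Levenshtein distances of 2 and 5 respectively. This is why string length is needed to tell those two edits apart.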
Various minor updates:
- out_m2 in parallel_to_m2.py and m2_to_m2.py is now opened and closed properly. #20
- Fixed a bracketing error that deleted a valid edit in rare circumstances. #26 #28
- Updated the English wordlist.
- Minor changes to the readme.
- Tidied up some code comments.
