Releases · anhaidgroup/py_stringmatching

Bug fix: Made PartialRatio importable from py_stringmatching.
Dropped support for Python 3.4.
This is the last version of py_stringmatching that will support Python 2 and Python 3.5.

Contributors:

Matt Christie, Ruijian Wang, AnHai Doan

28 Mar 17:32

pjmartinkus

v0.4.1

4f52a22

v0.4.1 - 02/22/19

Updated the Cython version: the package is now built with Cython >= 0.27.3.
Added support for Python 3.7 and dropped testing support for Python 3.3.

Contributors:

Phil Martinkus, Matthew Christie, Chakshu Ahuja, AnHai Doan

18 Jul 23:36

paulgc

v0.4.0

4793e60

v0.4.0 - 07/18/2017

Five similarity measures written in Python have been Cythonized to run much faster: Affine, Jaro, Jaro-Winkler, Needleman-Wunsch, and Smith-Waterman.
We also empirically evaluated the runtime of Jaccard (written in Python) and found it already very fast, so Cythonizing it is unlikely to yield much of a speedup.
Note that edit distance was already Cythonized in Version 0.3.x and earlier. The full set of Cythonized similarity measures is therefore edit distance, Affine, Jaro, Jaro-Winkler, Needleman-Wunsch, and Smith-Waterman.
In subsequent versions, it would be highly desirable to Cythonize the remaining similarity measures, such as Dice and cosine.
This release adds a runtime benchmark (a script and several datasets) that measures the runtime performance of the similarity measures. Users can use it to judge whether the measures are fast enough for their purposes, and developers can use it to guide speedups.
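
To illustrate the kind of measurement such a benchmark performs, here is a minimal sketch using only the Python standard library. It is not the benchmark script shipped with the package: the helper names `jaccard` and `time_measure`, the tiny dataset, and the convention that two empty token sets score 1.0 are all assumptions made for this example.

```python
import timeit


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token collections (pure Python)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)


def time_measure(measure, pairs, repeat=3, number=100):
    """Return the best average time (in seconds) for one pass over `pairs`."""
    def run():
        for left, right in pairs:
            measure(left, right)
    # Take the minimum over repeats to reduce noise from other processes.
    return min(timeit.repeat(run, repeat=repeat, number=number)) / number


# Tiny made-up dataset; a real benchmark would use much larger inputs.
pairs = [
    ("data science".split(), "data management".split()),
    ("string matching".split(), "string similarity".split()),
]
seconds = time_measure(jaccard, pairs)
print(f"jaccard: {seconds * 1e6:.2f} microseconds per pass over {len(pairs)} pairs")
```

Any callable that takes two inputs can be passed as `measure`, so the same harness could in principle time a pure-Python measure and a Cythonized one side by side.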

Contributors:
Srujith Poondla, Phil Martinkus, Pradap Konda, Paul Suganthan G.C., AnHai Doan

01 Jun 01:14

paulgc

v0.3.0

1b0e2e1

v0.3.0 - 05/29/2017

Added nine new string similarity measures: Bag Distance, Editex, Generalized Jaccard, Partial Ratio, Partial Token Sort, Ratio, Soundex, Token Sort, and Tversky Index.
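
As an illustration of one of these measures, the sketch below implements the standard set-based definition of the Tversky index, |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). It is a pure-Python example, not the package's implementation; the function name `tversky_index`, the default weights of 0.5, and the handling of empty inputs are assumptions made here.

```python
def tversky_index(tokens_x, tokens_y, alpha=0.5, beta=0.5):
    """Tversky index of two token collections, treated as sets."""
    set_x, set_y = set(tokens_x), set(tokens_y)
    if not set_x and not set_y:
        return 1.0  # convention chosen here: two empty sets are identical
    common = len(set_x & set_y)
    only_x = len(set_x - set_y)  # tokens found only in the first input
    only_y = len(set_y - set_x)  # tokens found only in the second input
    denominator = common + alpha * only_x + beta * only_y
    # Guard against a zero denominator (e.g. alpha = beta = 0, no overlap).
    return common / denominator if denominator else 0.0


x = ["data", "science"]
y = ["data", "management"]
print(tversky_index(x, y, alpha=0.5, beta=0.5))  # 0.5, same as Dice
print(tversky_index(x, y, alpha=1.0, beta=1.0))  # same as Jaccard (1/3)
```

With α = β = 0.5 the formula reduces to the Dice coefficient, and with α = β = 1 it reduces to Jaccard, which is why the index is often described as a generalization of both.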

Contributors:
Rishab Kalra, Pradap Konda, Paul Suganthan G.C., AnHai Doan

24 May 16:53

paulgc

v0.2.1

97592ae

v0.2.1 - 08/05/2016

Removed the explicit pip installation of numpy from setup.
Added numpy to setup_requires; extensions are now compiled with the numpy install path included.

Contributors:
Pradap Konda, Paul Suganthan G.C., AnHai Doan

06 Jul 21:51

paulgc

v0.2.0

4ee68eb

v0.2.0 - 07/06/2016

Qgram tokenizers now take a flag called "padding". If this flag is True (the default), a prefix and a suffix are added to the input string before tokenizing (see the Tutorial for the rationale).
Version 0.1.0 does not handle Unicode strings correctly. Specifically, if an input string contains non-ASCII characters, a string similarity measure may interpret the string incorrectly and thus compute an incorrect similarity score. This version fixes the string similarity measures by converting the input strings to Unicode before computing similarity. NOTE: the tokenizers are not yet Unicode-aware.
In Version 0.1.0, the "dampen" flag of the TF/IDF similarity measure defaults to False. In this version it defaults to True, which is the more common value for this flag in practice.
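
The effect of the "padding" flag can be sketched in plain Python. This is an illustration only, not the library's tokenizer: the function name `qgrams`, the pad characters `#` and `$`, and the choice of q − 1 pad characters on each side are assumptions made for this example.

```python
def qgrams(text, q=2, padding=True, prefix_pad="#", suffix_pad="$"):
    """Return the list of q-grams of `text`, in order of appearance.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    if padding:
        # Pad with q - 1 characters on each side so that every character of
        # the original string appears in exactly q q-grams.
        text = prefix_pad * (q - 1) + text + suffix_pad * (q - 1)
    # Slide a window of width q across the (possibly padded) string.
    return [text[i:i + q] for i in range(len(text) - q + 1)]


padded = qgrams("data", q=2, padding=True)
unpadded = qgrams("data", q=2, padding=False)
print(padded)    # ['#d', 'da', 'at', 'ta', 'a$']
print(unpadded)  # ['da', 'at', 'ta']
```

Without padding, the first and last characters of the string take part in fewer q-grams than interior characters; with padding, they are weighted equally, which is one commonly cited motivation for padding.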

Contributors:
Pradap Konda, Paul Suganthan G.C., AnHai Doan

14 Jun 22:43

anhaidgroup

v0.1.0

c6237e7

v0.1.0 (first py_stringmatching release)

v0.1.0 - 06/14/2016

Initial release.
Contains 5 tokenizers: Alphabetic tokenizer, Alphanumeric tokenizer, Delimiter tokenizer, Qgram tokenizer, and Whitespace tokenizer.
Contains 14 similarity measures: Affine, Cosine, Dice, Hamming distance, Jaccard, Jaro, Jaro-Winkler, Levenshtein, Monge-Elkan, Needleman-Wunsch, Overlap coefficient, Smith-Waterman, Soft TF/IDF, and TF/IDF.

Contributors:
Pradap Konda, Paul Suganthan G.C., Ali Hitawala, Vaidhyanathan Venkiteswaran, AnHai Doan

Releases: anhaidgroup/py_stringmatching

v0.4.5 - 2/1/2024

v0.4.4 - 1/26/2024

v0.4.3 - 2/8/2023

v0.4.2 - 10/23/2020
