Releases: anhaidgroup/py_stringmatching

v0.4.5 - 2/1/2024

01 Feb 21:02
151f129
  • Discontinued usage of cythonize.py during setup due to Python 3.12 compatibility issues

Contributors:

Anson Doan, AnHai Doan

v0.4.4 - 1/26/2024

26 Jan 23:34
a9d2a1b
  • Dropped support for Python 2
  • Added support for Python 3.12
  • Adjusted setuptools.setup project name to match name on PyPI

Contributors:

Anson Doan, AnHai Doan, Zachary Ware

v0.4.3 - 2/8/2023

08 Feb 23:32
6791a79
  • Dropped support for Python 3.6.
  • Added support for Python 3.10 and 3.11.
  • Replaced aliases removed from NumPy 1.24.
  • Switched from Nose to vanilla unittest.
  • Replaced Travis and AppVeyor CI testing with GitHub Actions.

Contributors:

Anson Doan, AnHai Doan

v0.4.2 - 10/23/2020

26 Oct 20:22
74238ae
  • Bug fix: Made PartialRatio importable from py_stringmatching.
  • Dropped support for Python 3.4.
  • This is the last version of py_stringmatching that will support Python 2 and Python 3.5.

Contributors:

Matt Christie, Ruijian Wang, AnHai Doan

v0.4.1 - 02/22/19

28 Mar 17:32

  • Updated Cython. The package is now built with Cython >= 0.27.3.
  • Added support for Python 3.7 and dropped testing support for Python 3.3.

Contributors:

Phil Martinkus, Matthew Christie, Chakshu Ahuja, AnHai Doan

v0.4.0

18 Jul 23:36

v0.4.0 - 07/18/2017

  • Five similarity measures written in Python have been Cythonized to run much faster. These are Affine, Jaro, Jaro Winkler, Needleman Wunsch, and Smith Waterman.

  • We have also empirically evaluated the runtime of Jaccard (written in Python) and found that it is already very fast. Thus, Cythonizing it is unlikely to yield much of a speedup.

  • Note that edit distance had already been Cythonized in Version 0.3.x (and earlier versions). Thus, the set of all Cythonized similarity measures consists of edit distance, Affine, Jaro, Jaro Winkler, Needleman Wunsch, and Smith Waterman.

  • In subsequent versions, it would be highly desirable to Cythonize remaining similarity measures, including Dice, cosine, etc.

  • This release adds a runtime benchmark (consisting of a script and several datasets) to measure the runtime performance of similarity measures. Users can use the benchmark to judge whether the similarity measures are fast enough for their purposes, and developers can use it to speed up the measures.
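
As a rough illustration of the kind of measurement such a benchmark makes, here is a minimal timing sketch. It is not the package's benchmark script: the pure-Python `levenshtein` function, the `time_measure` helper, and the sample string pairs are all hypothetical stand-ins chosen for this example.

```python
import timeit


def levenshtein(a, b):
    """Plain-Python edit distance (illustrative stand-in, not the package's code)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # Cheapest of deletion, insertion, and substitution.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def time_measure(measure, pairs, repeat=3, number=100):
    """Return the best observed time in seconds per call of measure(a, b)."""
    def run():
        for a, b in pairs:
            measure(a, b)

    best = min(timeit.repeat(run, repeat=repeat, number=number))
    return best / (number * len(pairs))


# Made-up sample pairs; a real benchmark would read them from a dataset.
pairs = [("kitten", "sitting"), ("flaw", "lawn"), ("saturday", "sunday")]
seconds_per_call = time_measure(levenshtein, pairs)
print(f"levenshtein: {seconds_per_call * 1e6:.2f} microseconds per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when comparing a pure-Python measure against a Cythonized one.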

Contributors:
Srujith Poondla, Phil Martinkus, Pradap Konda, Paul Suganthan G.C., AnHai Doan

v0.3.0

01 Jun 01:14

v0.3.0 - 05/29/2017

  • Added nine new string similarity measures: Bag Distance, Editex, Generalized Jaccard, Partial Ratio, Partial Token Sort, Ratio, Soundex, Token Sort, and Tversky Index.
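
To give a sense of one of these measures, the sketch below computes bag distance under its usual definition: treat each string as a multiset of characters and take the larger of the two multiset differences. The function `bag_distance` is a hypothetical helper written for this example, not the package's implementation.

```python
from collections import Counter


def bag_distance(a, b):
    """Bag distance between two strings (illustrative sketch)."""
    bag_a, bag_b = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, i.e. the
    # characters left over on one side after removing the other bag.
    only_in_a = sum((bag_a - bag_b).values())
    only_in_b = sum((bag_b - bag_a).values())
    return max(only_in_a, only_in_b)


print(bag_distance("cat", "hat"))     # 1
print(bag_distance("niall", "neil"))  # 2
```

Bag distance ignores character order, so it is cheap to compute and never exceeds the edit distance between the same two strings.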

Contributors:
Rishab Kalra, Pradap Konda, Paul Suganthan G.C., AnHai Doan

v0.2.1

24 May 16:53

v0.2.1 - 08/05/2016

  • Remove explicit installation of numpy using pip in setup.
  • Add numpy in setup_requires and compile extensions by including numpy install path.

Contributors:
Pradap Konda, Paul Suganthan G.C., AnHai Doan

v0.2.0

06 Jul 21:51

v0.2.0 - 07/06/2016

  • Qgram tokenizers have been modified to take a flag called "padding". If this flag is True (the default), then a prefix and a suffix will be added to the input string before tokenizing (see the Tutorial for a reason for this).
  • Version 0.1.0 does not handle Unicode strings correctly. Specifically, if an input string contains non-ASCII characters, a string similarity measure may interpret the string incorrectly and thus compute an incorrect similarity score. In this version we have fixed the string similarity measures: we convert the input strings into Unicode before computing similarity measures. NOTE: the tokenizers are not yet Unicode-aware.
  • In Version 0.1.0, the flag "dampen" for the TF/IDF similarity measure has the default value of False. In this version we have changed the default to True, which is the more common value for this flag in practice.
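
The effect of the "padding" flag can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper `qgram_tokenize`, its parameter names, and the pad characters `#` and `$` are assumptions made for this example.

```python
def qgram_tokenize(text, qval=2, padding=True, prefix_pad="#", suffix_pad="$"):
    """Return the list of q-grams of text (hypothetical helper)."""
    if padding:
        # Add qval - 1 pad characters on each side so the first and last
        # characters occur in as many q-grams as interior characters.
        text = prefix_pad * (qval - 1) + text + suffix_pad * (qval - 1)
    if len(text) < qval:
        return []
    return [text[i:i + qval] for i in range(len(text) - qval + 1)]


print(qgram_tokenize("data"))                 # ['#d', 'da', 'at', 'ta', 'a$']
print(qgram_tokenize("data", padding=False))  # ['da', 'at', 'ta']
```

With padding, strings that differ only in their first or last character share fewer q-grams, so the boundary characters carry the same weight as the others.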

Contributors:
Pradap Konda, Paul Suganthan G.C., AnHai Doan

v0.1.0 (first py_stringmatching release)

14 Jun 22:43

v0.1.0 - 06/14/2016

  • Initial release.
  • Contains 5 tokenizers: Alphabetic tokenizer, Alphanumeric tokenizer, Delimiter tokenizer, Qgram tokenizer, and
    Whitespace tokenizer.
  • Contains 14 similarity measures: Affine, Cosine, Dice, Hamming distance, Jaccard, Jaro, Jaro-Winkler,
    Levenshtein, Monge-Elkan, Needleman-Wunsch, Overlap coefficient, Smith-Waterman, Soft TF/IDF, and TF/IDF.

Contributors:
Pradap Konda, Paul Suganthan G.C., Ali Hitawala, Vaidhyanathan Venkiteswaran, AnHai Doan