Accurate benchmarks? #36

ekreutz · 2019-06-19T09:38:36Z

Hey, I ran a quick benchmark of my own, using:

macOS 10.14.5 (Mojave)
Python 3.7.3
python-Levenshtein 0.12.0 (pypi)
editdistance 0.5.3 (pypi)

In my tests python-Levenshtein is about 10x faster. Perhaps it's the macOS binaries? Or maybe your tests are outdated?

import editdistance
from Levenshtein import distance  # public API; avoids the private _levenshtein module
from timeit import timeit

s00 = "akjsdjkahsdhjkashd"
s01 = "akjsdjkahsdhj"
s10 = 'xyzz'
s11 = 'xab'
s20 = 'aaaaaaaaaaaaaaaaaaaaaaa'
s21 = 'i'

def a1():
    editdistance.eval(s00, s01)
    editdistance.eval(s10, s11)
    editdistance.eval(s20, s21)
def a2():
    distance(s00, s01)
    distance(s10, s11)
    distance(s20, s21)

print("editdistance")
print(timeit(a1, number=100000))

print("\npython-Levenshtein")
print(timeit(a2, number=100000))

Prints:

editdistance
0.330241583

python-Levenshtein
0.03681695899999998

desialex · 2019-07-21T23:22:54Z

I just compared those two in a real-life application and editdistance is about 30% faster.

maxbachmann · 2021-04-11T12:34:02Z

At least in my benchmarks, this depends largely on the length of the input strings. Here is a comparison of different libraries across different string lengths. Both edlib and editdistance appear to have a lot of overhead for short strings.
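For anyone who wants to reproduce a length sweep like this locally, here is a minimal sketch. The helper names (`random_string`, `sweep`, `available_libraries`) are made up for illustration, the random lowercase inputs are an assumption about what a fair test string looks like, and only `editdistance.eval` and `Levenshtein.distance` from the first post are wired in, each skipped if not installed.

```python
import random
import string
from timeit import timeit


def random_string(length, rng):
    """Random lowercase ASCII string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def sweep(funcs, lengths=(4, 16, 64, 256), number=200, seed=0):
    """Return {name: {length: seconds per call}} for each distance function."""
    rng = random.Random(seed)
    results = {name: {} for name in funcs}
    for n in lengths:
        # Same pair of inputs for every library at a given length.
        a, b = random_string(n, rng), random_string(n, rng)
        for name, fn in funcs.items():
            total = timeit(lambda: fn(a, b), number=number)
            results[name][n] = total / number
    return results


def available_libraries():
    """Collect whichever of the two libraries can be imported."""
    funcs = {}
    try:
        import editdistance
        funcs["editdistance"] = editdistance.eval
    except ImportError:
        pass
    try:
        import Levenshtein
        funcs["python-Levenshtein"] = Levenshtein.distance
    except ImportError:
        pass
    return funcs


if __name__ == "__main__":
    for name, timings in sweep(available_libraries()).items():
        for n, seconds in timings.items():
            print(f"{name:20s} len={n:4d} {seconds * 1e6:8.2f} us/call")
```

Reporting time per call at each length separates fixed per-call overhead (which dominates at short lengths) from the cost of the algorithm itself (which dominates at long lengths).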

Only python-Levenshtein uses a quadratic-time implementation, while all the others use the Myers/Hyyrö bit-parallel algorithm.
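To make the difference concrete, here is a rough pure-Python sketch of both approaches. It is not the code of any of the libraries mentioned: the function names are made up, and it leans on Python's arbitrary-precision integers to treat the whole first string as a single machine word, whereas real implementations work in 64-bit blocks.

```python
def levenshtein_dp(a, b):
    """Classic O(len(a) * len(b)) dynamic programme, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match / substitution
        prev = curr
    return prev[-1]


def levenshtein_bitparallel(a, b):
    """Bit-parallel formulation (Myers 1999, in Hyyrö's 2003 variant).

    Each column of the DP matrix is encoded as two bit vectors holding the
    +1 / -1 vertical differences, so one column costs a handful of word
    operations instead of len(a) cell updates.
    """
    m = len(a)
    if m == 0:
        return len(b)
    # Bit i of peq[ch] is set when a[i] == ch.
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    mask = (1 << m) - 1   # emulate an m-bit machine word
    last = 1 << (m - 1)   # bit for the bottom row of the column
    vp, vn = mask, 0      # vertical +1 / -1 deltas of the first column
    dist = m
    for ch in b:
        pm = peq.get(ch, 0)
        d0 = ((((pm & vp) + vp) ^ vp) | pm | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask
        hn = d0 & vp
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return dist


if __name__ == "__main__":
    for s, t in [("akjsdjkahsdhjkashd", "akjsdjkahsdhj"), ("xyzz", "xab")]:
        print(s, t, levenshtein_dp(s, t), levenshtein_bitparallel(s, t))
```

The bit-parallel version has to build the per-character table before it can process anything, which is one plausible source of fixed overhead on very short inputs; on long inputs each column costs a few word operations per 64 characters rather than one update per cell.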

dzieciou · 2021-07-25T13:26:28Z

@maxbachmann Great chart. It shows the choice of implementation really depends on the application.

One application would be finding the closest single words, e.g., in spelling correction.
Another would be measuring edit distance between long sequences, e.g., when comparing the layout of documents as in "Comparison and Classification of Documents Based on Layout Similarity". I regret that I didn't do an evaluation like yours back when I was implementing document clustering...

For the latter, rapidfuzz seems like a good choice.
