Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve alignment::distance::levenshtein and alignment::distance::bounded_levenshtein on strrings where distance is small #522

Merged
merged 8 commits into from Jun 14, 2023

Conversation

nkkarpov
Copy link
Contributor

@nkkarpov nkkarpov commented May 27, 2023

Probably, it is better to use the implementation of the Ukkonen algorithm from editdistancek.
Before:

test bench_levenshtein_dist_diverse_str              ... bench: 113,214,418 ns/iter (+/- 781,509)
test bench_levenshtein_dist_equal_str                ... bench:  91,783,885 ns/iter (+/- 183,007)
test bench_levenshtein_dist_worst_str                ... bench:  91,866,787 ns/iter (+/- 1,736,995)
test bench_simd_bounded_levenshtein_dist_diverse_str ... bench:  41,695,427 ns/iter (+/- 1,917,682)
test bench_simd_bounded_levenshtein_dist_equal_str   ... bench:  41,640,232 ns/iter (+/- 2,047,975)
test bench_simd_bounded_levenshtein_dist_worst_str   ... bench:  41,474,403 ns/iter (+/- 3,183,653)
test bench_simd_levenshtein_dist_diverse_str         ... bench:  54,980,802 ns/iter (+/- 2,732,004)
test bench_simd_levenshtein_dist_equal_str           ... bench:      75,170 ns/iter (+/- 1,705)
test bench_simd_levenshtein_dist_worst_str           ... bench: 100,214,002 ns/iter (+/- 21,694,930)

After:

test bench_levenshtein_dist_diverse_str              ... bench:  39,735,136 ns/iter (+/- 194,398)
test bench_levenshtein_dist_equal_str                ... bench:       6,711 ns/iter (+/- 324)
test bench_levenshtein_dist_worst_str                ... bench: 137,810,484 ns/iter (+/- 340,456)
test bench_simd_bounded_levenshtein_dist_diverse_str ... bench:  39,740,574 ns/iter (+/- 159,197)
test bench_simd_bounded_levenshtein_dist_equal_str   ... bench:       6,572 ns/iter (+/- 511)
test bench_simd_bounded_levenshtein_dist_worst_str   ... bench: 137,862,613 ns/iter (+/- 724,980)
test bench_simd_levenshtein_dist_diverse_str         ... bench:  54,768,509 ns/iter (+/- 2,092,364)
test bench_simd_levenshtein_dist_equal_str           ... bench:      75,626 ns/iter (+/- 3,295)
test bench_simd_levenshtein_dist_worst_str           ... bench:  96,541,484 ns/iter (+/- 2,901,857)

The implementation works in time O(nd) where d is the edit distance between strings (+simd optimization). The implementation outperforms the previous version except when the edit distance is close to the maximum possible value. You can check more benchmarks at https://github.com/nkkarpov/editdistancek . On long random strings (1k) with random edits the version from editdistancek can achieve a 24x speedup compared to triple_accell when the percent of edits is less than 1. When the percent of edits is at most 20, and the upper bound is equal to 2 times the real distance, the new implementation has at least a 2.5x speedup. The algorithm is threshold oblivious, and the upper bound on the distance is used only for memory allocation (the algorithm requires O(k) auxiliary memory where k is the upper bound on the distance).

@nkkarpov nkkarpov changed the title improve alignment::distance::levenshtein and alignment::distance::bounded_levenshtein on strrings where distance is small perf: improve alignment::distance::levenshtein and alignment::distance::bounded_levenshtein on strrings where distance is small May 27, 2023
@coveralls
Copy link

coveralls commented May 27, 2023

Coverage Status

coverage: 81.001% (-0.07%) from 81.067% when pulling 0ac8365 on nkkarpov:master into 640caa0 on rust-bio:master.

@johanneskoester johanneskoester merged commit da7daea into rust-bio:master Jun 14, 2023
9 of 10 checks passed
@RagnarGrootKoerkamp
Copy link
Contributor

RagnarGrootKoerkamp commented Jun 29, 2023

This doesn't compile on macbook M1.
RagnarGrootKoerkamp/astar-pairwise-aligner#17
nkkarpov/editdistancek#2

Especially the SIMD instructions seem broken.

@nkkarpov
Copy link
Contributor Author

It should work now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants