draft: Cythonize #33

nucccc · 2023-02-06T19:25:35Z

Good evening,
I provide here a folder with an example of Cython code and a benchmark, so that if you want to consider it, you have some material to start from.

Inside the simylaritymeasures folder there is a cy_simylaritymeasures folder, in which I left only the Cython code, the scripts, and quick documentation in readme.md on how to build and run it.

I hope it turns out to be useful,
Nuc

cjekel · 2023-02-07T19:36:30Z

Very nice! It will be the weekend before I can comment in detail.

I see this is a very minimal cython port. That's great!

I'm curious what happens if you drop

c = distance.cdist(exp_data, num_data, metric='minkowski', p=p)

and use Cython to create this matrix? It might be slightly faster to calculate each distance when we need it, rather than creating the large matrix up front.
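A minimal sketch of the "calculate the distance when we need it" idea, written in pure Python so that it runs without SciPy or a compiler. It assumes the usual discrete Fréchet recurrence over a coupling matrix `ca`; the names `minkowski` and `frechet_on_demand` are hypothetical and are not taken from the PR.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two points (sequences of floats)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def frechet_on_demand(exp_data, num_data, p=2.0):
    """Discrete Frechet distance without precomputing the full distance matrix.

    Assumption: this follows the standard coupling recurrence; it is a sketch,
    not the code from this PR.
    """
    n, m = len(exp_data), len(num_data)
    # ca[i][j] holds the coupling measure for the first i+1 and j+1 points.
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Computed here, when needed, instead of being read from c[i, j].
            d = minkowski(exp_data[i], num_data[j], p)
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_previous = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]


# Two parallel three-point lines, one unit apart.
exp_curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
num_curve = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(frechet_on_demand(exp_curve, num_curve))
```

Each distance is used exactly once in this recurrence, so skipping the `cdist` matrix saves memory without repeating any work. Whether it is also faster depends on how cheaply the per-pair distance can be computed in compiled code.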

Also, we may want to show performance improvements with respect to both the number of data points and the number of dimensions in the data.
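One way such a comparison could be organized is a small grid over curve length and dimensionality, timed with `timeit`. This is only a sketch: `make_curve`, `benchmark_grid`, and `placeholder_measure` are hypothetical names, and the placeholder stands in for whichever Python or Cython function is being timed.

```python
import math
import random
import timeit


def make_curve(n_points, n_dims, seed=0):
    """Random curve as a list of n_points points with n_dims coordinates each."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]


def benchmark_grid(func, sizes, dims, repeat=3, number=1):
    """Return {(n_points, n_dims): best time in seconds} for func(curve_a, curve_b)."""
    results = {}
    for n in sizes:
        for d in dims:
            a = make_curve(n, d, seed=1)
            b = make_curve(n, d, seed=2)
            times = timeit.repeat(lambda: func(a, b), repeat=repeat, number=number)
            results[(n, d)] = min(times)
    return results


def placeholder_measure(a, b):
    """Hypothetical stand-in for the function under test (not a library function)."""
    return max(min(math.dist(p, q) for q in b) for p in a)


grid = benchmark_grid(placeholder_measure, sizes=[20, 40], dims=[2, 8])
for (n, d), seconds in sorted(grid.items()):
    print(f"n={n:3d} dims={d:2d} best={seconds:.6f} s")
```

Running the same grid for the pure Python and the cythonized function would give one improvement figure per combination of size and dimension.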

nucccc · 2023-02-08T08:43:09Z

Following your suggestion, I modified the Cython code to try a possible improvement. Instead of eliminating the distance matrix
c = distance.cdist(exp_data, num_data, metric='minkowski', p=p)
I eliminated the ca matrix of coefficients. This did not look like a great speed improvement to me, but it removes an additional matrix from RAM. All calculations are done in place on the original distance matrix c, which is therefore modified. This can be seen as a procedural change, and I still need to reflect on it, but it may reduce RAM usage. I hope it is useful.
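The in-place variant described above can be sketched as follows, in pure Python and under the assumption that the standard discrete Fréchet recurrence is used. The function name `frechet_inplace` is hypothetical, and `math.dist` (Euclidean, Python 3.8+) replaces `cdist` only to keep the example self-contained.

```python
import math


def frechet_inplace(c):
    """Overwrite the distance matrix c with coupling measures; return the last entry.

    Assumption: c[i][j] initially holds the distance between point i of the
    first curve and point j of the second. No separate ca matrix is allocated.
    """
    n, m = len(c), len(c[0])
    # First row: each entry only depends on its left neighbour.
    for j in range(1, m):
        c[0][j] = max(c[0][j - 1], c[0][j])
    for i in range(1, n):
        # First column: each entry only depends on the entry above it.
        c[i][0] = max(c[i - 1][0], c[i][0])
        for j in range(1, m):
            # The three neighbours were already overwritten with coupling
            # values; c[i][j] still holds the raw distance at this point.
            best_previous = min(c[i - 1][j], c[i][j - 1], c[i - 1][j - 1])
            c[i][j] = max(best_previous, c[i][j])
    return c[n - 1][m - 1]


exp_curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
num_curve = [(0.0, 0.0), (1.0, 3.0), (2.0, 0.0)]
c = [[math.dist(p, q) for q in num_curve] for p in exp_curve]
raw_last = c[-1][-1]          # distance between the two end points
result = frechet_inplace(c)   # c now holds coupling values, not distances
print(raw_last, result, c[-1][-1])
```

The trade-off is the one raised in the comment: the caller's distance matrix no longer contains distances afterwards, so this only suits cases where `c` is not reused.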

I forgot to mention that, as a test case, I'm currently working only on the Fréchet distance, but of course the cythonization can then be extended to all the other functions.

Thank you for the attention and the work,
Nuc

cjekel · 2023-02-14T20:36:49Z

Just a bump. I'm still very interested in this comparison, I've just been swamped lately!

cjekel · 2023-02-22T17:14:47Z

Nice bump adding dtw! I need to test this out asap.

nucccc · 2023-02-22T17:26:06Z

Ah yes, I just added a DTW comparison. I also think the benchmark could be better organized, and I would like to put some more work into it.

I'm swamped too and thank you for the attention!

Also, if you ever need a quick voice discussion, just let me know and we can get in touch via Google Meet or something similar at some point. I'm based in Italy, and right now I'm flexible with my schedule.

cjekel · 2023-10-07T18:44:06Z

I seem to be getting about a 40% improvement that is not sensitive to the total amount of data. This is good! I think I should consider porting the entire library to Cython. It would be nice to have both a Cython and a Python version side by side, since the Python version is slightly easier to read. I'm not sure what the best way to have both is...

1.0000510139013212
1.0000510139013212
True
[0.981130201, 1.0351228799999999, 1.0592268329999999, 1.0405540450000004, 1.066770408]
[0.594745455, 0.5694208449999998, 0.5682200880000003, 0.5850636710000003, 0.5942479000000009]
average execution time for non cythonized version: 1.036561
average execution time for cythonized version: 0.582340
improvement: 43 %
100.00171783203773
100.00171783203773
True
[0.9917487250000008, 1.0106956060000005, 0.984660250000001, 1.0517639240000012, 0.9957187320000003]
[0.6305159309999997, 0.6164782420000012, 0.6147694729999991, 0.6011433679999989, 0.6316231439999989]
average execution time for non cythonized version: 1.006917
average execution time for cythonized version: 0.618906
improvement: 38 %
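For reference, the averages and improvement figures in this output can be reproduced from the listed timings. The formula below is an assumption about how the benchmark script computes them (relative reduction in mean run time, truncated to a whole percent); it matches the printed numbers, but the script itself is not shown here.

```python
from statistics import mean

# Timings copied from the first run above (seconds).
python_times = [0.981130201, 1.0351228799999999, 1.0592268329999999,
                1.0405540450000004, 1.066770408]
cython_times = [0.594745455, 0.5694208449999998, 0.5682200880000003,
                0.5850636710000003, 0.5942479000000009]


def improvement_percent(baseline, optimized):
    """Relative reduction in mean run time, in percent (assumed formula)."""
    return (1.0 - mean(optimized) / mean(baseline)) * 100.0


print(f"average execution time for non cythonized version: {mean(python_times):f}")
print(f"average execution time for cythonized version: {mean(cython_times):f}")
print(f"improvement: {int(improvement_percent(python_times, cython_times))} %")
```

Under this definition, a 43 % improvement means the cythonized run takes a little under 0.6 times as long as the Python run, which is a speedup factor of roughly 1.8.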

nucccc · 2023-10-16T20:24:50Z

Hi, I may be a little busy in this period, but I can find some time. I recently read that Cython 3.0 was released. It relies heavily on type hints, and one can use hints such as cython.int or cython.long to declare C types. I would like to look into whether one could write Pythonic code and let Cython 3.0 transpile it while keeping that performance gain.

nucccc added 3 commits February 5, 2023 19:16

cython code for frechet dist - simple build commands

28cad45

benchmark now using timeit

220b608

wrote better readme

7ac2b60

removed intermediate ca matrix from frechet distance

d5f5bea

added dtw cythonized

a6e1edc

nucccc changed the title ~~Cythonize~~ draft: Cythonize Oct 19, 2023
