Pairwise distances: first implementation #2

Merged
merged 29 commits into from
May 19, 2021

Conversation

mbatoul
Owner

@mbatoul mbatoul commented May 17, 2021

No description provided.

@mbatoul mbatoul changed the title Implementation pairwise distances computation in Cython Pairwise distances implementation May 18, 2021
@mbatoul mbatoul changed the title Pairwise distances implementation Pairwise distances: first implementation May 18, 2021
Collaborator

@jjerphan jjerphan left a comment

That's a nice first start! 👏

The generated .c file is still present. You might want to remove it from the Git index (the file itself stays on disk) using something like:

git rm --cached **/*.c

I left some comments, but overall it's good: the Python API is concise, and the implementation lives in cdef functions, which should compile to C code with little interaction with the CPython interpreter.

Looking forward to your future iterations. :)

Collaborator

@jeremiedbb jeremiedbb left a comment

The link for the benchmarks returns a 404.

The implementation looks good. You can now compare it with pairwise_distances(metric="l1") from sklearn on datasets of different sizes (first single-threaded, then multi-threaded).
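
A minimal sketch of such a comparison, assuming NumPy and scikit-learn are installed. `manhattan_reference` and `time_call` are hypothetical helpers introduced here for illustration; the NumPy function only stands in for the function under review, whose Python API is not shown in this thread. The dataset sizes are arbitrary.

```python
import time

import numpy as np
from sklearn.metrics import pairwise_distances


def manhattan_reference(X_a, X_b):
    """Dense L1 distances via NumPy broadcasting (hypothetical stand-in)."""
    # (n_a, 1, d) - (1, n_b, d) -> (n_a, n_b, d), then sum |.| over features.
    return np.abs(X_a[:, None, :] - X_b[None, :, :]).sum(axis=-1)


def time_call(func, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)
timings = []
for n_samples in (100, 200, 400):  # arbitrary sizes, chosen to stay small
    X_a = rng.random((n_samples, 20))
    X_b = rng.random((n_samples, 20))

    expected, t_ref = time_call(manhattan_reference, X_a, X_b)
    single, t_single = time_call(
        pairwise_distances, X_a, X_b, metric="l1", n_jobs=1
    )
    multi, t_multi = time_call(
        pairwise_distances, X_a, X_b, metric="l1", n_jobs=-1
    )

    # Check correctness before trusting any timing.
    results_match = np.allclose(single, expected) and np.allclose(multi, expected)
    timings.append((n_samples, t_ref, t_single, t_multi, results_match))
    print(
        f"n={n_samples}: reference={t_ref:.4f}s  "
        f"n_jobs=1={t_single:.4f}s  n_jobs=-1={t_multi:.4f}s"
    )
```

A single timed call per configuration is noisy; repeating each measurement and keeping the minimum or median would give steadier numbers.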

src/pairwise_dist/_pairwise_dist.pyx

for i in prange(n_rows_X_a, nogil=True):
    for j in range(n_rows_X_b):
        _compute_dist(X_a[i], X_b[j], n_features, &distances[i, j])
Collaborator

In this implementation, passing a pointer to distances[i, j] seems unnecessary. You could just do
distances[i, j] = _compute_dist(X_a[i], X_b[j], n_features)
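
The return-value form suggested above can be sketched in plain Python. This is not the PR's Cython code: `compute_dist` and `pairwise_dist` are hypothetical, the Manhattan metric is an assumption based on the suggested comparison with metric="l1", and the sketch only illustrates the call shape, not the performance of a cdef function under prange.

```python
def compute_dist(x_a, x_b, n_features):
    """Return the Manhattan (L1) distance between two rows (assumed metric)."""
    dist = 0.0
    for k in range(n_features):
        dist += abs(x_a[k] - x_b[k])
    return dist


def pairwise_dist(X_a, X_b):
    """Fill an (n_rows_X_a, n_rows_X_b) table of pairwise distances."""
    n_features = len(X_a[0])
    distances = [[0.0] * len(X_b) for _ in X_a]
    for i in range(len(X_a)):
        for j in range(len(X_b)):
            # The helper returns the value, so no output pointer is needed.
            distances[i][j] = compute_dist(X_a[i], X_b[j], n_features)
    return distances


X_a = [[0.0, 0.0], [1.0, 2.0]]
X_b = [[1.0, 1.0], [3.0, 5.0], [1.0, 2.0]]
distances = pairwise_dist(X_a, X_b)
print(distances)  # [[2.0, 8.0, 3.0], [1.0, 5.0, 0.0]]
```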

Collaborator

See issue scikit-learn/scikit-learn#17299. This looks like a good reproducer for it. Is it faster to do as explained in the issue?

Owner Author

It seems faster indeed! Some quick benchmarks:

[image: benchmark results]

We will see how it improves performance compared to scipy and scikit-learn.

@mbatoul mbatoul marked this pull request as ready for review May 18, 2021 23:36
Collaborator

@jjerphan jjerphan left a comment

We are good with this PR!

@mbatoul mbatoul merged commit aa5182a into master May 19, 2021