
Clean up Distance classes #5029

Open
gf712 opened this issue May 14, 2020 · 3 comments
gf712 (Member) commented May 14, 2020

The current design of Distance and its subclasses is not very efficient. Here are some issues:

  • Dynamic casts in the inner loops
  • We use Distance::distance(int32_t idx_a, int32_t idx_b) from inside the kernels, but if we are not careful this can lead to a lot of cache misses. I would suggest computing the whole matrix in one pass using eigen/linalg when possible, or at least making Distance::compute(int32_t idx_a, int32_t idx_b) very efficient, i.e. no checks/branching or dynamic casts
  • There is no method that returns just the upper triangle of the distance matrix. It would be useful to get the upper triangle in the same format as returned by scipy.spatial.distance.pdist, and then have the equivalent squareform function (probably as a static class function of Distance)
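To make the third point concrete, here is a minimal sketch of a pdist-style condensed distance vector plus the matching squareform expansion. The function names mirror scipy but are purely illustrative; none of this is existing shogun API, and a real implementation would live on Distance (squareform probably as a static member) and use linalg rather than raw loops:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch (not shogun API): store only the upper triangle of the
// pairwise distance matrix in scipy.spatial.distance.pdist order, i.e.
// d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ...
std::vector<double> pdist(const std::vector<std::vector<double>>& x)
{
	const std::size_t n = x.size();
	std::vector<double> condensed;
	condensed.reserve(n * (n - 1) / 2);
	for (std::size_t i = 0; i < n; ++i)
		for (std::size_t j = i + 1; j < n; ++j)
		{
			double acc = 0;
			for (std::size_t k = 0; k < x[i].size(); ++k)
			{
				const double diff = x[i][k] - x[j][k];
				acc += diff * diff;
			}
			condensed.push_back(std::sqrt(acc));
		}
	return condensed;
}

// Equivalent of scipy's squareform: expand the condensed vector back into a
// full symmetric n x n matrix with a zero diagonal.
std::vector<std::vector<double>> squareform(
	const std::vector<double>& condensed, std::size_t n)
{
	std::vector<std::vector<double>> m(n, std::vector<double>(n, 0.0));
	std::size_t idx = 0;
	for (std::size_t i = 0; i < n; ++i)
		for (std::size_t j = i + 1; j < n; ++j)
		{
			m[i][j] = condensed[idx];
			m[j][i] = condensed[idx];
			++idx;
		}
	return m;
}
```

For n points this stores n*(n-1)/2 values instead of n*n, which matters once the feature set gets large.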

@karlnapf do you agree? There might be more points that should be added to this list.

karlnapf (Member)

Absolutely agree. These things are sometimes tricky to achieve in our messed-up codebase ;)

I guess the design was initially for string kernels, which are very expensive to compute. This is why we had the KernelCache thing, which makes sense in that case. For things like GaussianKernel etc it makes way more sense to compute them using matrix-matrix products and batched element-wise operations. So in some sense, these kernels should do a lazy precompute of the whole pairwise distance matrix, and then the compute calls just query that (with an iterator for appropriate order).
The distance design was pretty much just copied from that. IMO these two things should not even be distinct APIs, tbh.

karlnapf (Member)

The pattern of batching operations with matrix operations where possible is something that could be applied in many places in shogun, actually.

gf712 (Member, Author) commented May 14, 2020

> So in some sense, these kernels should do a lazy precompute of the whole pairwise distance matrix, and then the compute calls just query that (with an iterator for appropriate order).

Basically, you would create some lambda function that captures the lhs and rhs addresses, and then this is stored in an iterator that is ordered in a cache-friendly way, with the lambda executed when the iterator is dereferenced?

```cpp
for (const auto& [lhs_index, rhs_index, d] : EuclideanDistance(lhs, rhs)) // d is computed as it is dereferenced here
	// another computation with d
```
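A minimal sketch of what such a lazy view could look like, assuming C++17 structured bindings. The class and member names are hypothetical, not shogun's actual API; the point is that the distance is only computed on dereference, and iteration is row-major so lhs[i] stays hot in cache across the inner rhs sweep:

```cpp
#include <cmath>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical lazy distance view (not shogun API): iterating yields
// (lhs_index, rhs_index, distance), with the distance computed on the fly.
class LazyEuclidean
{
public:
	LazyEuclidean(const std::vector<std::vector<double>>& lhs,
	              const std::vector<std::vector<double>>& rhs)
	    : m_lhs(lhs), m_rhs(rhs)
	{
	}

	struct iterator
	{
		const LazyEuclidean* view;
		std::size_t i, j;

		std::tuple<std::size_t, std::size_t, double> operator*() const
		{
			// the distance is only computed here, on dereference
			const auto& a = view->m_lhs[i];
			const auto& b = view->m_rhs[j];
			double acc = 0;
			for (std::size_t k = 0; k < a.size(); ++k)
				acc += (a[k] - b[k]) * (a[k] - b[k]);
			return {i, j, std::sqrt(acc)};
		}
		iterator& operator++()
		{
			// row-major order: sweep rhs for a fixed lhs row
			if (++j == view->m_rhs.size())
			{
				j = 0;
				++i;
			}
			return *this;
		}
		bool operator!=(const iterator& o) const
		{
			return i != o.i || j != o.j;
		}
	};

	iterator begin() const { return {this, 0, 0}; }
	iterator end() const { return {this, m_lhs.size(), 0}; }

private:
	const std::vector<std::vector<double>>& m_lhs;
	const std::vector<std::vector<double>>& m_rhs;
};
```

A memoising variant would additionally cache each computed value, which recovers the "lazy precompute, then query" behaviour for kernels that revisit pairs.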

The part with batches I am not sure how to do. I think for EuclideanDistance you would just fill the registers as much as possible by computing the norm of each vector of lhs and rhs, i.e. you would just do lhs[idx].dot(lhs[idx].T). Btw, another optimisation: compare lhs and rhs first, because quite often lhs == rhs, so you end up computing the norm of the same thing twice, and the CPU runtime optimiser might not detect this.
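The lhs == rhs shortcut can be as simple as an address comparison before the second norm pass. A small sketch with illustrative names (again, not shogun API):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative helper: squared norm of every feature vector.
std::vector<double> sq_norms(const std::vector<std::vector<double>>& feats)
{
	std::vector<double> out(feats.size());
	for (std::size_t i = 0; i < feats.size(); ++i)
	{
		double s = 0;
		for (double x : feats[i])
			s += x * x;
		out[i] = s;
	}
	return out;
}

// Sketch of the optimisation: if the caller passed the same feature object
// for both sides, reuse the lhs norms instead of recomputing them.
std::pair<std::vector<double>, std::vector<double>> both_norms(
	const std::vector<std::vector<double>>& lhs,
	const std::vector<std::vector<double>>& rhs)
{
	auto ln = sq_norms(lhs);
	// address comparison: lhs == rhs means the second pass is redundant
	auto rn = (&lhs == &rhs) ? ln : sq_norms(rhs);
	return {std::move(ln), std::move(rn)};
}
```

This is exactly the kind of branch that belongs once at setup time rather than inside the per-pair compute call.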
