Support for cdist #175
Indeed, that specific setting was not considered when designing this toolbox. It is a good suggestion, but it would require quite some refactoring. You have probably already thought of this, but our go-to solution for prototypes would be to parallelize over the prototypes manually (see, e.g., the k-means implementation). Two simpler alternatives are to either concatenate the series with NumPy or make the toolbox think it sees one list, and then use the […]. Assume two sets of series you want to compare:
First alternative (with the disadvantage that it requires copying all data to one data structure):
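A minimal sketch of this first alternative, assuming equal-length series stored as rows of two NumPy arrays `s1` and `s2` (the values are made up). `dtw_distance` and `block_distance_matrix` are hypothetical plain-NumPy helpers written only for this sketch. The second one imitates the idea of computing a single block of the distance matrix; in dtaidistance this is, as far as I know, done by passing `block=((row_begin, row_end), (col_begin, col_end))` to `dtw.distance_matrix`, so check the library docs before relying on that.

```python
import numpy as np


def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) DTW: squared differences summed along the
    optimal warping path, square root taken at the end."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def block_distance_matrix(series, block):
    """Hypothetical helper: full-size matrix in which only the entries inside
    block = ((row_begin, row_end), (col_begin, col_end)) are computed;
    every other entry stays at inf."""
    (r0, r1), (c0, c1) = block
    dists = np.full((len(series), len(series)), np.inf)
    for r in range(r0, r1):
        for c in range(c0, c1):
            dists[r, c] = dtw_distance(series[r], series[c])
    return dists


# Two made-up sets of equal-length series, one series per row.
s1 = np.array([[0., 0, 1, 2, 1, 0, 1, 0, 0],
               [0., 1, 2, 0, 0, 0, 0, 0, 0]])
s2 = np.array([[0., 0, 2, 2, 1, 0, 1, 0, 0],
               [1., 2, 0, 0, 0, 0, 0, 1, 1],
               [0., 1, 2, 0, 0, 0, 0, 0, 0]])
n1, n2 = len(s1), len(s2)

# First alternative: copy both sets into one array ...
s = np.vstack((s1, s2))
# ... and only compare the rows of s1 with the rows of s2.
full = block_distance_matrix(s, ((0, n1), (n1, n1 + n2)))
cross = full[:n1, n1:]  # shape (n1, n2), the cdist-like result
```

Slicing `full[:n1, n1:]` gives the same layout as `scipy.spatial.distance.cdist`: one row per series of the first set, one column per series of the second set.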
Second alternative:
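A minimal sketch of this second alternative, assuming the same kind of two sets with made-up values: instead of copying the data, build one Python list that only refers to the existing series. The `block` tuple follows the `((row_begin, row_end), (col_begin, col_end))` convention assumed for a block-restricted distance matrix.

```python
import numpy as np

# Two made-up sets of equal-length series, one series per row.
s1 = np.arange(18, dtype=float).reshape(2, 9)
s2 = np.arange(27, dtype=float).reshape(3, 9) / 2

# Iterating over a 2-D array yields row views, so this list only holds
# references to the existing data; no series values are copied.
series = list(s1) + list(s2)
n1, n2 = len(s1), len(s2)

# Rows 0..n1 (first set) against columns n1..n1+n2 (second set).
block = ((0, n1), (n1, n1 + n2))
```

If `s1` and `s2` are already Python lists of arrays (for example of unequal length), `s1 + s2` does the same thing: it copies the two lists of references, not the arrays themselves.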
The overhead of the second solution should not be too bad: the additional memory it requires is much less than what is needed to store the actual data. If this second solution works well, we could consider adding it, with some wrappers to make it easy to use different series. But some benchmarking would be necessary first.
Thank you for this extensive answer; adding a wrapper sounds like a good idea. I'm leaving the issue open just in case, but feel free to close it if you want. Thank you for your help.
How would one properly format a matrix of (lat, long) coordinates for the …
In this case you can probably use this functionality directly (assuming the series you want to use are consecutive). If the series you want to compare are scattered across your NumPy matrix, you need to reorder them first. Suppose you want to compare series 0, 2, 4 with 1, 3, 4; this would be something like:
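A minimal sketch of that reordering, assuming the series are the rows of a NumPy array `data` (made-up values). Fancy indexing with a list of row indices copies the selected series into one new array in which both sets occupy consecutive index ranges, so they can be addressed as a single block.

```python
import numpy as np

# Six made-up series of length 4, one per row.
data = np.arange(24, dtype=float).reshape(6, 4)

rows = [0, 2, 4]  # first set
cols = [1, 3, 4]  # second set (series 4 occurs in both)

# rows + cols is list concatenation: [0, 2, 4, 1, 3, 4].
# Positions 0..2 now hold series 0, 2, 4 and positions 3..5 hold 1, 3, 4.
reordered = data[rows + cols]
block = ((0, len(rows)), (len(rows), len(rows) + len(cols)))
```

Because series 4 is in both sets, the entry comparing position 2 with position 5 will have a DTW distance of 0.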
Unrelated to DTW: using the Euclidean distance (which DTW uses internally) on lat/lon coordinates might give suboptimal results. For example, one degree of latitude is about 111 km, but one degree of longitude ranges from 111 km to 0 km depending on where you are on Earth. If you observe strange results, consider re-projecting to a two-dimensional plane (e.g. Mercator): https://geopandas.org/en/stable/docs/user_guide/projections.html
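To make the distortion concrete, here is a small sketch under a spherical-Earth assumption (radius 6371 km). Note that the hypothetical helper `to_local_xy` swaps in a simple equirectangular projection around a reference latitude rather than Mercator; it is only reasonable for regions that are small compared to the Earth, and for anything larger a proper projection as in the linked geopandas guide is the safer choice.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def km_per_degree_longitude(lat_deg):
    """Length of one degree of longitude at latitude lat_deg (degrees)."""
    return np.radians(1.0) * EARTH_RADIUS_KM * np.cos(np.radians(lat_deg))


def to_local_xy(latlon, lat0=None):
    """Equirectangular projection of (lat, lon) degrees to (x, y) in km.

    latlon has shape (..., 2) with latitude first. lat0 is the reference
    latitude in degrees (default: mean latitude of the input).
    """
    latlon = np.asarray(latlon, dtype=float)
    lat = np.radians(latlon[..., 0])
    lon = np.radians(latlon[..., 1])
    lat0_rad = lat.mean() if lat0 is None else np.radians(lat0)
    x = EARTH_RADIUS_KM * lon * np.cos(lat0_rad)  # shrink longitude by cos(lat0)
    y = EARTH_RADIUS_KM * lat
    return np.stack([x, y], axis=-1)


print(round(float(km_per_degree_longitude(0)), 1))   # 111.2
print(round(float(km_per_degree_longitude(60)), 1))  # 55.6
```

The projected (x, y) values in km can then be compared with a plain Euclidean distance, which is what DTW assumes.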
Thanks for the advice! Just to confirm, would I have to reshape my array of shape …
You should not reshape, but use the multi-dimensional version available in the …
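A plain-NumPy sketch of what "multi-dimensional" means here, assuming an array of shape `(n_series, length, 2)` with made-up (x, y) points. `dtw_ndim_distance` and `ndim_distance_matrix` are hypothetical helpers, not the implementation of the `dtaidistance.dtw_ndim` module mentioned in the issue description: each time step is one 2-D point, and the local cost between two points is their squared Euclidean distance.

```python
import numpy as np


def dtw_ndim_distance(a, b):
    """DTW between series of shape (length, n_dims); the local cost is the
    squared Euclidean distance between two points, summed along the optimal
    warping path, with the square root taken at the end."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = float(np.sum((a[i - 1] - b[j - 1]) ** 2))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def ndim_distance_matrix(series):
    """Symmetric all-pairs matrix for an array of shape (n_series, length, n_dims)."""
    k = len(series)
    dists = np.zeros((k, k))
    for r in range(k):
        for c in range(r + 1, k):
            dists[r, c] = dists[c, r] = dtw_ndim_distance(series[r], series[c])
    return dists


# Three made-up tracks of four (x, y) points each: shape (3, 4, 2).
tracks = np.array([
    [[0, 0], [1, 0], [2, 0], [2, 0]],
    [[0, 0], [0, 0], [1, 0], [2, 0]],  # same path as track 0, different timing
    [[0, 1], [1, 1], [2, 1], [2, 1]],  # track 0 shifted by 1 in y
], dtype=float)

# No reshape: tracks.reshape(3, -1) would interleave x and y into one 1-D
# series, so x values could be warped onto y values.
d = ndim_distance_matrix(tracks)
```

Track 1 visits the same points as track 0 at a different pace, so their DTW distance is 0, while the shifted track 2 is at distance 2.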
Gotcha. I have done that, but was wondering if there was a way to have the matrix function work with 2D. All good. Thank you!
Currently it is possible to compute a distance matrix between all pairs of time series within a collection with `dtaidistance.dtw_ndim.distance_matrix`, but it is not possible to (efficiently) compute distances between every pair drawn from two collections of inputs, something akin to `scipy.spatial.distance.cdist` but for DTW. This can be really useful to efficiently compute distances between multiple time series and a collection of prototypes.
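As a reference for the semantics requested here, a minimal sketch of a cdist-style DTW helper, assuming 1-D series (possibly of different lengths) with made-up values. `dtw_cdist` and `dtw_distance` are hypothetical names, not part of dtaidistance: entry `[i, j]` is the DTW distance between the i-th series of the first collection and the j-th series of the second, just as `scipy.spatial.distance.cdist` does for ordinary metrics.

```python
import numpy as np


def dtw_distance(a, b):
    """Plain DTW: squared differences summed along the optimal warping path,
    square root taken at the end."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def dtw_cdist(xs, ys):
    """Hypothetical cdist-style helper: out[i, j] = DTW(xs[i], ys[j])."""
    out = np.empty((len(xs), len(ys)))
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            out[i, j] = dtw_distance(x, y)
    return out


series = [np.array([0., 1, 2, 1, 0]), np.array([0., 0, 1, 2, 1, 0])]
prototypes = [np.array([0., 1, 2, 1, 0]), np.array([2., 2, 2])]

d = dtw_cdist(series, prototypes)  # shape (2, 2)
nearest = d.argmin(axis=1)         # index of the closest prototype per series
```

This naive double loop has none of the C speed or parallelism of a library distance matrix. For equal-length series, `scipy.spatial.distance.cdist(X, Y, metric=dtw_distance)` with the Python callable should give the same matrix, also without any speed-up.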