Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can training data is 3-D form? #7

Open
lynnwong11 opened this issue Sep 22, 2017 · 5 comments
Open

can training data is 3-D form? #7

lynnwong11 opened this issue Sep 22, 2017 · 5 comments

Comments

@lynnwong11
Copy link

I am new to Dynamic Time Warping and your note helps quite a lot. Thank you for your sharing. My input
data is 3-D, having shape of (n_samples, n_timesteps,n_features). I am not sure how to transfer it into using the model. Thank you so much!

@lynnwong11
Copy link
Author

@markdregan

@markdregan
Copy link
Owner

markdregan commented Sep 23, 2017 via email

@lynnwong11
Copy link
Author

@markdregan thank you so much~
I searched the internet and find there is a python module : https://github.com/pierre-rouanet/dtw
In it's code, it says:
:param array x: N1M array
:param array y: N2
M array

It deals with n_features by using dist:
dist, cost, acc, path = dtw(x, y, dist=lambda x, y: norm(x - y, ord=1))

Is it the normal way of handling multi-features or other way?

@markdregan
Copy link
Owner

In my repo, you would need to update the function def _dist_matrix(self, x, y): so that x and y are of shape num_features, num_samples, num_timesteps. The code in the code in the function would need to be updated to iterate through the features and output a distance matrix. The function should return dm of the shape num_features, num_x_samples, num_y_samples.

The def predict(self, x): function would need to be updated too. Depending on how you want to factor in the dtw distance per feature - implementation will be slightly different.

Or, you could use my code as if. Iterate through your dataset per feature - saving the distance matrix for each comparison of x and y. Then write your own method to do arg_sort across the feature distance matrices.

@Apathyman
Copy link

From Dr. Keogh and Dr. Mueen's talk on DTW last year:
There are two main ways to use DTW to find a multi-dimensional distance.

  1. Run DTW on each dimension and add up the distances.
  2. Find the distance of each time step (total multi-dimensional distance) and find a single warping path

The main difference is how tightly coupled you think the dimensions are (IE: how much is the position of one likely to reflect the position of the other). If you are sure the data is tightly coupled, method 2 is slightly better. However, as soon as random lags appear in your data you'll see the independent method (1) far outperform method 2.

He goes on to share some interesting, but not really relevant here, notes on how to keep dimensionality low (usually <5) and choose the best dimensions. The most important part there is that, when using DTW, using too many dimensions is less accurate than using even one random dimension alone.

In short,

  1. Consider picking a few features out of your set to reduce error.
  2. Unless you're really confident, assuming your total distance is the sum distance of each feature is fine.

While it might be useful to bake this in as a convinience feature, it depends quite a bit on use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants