
Why is the discrepancy between two identical time-series negative? #10

Open · karhohs opened this issue Apr 20, 2018 · 6 comments


karhohs commented Apr 20, 2018

Hello,

I wanted to build some intuition about how the soft-dtw cost function works by playing around with some fake data. I started comparing scaled and shifted cosines to see what numbers soft-dtw would produce. However, when I compared a cosine to itself, I couldn't understand why I was getting a negative result. Wouldn't the cumulative alignment cost of a signal with itself be zero? Could someone please help me understand why the result is negative?

Also, when I compared two cosines with the same shift, one scaled by 0.5 and the other by 2.0, the discrepancies were very different from each other. Why does scaling have such a strong effect on the score? Should I be self-normalizing my data before using soft-dtw?

I tried the following:

import numpy as np
from sdtw import SoftDTW
from sdtw.distance import SquaredEuclidean

t = np.arange(100)
X = np.cos(t).reshape(-1, 1)
Y = 0.5*np.cos(t + 1).reshape(-1, 1)
Z = 2.0*np.cos(t + 1).reshape(-1, 1)

Dxx = SquaredEuclidean(X, X)
Dxy = SquaredEuclidean(X, Y)
Dxz = SquaredEuclidean(X, Z)

sdtw_xx = SoftDTW(Dxx, gamma=1.0)
sdtw_xy = SoftDTW(Dxy, gamma=1.0)
sdtw_xz = SoftDTW(Dxz, gamma=1.0)

print(sdtw_xx.compute())
print(sdtw_xy.compute())
print(sdtw_xz.compute())

I got the following output:

-104.37094339998296
-106.85481189044374
-17.672483911503427
@mblondel (Owner)

Like DTW, SDTW is not a distance. And, as you noticed, unlike DTW, SDTW(x, x) != 0. So we prefer to call SDTW a discrepancy.

The reason it's not zero is that SDTW considers all possible alignments, weighted by their probability under the Gibbs distribution (see the paper), while DTW considers only the best one.
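
For intuition, here is a toy snippet of mine (not part of this package) showing the soft minimum that replaces DTW's hard minimum. Even when every alignment cost is non-negative, summing several exponentials pushes the value below zero, which is why SDTW(x, x) < 0:

import numpy as np

# softmin_gamma(a_1, ..., a_n) = -gamma * log(sum_i exp(-a_i / gamma))
def softmin(costs, gamma=1.0):
    costs = np.asarray(costs, dtype=float)
    return -gamma * np.log(np.sum(np.exp(-costs / gamma)))

print(softmin([0.0], gamma=1.0))            # 0.0: a single alignment behaves like the hard min
print(softmin([0.0, 0.0, 5.0], gamma=1.0))  # about -0.696: several alignments drive the value negative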

@marcocuturi

As is typically done for the GA kernel (http://marcocuturi.net/GA.html), one might want to normalize SDTW to recover that property: SDTW_norm(x, y) := SDTW(x, y) - 1/2 (SDTW(x, x) + SDTW(y, y)).
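
For concreteness, a small helper along these lines (the function names are mine, not part of the package's API; it just wraps the SoftDTW and SquaredEuclidean classes used above):

from sdtw import SoftDTW
from sdtw.distance import SquaredEuclidean

def sdtw_value(a, b, gamma=1.0):
    # Plain soft-DTW discrepancy between two (length, dim) arrays.
    return SoftDTW(SquaredEuclidean(a, b), gamma=gamma).compute()

def sdtw_normalized(a, b, gamma=1.0):
    # SDTW(a, b) - 1/2 * (SDTW(a, a) + SDTW(b, b)), so that the self-discrepancy is 0.
    return sdtw_value(a, b, gamma) - 0.5 * (sdtw_value(a, a, gamma) + sdtw_value(b, b, gamma))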


karhohs commented Apr 23, 2018

@mblondel thank you for your insights. I looked at your paper and see that the discrepancy is the weighted sum over all possible alignments, which I believe is Equation 1 in the paper. From the cosine example above, I observed that the discrepancy between a cosine and itself is greater than the discrepancy between that cosine and a slightly shifted version of itself scaled by 0.5. Does this imply that alignments that account for scale are penalized more heavily than alignments that account for shifts? I believe I will self-normalize each of my time-series to its mean to help mitigate this bias (a quick sketch of what I mean is below). As @mblondel states in the paper, "Unlike the Euclidean distance, DTW can compare time series of variable size and is robust to shifts or dilatations across the time dimension." Perhaps DTW is not robust to changes in a time-series' amplitude. Am I reading this all correctly?
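
For concreteness, here is roughly what I mean by self-normalizing (z-normalization is just one choice on my part; the paper doesn't prescribe a particular scheme), applied to the cosines from my first snippet:

import numpy as np

def z_normalize(x):
    # Subtract the series mean and divide by its standard deviation.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

X_norm = z_normalize(np.cos(np.arange(100)).reshape(-1, 1))
Y_norm = z_normalize(0.5 * np.cos(np.arange(100) + 1).reshape(-1, 1))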

Also, thanks for pointing out your normalization, @marcocuturi. Can the normalization be performed before computing the SDTW discrepancy? And can I use the same equation to normalize the gradient of the alignment matrix if I want to use the normalized discrepancy as a fitting term?

FWIW, I applied @marcocuturi's suggestion to the cosines defined above. The results now resemble a distance metric:

import numpy as np
from sdtw import SoftDTW
from sdtw.distance import SquaredEuclidean

t = np.arange(100)
X = np.cos(t).reshape(-1, 1)
Y = 0.5*np.cos(t + 1).reshape(-1, 1)
Z = 2.0*np.cos(t + 1).reshape(-1, 1)

Dxx = SquaredEuclidean(X, X)
Dxy = SquaredEuclidean(X, Y)
Dyy = SquaredEuclidean(Y, Y)
Dxz = SquaredEuclidean(X, Z)
Dzz = SquaredEuclidean(Z, Z)

sdtw_xx = SoftDTW(Dxx, gamma=1.0)
sdtw_xy = SoftDTW(Dxy, gamma=1.0)
sdtw_yy = SoftDTW(Dyy, gamma=1.0)
sdtw_xz = SoftDTW(Dxz, gamma=1.0)
sdtw_zz = SoftDTW(Dzz, gamma=1.0)

print(sdtw_xx.compute() - 0.5*(sdtw_xx.compute() + sdtw_xx.compute()))  # normalized SDTW(X, X)
print(sdtw_xy.compute() - 0.5*(sdtw_xx.compute() + sdtw_yy.compute()))  # normalized SDTW(X, Y)
print(sdtw_xz.compute() - 0.5*(sdtw_xx.compute() + sdtw_zz.compute()))  # normalized SDTW(X, Z)

Here's the output:

0.0
16.233598135265396
57.79562607598451

Furthermore, the effect of scaling is seen more plainly after this normalization. If I define the Z cosine as 1.5*np.cos(t + 1).reshape(-1, 1) instead of 2.0*np.cos(t + 1).reshape(-1, 1), the normalized discrepancy is 15.768949868955872, very close to the value for the 0.5-scaled cosine.


karhohs commented Jun 21, 2018

I've been trying to cluster time-series data using soft-dtw and hierarchical clustering, but I have had a difficult time recovering clusters that are obvious by eye. @marcocuturi's normalization equation has helped, except that the very dissimilar scores appear to dominate the clustering. To address this, I rescaled the normalized dissimilarity into (0, 1] with exp(-x). This appears to help, but I'm still falling short of expectations. What do you all think about this approach? Am I missing something obvious? I'm happy to share a gist of my Jupyter notebook and some test data if that would be helpful; a rough sketch of the pipeline is below. Thanks!
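
Roughly (a sketch only; sdtw_normalized stands for @marcocuturi's normalization SDTW(x, y) - 1/2 (SDTW(x, x) + SDTW(y, y)), the toy series are placeholders for my real data, and I'm treating 1 - exp(-x) as the bounded dissimilarity for the linkage step):

import numpy as np
from sdtw import SoftDTW
from sdtw.distance import SquaredEuclidean
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def sdtw_normalized(a, b, gamma=1.0):
    def d(u, v):
        return SoftDTW(SquaredEuclidean(u, v), gamma=gamma).compute()
    return d(a, b) - 0.5 * (d(a, a) + d(b, b))

def pairwise_discrepancy(series, gamma=1.0):
    # Symmetric matrix of normalized soft-DTW discrepancies, zero on the diagonal.
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = sdtw_normalized(series[i], series[j], gamma)
    return D

series = [np.cos(np.arange(100) + s).reshape(-1, 1) for s in range(6)]  # placeholder data
D = pairwise_discrepancy(series)
D_bounded = 1.0 - np.exp(-D)                       # squash large discrepancies into [0, 1)
Z = linkage(squareform(D_bounded), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")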

@mblondel (Owner)

I am curious: do you observe the same behavior with unregularized / exact DTW? Hierarchical clustering doesn't need gradients, so you could try it too.

SDTW really shines when using it for k-means clustering, since the centroids / barycenters can be learned. I haven't added k-means to the repository yet, though.

Another option is to use the soft-DTW as a kernel and plug it into kernel k-means.

The kernel is given by sdtw_kernel(x, y; gamma) := exp(-sdtw(x, y; gamma)).
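
A minimal sketch of building that kernel matrix with the classes from this package (the Gram-matrix loop, placeholder series, and variable names are mine, for illustration only):

import numpy as np
from sdtw import SoftDTW
from sdtw.distance import SquaredEuclidean

def sdtw_kernel(x, y, gamma=1.0):
    # exp(-sdtw(x, y; gamma)) as above.
    return np.exp(-SoftDTW(SquaredEuclidean(x, y), gamma=gamma).compute())

series = [np.sin(np.arange(50) + s).reshape(-1, 1) for s in range(4)]  # placeholder data
K = np.array([[sdtw_kernel(a, b) for b in series] for a in series])
# K can then be handed to a kernel k-means implementation.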

@mblondel (Owner)

I believe this project supports SDTW-based k-means clustering:

https://github.com/rtavenar/tslearn
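
A rough usage sketch (I haven't checked this against a specific tslearn version, so treat the exact class and parameter names as an assumption and check their docs):

import numpy as np
from tslearn.clustering import TimeSeriesKMeans

X = np.random.randn(20, 100, 1)  # 20 series of length 100, 1 dimension
model = TimeSeriesKMeans(n_clusters=3, metric="softdtw")
labels = model.fit_predict(X)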
