Speeding up `scrump` #643

NimaSarajpoor · 2022-07-15T21:34:44Z

I have an idea that might speed up scrump when its parameter pre_scrump is set to True. (After some thought, I concluded that this reduction in computing time would be probably insignificant. However, I am going to share my idea here anyway)

When pre_scrump=True, we calculate an approximate matrix profile by going through some samples (indices):

stumpy/stumpy/scrump.py

Line 346 in c432db4

indices = np.random.permutation(range(0, l, s)).astype(np.int64)

(note: l = n - m + 1)

And, for each sample i:

we calculate its distance profile.
we also use QT to calculate the distance between S_(i + j) and S_(nn_i + j) for j in range(1, s)
we also use QT to calculate the distance between S_(i - j) and S_(nn_i - j) for j in range(1, s)

So, of Combination(l, 2) cells of distance matrix (in self-join), we roughly calculate:

$\frac{l}{s} \times l$ (regarding distance profile)
$\frac{l}{s} \times s$ (regarding move forward)
$\frac{l}{s} \times s$ (regarding move backward)

So, at the end of prescrump process, we already calculated roughly $\frac{1}{s} l^{2} + 2l$ elements of distance matrix. I think if we assume l is large, we can then say:

we need to calculate $\frac{1}{2} l^{2}$ cells to obtain the full distance matrix (in self-join)
At the end of prescrump, we already calculated $\frac{1}{s} l^{2}$ cells of them.

So, if we avoid re-calculating $\frac{1}{s} l^{2}$ number of distances throughtout .update() process, then:

$$ \frac{\frac{1}{s} l^{2}}{\frac{1}{2} l^{2}} = \frac{2}{s}$$

So, if l = 100_000 and s = 100 (0.001 of l), we are reducing the computation time by 2% (?!). This is not just for a single update but rather for several calls of update that complete the distance matrix. So, I think the improvement is probably insignificant for a single call of update method.

I am not sure about my rough calculation though :)

I think keeping track of what pair-wise distances have been calculated can also help us avoid the duplicate issue that might appear in the top-k matrix profiles as discussed in PR #595.

The text was updated successfully, but these errors were encountered:

NimaSarajpoor mentioned this issue Jul 15, 2022

[WIP] Add Top-K Nearest Neighbors for Normalized Matrix Profile #592 #595

Merged

6 tasks

seanlaw added enhancement New feature or request question Further information is requested labels Jul 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speeding up `scrump` #643

Speeding up `scrump` #643

NimaSarajpoor commented Jul 15, 2022 •

edited

Speeding up scrump #643

Speeding up scrump #643

Comments

NimaSarajpoor commented Jul 15, 2022 • edited

Speeding up `scrump` #643

Speeding up `scrump` #643

NimaSarajpoor commented Jul 15, 2022 •

edited