You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have an idea that might speed up scrump when its parameter pre_scrump is set to True. (After some thought, I concluded that this reduction in computing time would be probably insignificant. However, I am going to share my idea here anyway)
When pre_scrump=True, we calculate an approximate matrix profile by going through some samples (indices):
So, at the end of prescrump process, we already calculated roughly $\frac{1}{s} l^{2} + 2l$ elements of distance matrix. I think if we assume l is large, we can then say:
we need to calculate $\frac{1}{2} l^{2}$ cells to obtain the full distance matrix (in self-join)
At the end of prescrump, we already calculated $\frac{1}{s} l^{2}$ cells of them.
So, if we avoid re-calculating $\frac{1}{s} l^{2}$ number of distances throughtout .update() process, then:
So, if l = 100_000 and s = 100 (0.001 of l), we are reducing the computation time by 2% (?!). This is not just for a single update but rather for several calls of update that complete the distance matrix. So, I think the improvement is probably insignificant for a single call of update method.
I am not sure about my rough calculation though :)
I think keeping track of what pair-wise distances have been calculated can also help us avoid the duplicate issue that might appear in the top-k matrix profiles as discussed in PR #595.
The text was updated successfully, but these errors were encountered:
I have an idea that might speed up
scrump
when its parameterpre_scrump
is set to True. (After some thought, I concluded that this reduction in computing time would be probably insignificant. However, I am going to share my idea here anyway)When
pre_scrump=True
, we calculate an approximate matrix profile by going through some samples (indices):stumpy/stumpy/scrump.py
Line 346 in c432db4
(note:
l = n - m + 1
)And, for each sample
i
:S_(i + j)
andS_(nn_i + j)
forj in range(1, s)
S_(i - j)
andS_(nn_i - j)
forj in range(1, s)
So, of
Combination(l, 2)
cells of distance matrix (in self-join), we roughly calculate:So, at the end of$\frac{1}{s} l^{2} + 2l$ elements of distance matrix. I think if we assume
prescrump
process, we already calculated roughlyl
is large, we can then say:prescrump
, we already calculatedSo, if we avoid re-calculating$\frac{1}{s} l^{2}$ number of distances throughtout
.update()
process, then:So, if
l = 100_000
ands = 100
(0.001 ofl
), we are reducing the computation time by 2% (?!). This is not just for a single update but rather for several calls of update that complete the distance matrix. So, I think the improvement is probably insignificant for a single call of update method.I am not sure about my rough calculation though :)
I think keeping track of what pair-wise distances have been calculated can also help us avoid the duplicate issue that might appear in the top-k matrix profiles as discussed in PR #595.
The text was updated successfully, but these errors were encountered: