added the Boss-SP algorithm and its example. Tests are still missing. #156

SvenBarray · 2023-08-31T02:22:56Z

No description provided.

SvenBarray · 2023-08-31T02:37:41Z

Not a pull request meant to be accepted, but a pull request meant to share code and get feedback

johannfaouzi · 2023-08-31T08:54:19Z

pyts/classification/bosssp.py

+    level : integer (default = 3)
+         Number of times the series is being divided. Maximum of 3.


The order of the parameters is relevant: the parameters are usually sorted based on their relevance / importance. level is a key element of BOSSSP (compared to BOSS), so it should be higher on the list (above window_step I would say).

johannfaouzi · 2023-08-31T08:54:24Z

pyts/classification/bosssp.py

+                 anova=False, drop_sum=False, norm_mean=False, norm_std=False,
+                 strategy='quantile', alphabet=None,
+                 numerosity_reduction=True, use_idf=True, smooth_idf=False,
+                 sublinear_tf=True, level=3):


johannfaouzi · 2023-08-31T08:54:29Z

pyts/classification/bosssp.py

+        self.use_idf = use_idf
+        self.smooth_idf = smooth_idf
+        self.sublinear_tf = sublinear_tf
+        self.level = level


johannfaouzi · 2023-08-31T09:01:24Z

pyts/classification/bosssp.py

+                                                y[:length//2])
+            words_3 = self._boss_word_extractor(X[length//2:],
+                                                y[:length//2:])
+            words = words + words_2 + words_3


What are concatenated in BOSSSP are the histograms (and not the list of words). For instance, if level=3, the final histogram is the concatenation of 7 histograms. You need to first compute the histogram for each words_i list, then concatenate the histogram, instead of concatenating first the words_i lists and then computing the histogram.

johannfaouzi · 2023-08-31T09:02:16Z

pyts/classification/bosssp.py

+        else:
+            X_bow = np.asarray([' '.join(X_word[i]) for i in range(n_samples)])
+
+        words_list = np.asarray(X_bow[0].split(' ')).tolist()


You only get the transform for the first time series (X_bow[0]).

johannfaouzi · 2023-08-31T09:12:16Z

pyts/classification/bosssp.py

+    def _check_params(self, n_timestamps):
+        if not isinstance(self.word_size, (int, np.integer)):
+            raise TypeError("'word_size' must be an integer.")
+        if not self.word_size >= 1:
+            raise ValueError("'word_size' must be a positive integer.")
+
+        if not isinstance(self.window_size,
+                          (int, np.integer, float, np.floating)):
+            raise TypeError("'window_size' must be an integer or a float.")
+        if isinstance(self.window_size, (int, np.integer)):
+            if self.drop_sum:
+                if not 1 <= self.window_size <= (n_timestamps - 1):
+                    raise ValueError(
+                        "If 'window_size' is an integer, it must be greater "
+                        "than or equal to 1 and lower than or equal to "
+                        "(n_timestamps - 1) if 'drop_sum=True'."
+                    )
+            else:
+                if not 1 <= self.window_size <= n_timestamps:
+                    raise ValueError(
+                        "If 'window_size' is an integer, it must be greater "
+                        "than or equal to 1 and lower than or equal to "
+                        "n_timestamps if 'drop_sum=False'."
+                    )
+            window_size = self.window_size
+        else:
+            if not 0 < self.window_size <= 1:
+                raise ValueError(
+                    "If 'window_size' is a float, it must be greater "
+                    "than 0 and lower than or equal to 1."
+                )
+            window_size = ceil(self.window_size * n_timestamps)
+
+        if not isinstance(self.window_step,
+                          (int, np.integer, float, np.floating)):
+            raise TypeError("'window_step' must be an integer or a float.")
+        if isinstance(self.window_step, (int, np.integer)):
+            if not 1 <= self.window_step <= n_timestamps:
+                raise ValueError(
+                    "If 'window_step' is an integer, it must be greater "
+                    "than or equal to 1 and lower than or equal to "
+                    "n_timestamps."
+                )
+            window_step = self.window_step
+        else:
+            if not 0 < self.window_step <= 1:
+                raise ValueError(
+                    "If 'window_step' is a float, it must be greater "
+                    "than 0 and lower than or equal to 1."
+                )
+            window_step = ceil(self.window_step * n_timestamps)
+
+        if self.drop_sum:
+            if not self.word_size <= (window_size - 1):
+                raise ValueError(
+                    "'word_size' must be lower than or equal to "
+                    "(window_size - 1) if 'drop_sum=True'."
+                )
+        else:
+            if not self.word_size <= window_size:
+                raise ValueError(
+                    "'word_size' must be lower than or equal to "
+                    "window_size if 'drop_sum=False'."
+                )
+
+        if not isinstance(self.level, (int, np.integer)):
+            raise TypeError("'level' must be an integer.")
+        if not self.level >= 1:
+            raise ValueError("'level' must be a positive integer.")
+        if self.level >= 4:
+            raise ValueError("'level' must not exceed 3.")
+
+        return window_size, window_step


Likewise, it might be easier to use the corresponding of method of BOSS if the code is the same. You could just add the checks for level.

johannfaouzi · 2023-08-31T09:18:53Z

pyts/classification/bosssp.py

+            words_2 = self._boss_word_extractor(X[:length//2],
+                                                y[:length//2])
+            words_3 = self._boss_word_extractor(X[length//2:],
+                                                y[:length//2:])


You split your dataset in two, but on the first axis (the axis of samples) instead of the second axis (the time axis).

johannfaouzi · 2023-08-31T09:19:04Z

pyts/classification/bosssp.py

+            words_4 = self._boss_word_extractor(X[:length//4],
+                                                y[:length//4])
+            words_5 = self._boss_word_extractor(X[length//4:length//2],
+                                                y[length//4:length//2])
+            words_6 = self._boss_word_extractor(X[length//2:length*3//4],
+                                                y[length//2:length*3//4])
+            words_7 = self._boss_word_extractor(X[length*3//4:],
+                                                y[length*3//4:])
+            words = words + words_4 + words_5 + words_6 + words_7


johannfaouzi · 2023-08-31T09:24:48Z

pyts/classification/bosssp.py

+        if not isinstance(self.window_size,
+                          (int, np.integer, float, np.floating)):
+            raise TypeError("'window_size' must be an integer or a float.")
+        if isinstance(self.window_size, (int, np.integer)):
+            if self.drop_sum:
+                if not 1 <= self.window_size <= (n_timestamps - 1):
+                    raise ValueError(
+                        "If 'window_size' is an integer, it must be greater "
+                        "than or equal to 1 and lower than or equal to "
+                        "(n_timestamps - 1) if 'drop_sum=True'."
+                    )
+            else:
+                if not 1 <= self.window_size <= n_timestamps:
+                    raise ValueError(
+                        "If 'window_size' is an integer, it must be greater "
+                        "than or equal to 1 and lower than or equal to "
+                        "n_timestamps if 'drop_sum=False'."
+                    )
+            window_size = self.window_size
+        else:
+            if not 0 < self.window_size <= 1:
+                raise ValueError(
+                    "If 'window_size' is a float, it must be greater "
+                    "than 0 and lower than or equal to 1."
+                )
+            window_size = ceil(self.window_size * n_timestamps)
+
+        if not isinstance(self.window_step,
+                          (int, np.integer, float, np.floating)):
+            raise TypeError("'window_step' must be an integer or a float.")
+        if isinstance(self.window_step, (int, np.integer)):
+            if not 1 <= self.window_step <= n_timestamps:
+                raise ValueError(
+                    "If 'window_step' is an integer, it must be greater "
+                    "than or equal to 1 and lower than or equal to "
+                    "n_timestamps."
+                )
+            window_step = self.window_step
+        else:
+            if not 0 < self.window_step <= 1:
+                raise ValueError(
+                    "If 'window_step' is a float, it must be greater "
+                    "than 0 and lower than or equal to 1."
+                )
+            window_step = ceil(self.window_step * n_timestamps)


Since this method is run in the _boss_word_extractor method, if the values of window_size and window_step are floats, the proportions are computed on the split datasets, which means that the values of window_size and window_step would be different for different levels. It might not be an issue in itself, but in this case it should probably be mentioned in the documentation.

johannfaouzi · 2023-08-31T09:27:21Z

examples/classification/plot_bosssp.py

+plt.ylabel("Occurences", fontsize=14)
+plt.title("Number of occurence of each word after BOSS-SP transformation",
+          fontsize=15)
+plt.legend(loc='best')


You have no label in your plot, so there is no legend to display.

…t for the ones regarding code similar from other python modules as discussed orally

SvenBarray · 2023-09-11T22:30:37Z

Corrected the issues mentioned in the pull request feedback, except for the ones regarding code similar from other python modules as discussed orally

codecov · 2023-09-11T22:32:42Z

Codecov Report

Patch coverage is 14.87% of modified lines.

Files Changed	Coverage
pyts/classification/bosssp.py	`13.44%`
pyts/classification/__init__.py	`100.00%`

📢 Thoughts on this report? Let us know!.

fkiraly · 2024-04-12T23:21:57Z

Not a pull request meant to be accepted, but a pull request meant to share code and get feedback

@SvenBarray, would you like to add this to sktime if it is out of scope for pyts?
We follow a "collection" approach, and algorithms can be owned by individuals.

Not many changes have to be made:

using the transformation extension template
fit should be changed to _fit
tags need to be set - this is a series-to-primitives transformer, if I understand correctly
you need to add a _transform method, that should return the word features in one of the permissible data formats (e.g., data frame)

added the Boss-SP algorithm and its example. Tests are still missing.

c6d7b3c

johannfaouzi reviewed Aug 31, 2023

View reviewed changes

corrected the issues mentientoned in the pull request feedback, excep…

f94c9f4

…t for the ones regarding code similar from other python modules as discussed orally

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

added the Boss-SP algorithm and its example. Tests are still missing. #156

added the Boss-SP algorithm and its example. Tests are still missing. #156

SvenBarray commented Aug 31, 2023

SvenBarray commented Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

johannfaouzi Aug 31, 2023

SvenBarray commented Sep 11, 2023

codecov bot commented Sep 11, 2023

fkiraly commented Apr 12, 2024

		level : integer (default = 3)
		Number of times the series is being divided. Maximum of 3.

added the Boss-SP algorithm and its example. Tests are still missing. #156

Are you sure you want to change the base?

added the Boss-SP algorithm and its example. Tests are still missing. #156

Conversation

SvenBarray commented Aug 31, 2023

SvenBarray commented Aug 31, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SvenBarray commented Sep 11, 2023

codecov bot commented Sep 11, 2023

Codecov Report

fkiraly commented Apr 12, 2024