Expose Markov Transition Matrix from pyts.image.MarkovTransitionField. #120

Open

y-he2 opened this issue Nov 10, 2021 · 4 comments

y-he2 commented Nov 10, 2021

The Markov Transition Matrix contained in the MTF transformer could be useful in many cases.
It shouldn't be too much work to expose it (as well as the quantile boundaries) by storing it on the transformer once it is fitted. This also means that the Markov Transition Matrix should be computed in the fit pass instead of each time transform is called, i.e. move:
in pyts/pyts/image/mtf.py

    def transform(self, X):
        ...
        X = check_array(X)
        n_samples, n_timestamps = X.shape
        image_size = self._check_params(n_timestamps)

        discretizer = KBinsDiscretizer(n_bins=self.n_bins,
                                       strategy=self.strategy)
        X_binned = discretizer.fit_transform(X)

        X_mtm = _markov_transition_matrix(X_binned, n_samples,
                                          n_timestamps, self.n_bins)
        sum_mtm = X_mtm.sum(axis=2)
        np.place(sum_mtm, sum_mtm == 0, 1)
        X_mtm /= sum_mtm[:, :, None]
        ...

into

    def fit(self, X=None, y=None):
        """Pass.

        Parameters
        ----------
        X
            Ignored
        y
            Ignored

        Returns
        -------
        self : object
        """
        return self
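For concreteness, a rough sketch of what such a fit could look like (this is not actual pyts code; the transition_matrices_ attribute name is hypothetical):

    def fit(self, X, y=None):
        # Hypothetical sketch: compute the per-series Markov transition
        # matrices once and store them on the transformer, so that
        # transform (and the user) can reuse self.transition_matrices_.
        X = check_array(X)
        n_samples, n_timestamps = X.shape

        discretizer = KBinsDiscretizer(n_bins=self.n_bins,
                                       strategy=self.strategy)
        X_binned = discretizer.fit_transform(X)

        X_mtm = _markov_transition_matrix(X_binned, n_samples,
                                          n_timestamps, self.n_bins)
        sum_mtm = X_mtm.sum(axis=2)
        np.place(sum_mtm, sum_mtm == 0, 1)
        self.transition_matrices_ = X_mtm / sum_mtm[:, :, None]
        return self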

Is it possible to implement this?

johannfaouzi (Owner) commented

If there were a single Markov transition matrix for the whole training set of time series, it would be computed in the fit method and exposed as an attribute.

However, a Markov transition matrix is computed for every time series in the training set, because the bin edges and the transition probabilities are computed independently for each time series, using only the values of that time series. This is what is proposed in the paper introducing this method, and thus what is implemented in this package. Did you think that a single Markov transition matrix was computed for the whole training set?

This is not the first time that this question has been raised. Maybe I should highlight this point in the documentation, and possibly enable the other behavior (computing a single Markov transition matrix for the whole training set).
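For reference, a rough numpy sketch of that alternative behavior, with quantile bin edges pooled over all series (the helper name is hypothetical, not pyts API):

    import numpy as np

    def single_markov_transition_matrix(X, n_bins=8):
        """Hypothetical helper: one transition matrix for a whole set of series.

        Bin edges are empirical quantiles of the pooled values of all series,
        and transitions are counted over every consecutive pair in every series.
        """
        X = np.asarray(X, dtype=float)
        edges = np.quantile(X, np.linspace(0, 1, n_bins + 1)[1:-1])
        X_binned = np.digitize(X, edges)            # values in {0, ..., n_bins - 1}

        mtm = np.zeros((n_bins, n_bins))
        for row in X_binned:
            np.add.at(mtm, (row[:-1], row[1:]), 1)  # count i -> j transitions

        row_sums = mtm.sum(axis=1, keepdims=True)
        row_sums[row_sums == 0] = 1                 # avoid division by zero
        return mtm / row_sums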


y-he2 commented Nov 23, 2021

Hi, sorry for the very late reply.

I understand your logic: if the matrices are computed from scratch for any input time series, there is no need to "fit" them as a separate step.
Still, since you inherited the fit/transform structure, I feel the urge to make some use of the fit function. Say that, for some specific (probably odd) reason, I want to fit the transformer once on a fixed number of time series, store the underlying matrices, and reuse them to transform another list of time series of the same size. Would it make sense to implement this instead?
Of course, this would mean that once fitted, the transformer's input shape would be fixed.

If none of this makes sense, would it still be possible to expose all of the underlying GAF/MTF matrices?

johannfaouzi (Owner) commented

The fit method is indeed needless when you are just using a single estimator, but it is needed if you want to use scikit-learn tools for cross-validation (e.g., sklearn.model_selection.GridSearchCV) or pipelines (e.g., sklearn.pipeline.Pipeline).
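A small usage sketch of why that matters (illustrative only, not taken from the pyts documentation): even with a no-op fit, the transformer plugs directly into scikit-learn tooling.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import FunctionTransformer
    from pyts.image import MarkovTransitionField

    X = np.random.RandomState(0).randn(10, 50)   # 10 time series of length 50
    y = np.array([0, 1] * 5)                     # dummy binary labels

    pipe = Pipeline([
        ("mtf", MarkovTransitionField(n_bins=4)),
        # Flatten each MTF image so a plain classifier can consume it.
        ("flatten", FunctionTransformer(lambda img: img.reshape(len(img), -1))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipe.fit(X, y)   # fit is a no-op for the MTF step, but the API still works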

Since a lot of transformers in pyts apply the transformation to each time series (i.e., row) independently, rather than to each feature (i.e., column) independently, the fit method does nothing: calling fit followed by transform would be less efficient than calling transform alone, because the same code would be run twice to obtain the same results.

Moreover, I think that restricting the transform method to the same input data as the fit method would be confusing for a lot of people, because that is not what is done in standard machine learning: you fit an estimator on the training data and you transform not only the training data but also the validation / test data (basically any sample).

> Still, since you inherited the fit/transform structure, I feel the urge to make some use of the fit function. Say that, for some specific (probably odd) reason, I want to fit the transformer once on a fixed number of time series, store the underlying matrices, and reuse them to transform another list of time series of the same size. Would it make sense to implement this instead?
> Of course, this would mean that once fitted, the transformer's input shape would be fixed.

For me, it would only make sense if the estimator learned a single Markov Transition Matrix for all the time series when calling fit. Then, any time series could be transformed into a Markov Transition Field using this single Markov Transition Matrix.

> If none of this makes sense, would it still be possible to expose all of the underlying GAF/MTF matrices?

I'm not sure I understand what you mean. The transform method returns the Gramian Angular Field (GAF) / Markov Transition Field (MTF) for each time series. For GAF, there are no underlying matrices. For MTF, the underlying Markov Transition Matrices (MTM), one for each time series, are indeed not exposed. Do you want to get the Markov Transition Matrices?


y-he2 commented Jan 11, 2022

Sorry again for a very late reply.

It's indeed a bit tricky with the whole fit/transform scenario. My original post was actually not coming from an ML angle, but rather from probability analysis.
So I guess I now see more clearly that the request would have made more sense if I had phrased it like this from the beginning:

  • Yes, what I wanted was the Markov Transition Matrix, so maybe a separate function to compute just that, either for a single time series or for a whole set of time series on the same scale (producing one matrix for the whole set).
  • As for the GAFs, I guess I meant a function to turn the series into polar coordinates (sketched below), although I probably shouldn't have mentioned GAF since the OP was about MTF/MTMs.

So what I really wanted was functions to compute these intermediate data objects, rather than exposing them during the transformation. Although this could also belong in a separate feature request instead; what do you think?
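To illustrate the second point, a minimal sketch of the polar encoding that underlies GAF (the to_polar helper is hypothetical, not pyts API):

    import numpy as np

    def to_polar(x):
        """Hypothetical helper: the polar encoding used to build a GAF."""
        x = np.asarray(x, dtype=float)
        # Rescale the series to [-1, 1] (min-max), as in the GAF definition.
        x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
        phi = np.arccos(np.clip(x_scaled, -1, 1))   # angular coordinate
        r = np.arange(len(x)) / len(x)              # radial coordinate
        return phi, r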
