[ENH] Implement Merit Score Function channel selection algorithm #1481

Open
TonyBagnall opened this issue Apr 27, 2024 · 0 comments
Labels: channel selection, enhancement (New feature, improvement request or other non-bug code enhancement), transformations (Transformations package)

TonyBagnall commented Apr 27, 2024

Describe the feature or idea you want to propose

The merit score function algorithm described in "A Feature Selection Method for Multi-dimension Time-Series Data":

https://link.springer.com/chapter/10.1007/978-3-030-65742-0_15

A method based around one-nearest-neighbour classification with dynamic time warping (1-NN DTW) is described in \cite{kathirgamanathan20mtsc}. A merit score function (MSTS) is used to assess the quality of a subset of dimensions. The DTW distances between cases are precalculated separately for each dimension. Predictions for each dimension are then obtained through three-fold cross-validation of 1-NN DTW on that dimension alone. Similarity is estimated using the adjusted mutual information (AMI) between the predictions of pairs of dimensions (dimension-to-dimension) and between the predictions of each dimension and the class labels (dimension-to-class). The MSTS for any subset of dimensions is a function of the average dimension-to-dimension and dimension-to-class AMI over that subset. A subset of dimensions is chosen either by enumerating the MSTS for all $2^d$ combinations of the $d$ dimensions, or by using a wrapper on the top 5% of subsets.
The algorithm first calculates the dimension-to-class (DC) correlation for each dimension, defined as the accuracy of the predictions $\hat{y}$ on the training data obtained by 3-fold cross-validation. Second, the dimension-to-dimension (DD) correlation is calculated as the AMI between the predictions of each pair of dimensions. Finally, the merit score function is calculated for each possible subset as follows:
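The precomputation and cross-validation steps can be sketched in pure Python as below. This is a minimal sketch under stated assumptions, not the paper's implementation: data is a nested list `X[case][dimension][time]`, DTW is unconstrained with squared pointwise cost, and cases are assigned to the three folds round-robin (no shuffling or stratification). The names `dtw_distance`, `distance_matrices` and `cv_predictions` are hypothetical and not part of aeon; the paper's DTW window and fold scheme may differ.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW with squared pointwise cost (assumed, no window)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrices(X):
    """Precompute one symmetric n_cases x n_cases DTW matrix per dimension."""
    n_cases, n_dims = len(X), len(X[0])
    mats = []
    for d in range(n_dims):
        D = [[0.0] * n_cases for _ in range(n_cases)]
        for i in range(n_cases):
            for j in range(i + 1, n_cases):
                D[i][j] = D[j][i] = dtw_distance(X[i][d], X[j][d])
        mats.append(D)
    return mats


def cv_predictions(D, y, n_folds=3):
    """1-NN predictions for every case from n_folds-fold CV on one dimension."""
    n = len(y)
    # Hypothetical fold scheme: case i goes to fold i % n_folds
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    preds = [None] * n
    for test in folds:
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        for i in test:
            # Nearest training neighbour under the precomputed DTW distances
            nearest = min(train, key=lambda j: D[i][j])
            preds[i] = y[nearest]
    return preds
```

Calling `cv_predictions(mats[d], y)` for each `d` gives one prediction vector per dimension, which is all the later DC and DD steps need. Precomputing the distances once per dimension means the cross-validation itself only performs lookups.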

$MS(subset) = \frac{k \overline{DC}}{\sqrt{k+k(k-1)\overline{DD}}}$

where $k$ is the number of dimensions in the subset, $\overline{DC}$ is the average dimension-to-class score of the dimensions in the subset, and $\overline{DD}$ is the average dimension-to-dimension score over all pairs of dimensions in the subset.
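The DC, DD and merit computations can be sketched as follows, assuming the per-dimension predictions are already available as lists of labels. Following the description above, DC is taken as cross-validated accuracy. AMI is implemented from scratch with arithmetic-mean normalisation, which we believe matches the default of scikit-learn's `adjusted_mutual_info_score`; whether the paper used this normalisation is an assumption. The function names `adjusted_mutual_info`, `dc_dd` and `merit_score` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def _log_fact(k):
    # log(k!) via log-gamma, avoiding huge integers
    return math.lgamma(k + 1)


def _entropy(counts, n):
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def adjusted_mutual_info(u, v):
    """AMI between two labelings, arithmetic-mean normalisation (assumed)."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    if len(a) == len(b) == 1:
        return 1.0  # both labelings are a single cluster
    joint = Counter(zip(u, v))
    mi = sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
             for (i, j), nij in joint.items())
    # Expected MI under random labelings with the same marginals (hypergeometric)
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (_log_fact(ai) + _log_fact(bj) + _log_fact(n - ai)
                         + _log_fact(n - bj) - _log_fact(n) - _log_fact(nij)
                         - _log_fact(ai - nij) - _log_fact(bj - nij)
                         - _log_fact(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    h_mean = 0.5 * (_entropy(a.values(), n) + _entropy(b.values(), n))
    denom = h_mean - emi
    return 0.0 if abs(denom) < 1e-12 else (mi - emi) / denom


def dc_dd(preds, y):
    """DC: accuracy per dimension. DD: AMI for each pair (i, j) with i < j."""
    dc = [sum(p == t for p, t in zip(pr, y)) / len(y) for pr in preds]
    dd = {(i, j): adjusted_mutual_info(preds[i], preds[j])
          for i, j in combinations(range(len(preds)), 2)}
    return dc, dd


def merit_score(subset, dc, dd):
    """MS = k * mean(DC) / sqrt(k + k(k-1) * mean(DD))."""
    k = len(subset)
    mean_dc = sum(dc[i] for i in subset) / k
    pairs = list(combinations(sorted(subset), 2))
    # A single dimension has no pairs; k(k-1) = 0 so the DD term vanishes anyway.
    # AMI can be slightly negative, but the sqrt argument stays positive unless
    # mean DD < -1/(k-1).
    mean_dd = sum(dd[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_dc / math.sqrt(k + k * (k - 1) * mean_dd)
```

The denominator is what penalises redundancy: two dimensions with DC = 0.8 and DD = 1 give MS = 1.6 / sqrt(4) = 0.8, no better than either dimension alone, whereas DD = 0 would give 1.6 / sqrt(2) ≈ 1.13.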
Evaluating all dimension combinations makes MSTS infeasible for very high-dimensional problems. MSTS has recently been applied to sensor data and used in conjunction with ROCKET in "Feature Subset Selection for Detecting Fatigue in Runners using Time Series Sensor Data":
https://dl.acm.org/doi/10.1007/978-3-031-09037-0_44
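The exhaustive search, and the cost that makes it infeasible, can be sketched as below: every non-empty subset of the $d$ dimensions is scored, which is $2^d - 1$ merit evaluations. This is a minimal sketch assuming `dc` is a list of per-dimension scores and `dd` a dict keyed by index pairs `(i, j)` with `i < j`; `rank_subsets` and `top_fraction` are hypothetical names. `top_fraction` only selects the candidate pool for the top 5% option; the wrapper step that then re-evaluates those candidates with a classifier is not shown.

```python
import math
from itertools import combinations


def merit_score(subset, dc, dd):
    """k * mean(DC) / sqrt(k + k(k-1) * mean(DD)), as in the formula above."""
    k = len(subset)
    mean_dc = sum(dc[i] for i in subset) / k
    pairs = list(combinations(sorted(subset), 2))
    mean_dd = sum(dd[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_dc / math.sqrt(k + k * (k - 1) * mean_dd)


def rank_subsets(dc, dd):
    """Score every non-empty subset of dimensions: 2**d - 1 evaluations."""
    d = len(dc)
    scored = [(merit_score(s, dc, dd), s)
              for k in range(1, d + 1) for s in combinations(range(d), k)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def top_fraction(scored, fraction=0.05):
    """Keep the best `fraction` of the ranked subsets (at least one)."""
    n_keep = max(1, math.ceil(fraction * len(scored)))
    return [s for _, s in scored[:n_keep]]
```

With only $d = 20$ dimensions this is already $2^{20} - 1 = 1{,}048{,}575$ subsets, each requiring a mean over up to 190 DD pairs.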

Describe your proposed solution

Implement as a `BaseCollectionTransformer` in the `channel_selection` package.

@TonyBagnall TonyBagnall added enhancement New feature, improvement request or other non-bug code enhancement transformations Transformations package channel selection labels Apr 27, 2024