Allow NaN or None values to be passed in, and silently ignored #90

ClimbsRocks · 2016-12-02T03:11:17Z

In my DataFrames, I oftentimes find it very reasonable to have quite a few NaN or None values.

In order to run tsfresh, I just set all those values to 0, which is... not ideal. Even imputing missing values would not work particularly well for several of my use cases.

Yet it seems (from a super naive outsider's perspective) like this is filtering that tsfresh could do relatively easily itself.

When it grabs each time_series, it can simply remove or categorically ignore NaN/None, and compute features on the values that do exist. This makes my life easier when, say, one customer signs up a month after another customer, and thus has missing values for that month.
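To make the request concrete, here is a minimal sketch of the kind of per-series filtering described above, written with plain pandas rather than tsfresh. The column names (`id`, `time`, `value`) and the helper `features_ignoring_nan` are hypothetical and only for illustration; this is not how tsfresh works internally.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: customer "b" signed up one month later,
# so its first observation is missing.
df = pd.DataFrame({
    "id":    ["a", "a", "a", "b", "b", "b"],
    "time":  [1, 2, 3, 1, 2, 3],
    "value": [10.0, 12.0, 14.0, np.nan, 5.0, 7.0],
})


def features_ignoring_nan(values: pd.Series) -> pd.Series:
    """Compute a few summary features on the observed (non-NaN) values only."""
    observed = values.dropna()
    return pd.Series({
        "n_observed": len(observed),
        "mean": observed.mean(),
        "max": observed.max(),
    })


# One row of features per id, each computed on that id's observed values.
features = pd.DataFrame({
    key: features_ignoring_nan(group)
    for key, group in df.groupby("id")["value"]
}).T
print(features)
```

Each series is reduced to the values that exist before any feature is computed, so customer "b" is summarised from two observations instead of three.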

Again, super naive outsider perspective here, I know this might be impossible. But if it is possible, I'd love to add in that bit of filtering!

MaxBenChrist · 2016-12-02T14:41:36Z

You can solve this by calling df.dropna(axis=1, inplace=True) on the data frame that contains the time series in tsfresh format.

Afterwards, you can just pass df to tsfresh.
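Which axis to drop along depends on how the frame is laid out. As a small sketch, assuming a long-format frame with hypothetical columns `id`, `time` and `value`: in pandas, `axis=0` removes rows that contain a missing value, while `axis=1` removes every column that contains one.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format frame: one row per observation.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "time":  [0, 1, 2, 0, 1, 2],
    "value": [0.5, np.nan, 1.5, 2.0, 2.5, np.nan],
})

# axis=0 drops the two rows whose value is missing.
rows_dropped = df.dropna(axis=0)

# axis=1 drops the whole "value" column, because it contains a NaN.
cols_dropped = df.dropna(axis=1)

print(rows_dropped)
print(list(cols_dropped.columns))

# The cleaned frame could then be handed to tsfresh, for example:
# from tsfresh import extract_features
# extract_features(rows_dropped, column_id="id", column_sort="time")
```

For a layout like this one, dropping rows keeps the remaining observations of each series, whereas dropping columns would discard the series values entirely.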

MaxBenChrist · 2016-12-02T14:45:28Z

This was a design decision. tsfresh will not tinker with the input time series data by for example imputing values or dropping NAs.

The reason behind this: in data science projects, NAs should be handled with special care, because they often contain a lot of information. We don't want our package to silently remove that information by dropping them. This is why we are not imputing the input data.

tompollard · 2019-09-10T18:19:17Z

> This was a design decision. tsfresh will not tinker with the input time series data by for example imputing values or dropping NAs.

In many cases doesn't it make sense to treat nans as nans, rather than requiring imputation? For example, features like mean, max and min are more informative if the nans are simply ignored when computing the value.
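As an illustration of the difference, the following sketch uses NumPy's NaN-aware reductions on made-up data. It only shows what "ignoring the nans" would mean for mean, max and min; it says nothing about how tsfresh computes these features.

```python
import numpy as np

# Made-up series with one missing value.
x = np.array([1.0, 4.0, np.nan, 2.0])

# Plain reductions propagate the NaN.
plain = {"mean": np.mean(x), "max": np.max(x), "min": np.min(x)}

# NaN-aware reductions skip the missing entry.
nan_aware = {"mean": np.nanmean(x), "max": np.nanmax(x), "min": np.nanmin(x)}

print(plain)
print(nan_aware)
```

All three plain results are NaN, while the NaN-aware versions are computed from the three observed values.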

yitao-yu · 2023-10-06T07:52:54Z

Actually, there is a workaround by customizing features.

While doing so, however, I discovered a function that is not consistent with the design principle introduced in this thread. tsfresh.feature_extraction.feature_calculators.skewness uses pandas.Series.skew, whose skipna parameter defaults to True, so NaN values are silently ignored.

This is confirmed with dummy data:

tsfresh.feature_extraction.feature_calculators.skewness(np.array([1,1,2,3,np.nan]))
--> 0.8545630383279712

A fix would be to simply set the parameter to False:

@set_property("fctype", "simple")
@set_property("input", "pd.Series")
def skewness(x):
    """
    Returns the sample skewness of x (calculated with the adjusted Fisher-Pearson standardized
    moment coefficient G1).

    :param x: the time series to calculate the feature of
    :type x: numpy.ndarray
    :return: the value of this feature
    :return type: float
    """
    if not isinstance(x, pd.Series):
        x = pd.Series(x)
    return pd.Series.skew(x, skipna=False)
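The effect of the parameter can be checked with pandas alone, without going through tsfresh. This is a small sketch on the same dummy data as above:

```python
import math

import numpy as np
import pandas as pd

x = pd.Series([1, 1, 2, 3, np.nan])

# Default behaviour (skipna=True): the NaN is dropped before computing skewness.
skew_skipping = x.skew()

# With skipna=False the missing value propagates and the result is NaN.
skew_strict = x.skew(skipna=False)

print(skew_skipping)
print(math.isnan(skew_strict))
```

With `skipna=False`, a series containing a NaN yields NaN instead of a value computed on the remaining entries.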

MaxBenChrist closed this as completed Dec 2, 2016

MaxBenChrist reopened this Dec 2, 2016

MaxBenChrist closed this as completed Dec 10, 2016

yitao-yu mentioned this issue Oct 6, 2023

Update tsfresh.feature_extraction.feature_calculators.skewness to make it consistent with the design principle of not ignoring nan #1051

Closed
