
Rolling window with step size #15354

Closed
alexlouden opened this issue Feb 9, 2017 · 41 comments · Fixed by #45765
Labels: Enhancement · Needs Discussion · Window (rolling, ewma, expanding)

Comments

@alexlouden

Just a suggestion - extend rolling to support a rolling window with a step size, such as R's rollapply(by=X).

Code Sample

Pandas - inefficient solution (apply function to every window, then slice to get every second result)

import pandas
ts = pandas.Series(range(0, 40, 2))
ts.rolling(5).apply(max).dropna()[::2]

Suggestion:

ts = pandas.Series(range(0, 40, 2))
ts.rolling(window=5, step=2).apply(max).dropna()

Inspired by R (see rollapply docs):

require(zoo)
TS <- zoo(seq(0, 40, 2))
rollapply(TS, 5, FUN=max, by=2)

8 12 16 20 24 28 32 36 40
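For comparison, a minimal numpy sketch of the requested behaviour (illustrative only, not an existing pandas API; it mirrors the R example's values):

import numpy as np

values = np.arange(0, 42, 2)                     # same values as the R example
window, step = 5, 2
starts = np.arange(0, len(values) - window + 1, step)
print(np.array([values[i:i + window].max() for i in starts]))
# [ 8 12 16 20 24 28 32 36 40]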

@max-sixty
Contributor

If you're using 'standard' functions, these are vectorized, and so v fast (ts.rolling(5).max().dropna()[::2]).

IIUC the saving here would come from only applying the function a fraction of the time (e.g. every nth value). But is there a case where that makes a practical difference?

@jreback
Contributor

jreback commented Feb 9, 2017

this could be done, but I would like to see a use case where this matters. This would break the 'return same size as input' API as well. Though I don't think this is actually hard to implement (though it would involve a number of changes in the implementation). We use marginal windows (IOW, compute the window and as you advance drop off the points that are leaving and add points that you are gaining). So we would still have to compute everything, but you just wouldn't output it.

@jreback jreback added Needs Discussion Requires discussion from core team before further action Numeric Operations Arithmetic, Comparison, and Logical operations labels Feb 9, 2017
@alexlouden
Author

Thanks for your replies!

IIUC the saving here would come from only applying the function a fraction of the time (e.g. every nth value). But is there a case where that makes a practical difference?

My use case is running aggregation functions (not just max) over some large timeseries dataframes - 400 columns, hours of data at 5-25Hz. I've also done a similar thing (feature engineering on sensor data) in the past with data up to 20kHz. Running 30 second windows with a 5 second step saves a big chunk of processing - e.g. at 25Hz with a 5s step it's 1/125th of the work, which makes the difference between it running in 1 minute or 2 hours.

I can obviously fall back to numpy, but it'd be nice if there was a higher level API for doing this. I just thought it was worth the suggestion in case others would find it useful too - I don't expect you to build a feature just for me!

@jreback
Contributor

jreback commented Feb 10, 2017

you can try resampling to a higher frequency interval first, then rolling

something like

df = df.resample('30s')
df.rolling(..).max() (or whatever function)
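A minimal runnable sketch of that resample-then-roll idea (synthetic data; the bin and window sizes are just examples):

import numpy as np
import pandas as pd

idx = pd.date_range('2017-01-01', periods=600, freq='1s')
df = pd.DataFrame({'value': np.random.randn(600)}, index=idx)
# reduce each 30s bin to a single point, then roll over 5 bins
out = df.resample('30s').max().rolling(5).max()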

@alexlouden
Author

Hey @jreback, thanks for the suggestion.

This would work if I was just running max on my data (resample needs a reduction function, otherwise it defaults to mean, right?):

df.resample('1s').max().rolling(30).max()

However I'd like to run my reduction function on 30 seconds of data, then move forward 1 second, and run it on the next 30 seconds of data, etc. The method above applies a function on 1 second of data, and then another function on 30 results of the first function.

Here's a quick example - running a peak-to-peak calculation twice in this way (once per resample bin, then again over the rolled bin results) obviously doesn't give the right answer:

import numpy as np
import pandas

# 10 minutes of data at 5Hz
n = 5 * 60 * 10
rng = pandas.date_range('1/1/2017', periods=n, freq='200ms')
np.random.seed(0)
d = np.cumsum(np.random.randn(n), axis=0)
s = pandas.Series(d, index=rng)

# Peak to peak
def p2p(d):
    return d.max() - d.min()

def p2p_arr(d):
    return d.max(axis=1) - d.min(axis=1)

def rolling_with_step(s, window, step, func):
    # See https://ga7g08.github.io/2015/01/30/Applying-python-functions-in-moving-windows/
    vert_idx_list = np.arange(0, s.size - window, step)
    hori_idx_list = np.arange(window)
    A, B = np.meshgrid(hori_idx_list, vert_idx_list)
    idx_array = A + B
    x_array = s.values[idx_array]
    idx = s.index[vert_idx_list + int(window/2.)]
    d = func(x_array)
    return pandas.Series(d, index=idx)

# Plot data
ax = s.plot(figsize=(12, 8), legend=True, label='Data')

# Plot resample then rolling (obviously does not work)
s.resample('1s').apply(p2p).rolling(window=30, center=True).apply(p2p).plot(ax=ax, label='1s p2p, roll 30 p2p', legend=True)

# Plot rolling window with step
rolling_with_step(s, window=30 * 5, step=5, func=p2p_arr).plot(ax=ax, label='Roll 30, step 1s', legend=True)

[figure: plot comparing the resample-then-roll result with rolling_with_step]

@jreback
Contributor

jreback commented Feb 10, 2017

@alexlouden from your original description I think something like

df.resample('5s').max().rolling('30s').mean() (or whatever reductions) is more in-line with what you want

IOW, take whatever is in a 5s bin, then reduce it to a single point, then roll over those bins. The general idea is that you have lots of data that can be summarized at a short timescale, but you actually want the rolling of this at a higher level.

@alexlouden
Author

Hey @jreback, I actually want to run a function over 30 seconds of data, every 5 seconds. See the rolling_with_step function in my previous example. The additional step of max/mean doesn't work for my use case.

@Murmuria

Murmuria commented Mar 4, 2017

@jreback, there is a real need for the step function that hasn't been brought out in this discussion yet. I second everything that @alexlouden has described, but I would like to add more use cases.

Suppose that we are doing time-series analysis with input data sampled approximately every 3 to 10 milliseconds. We are interested in frequency-domain features. The first step in constructing them would be to find out the Nyquist frequency. Suppose from domain knowledge we know that it is 10 Hz (once every 100 ms). That means we need the data to have a frequency of at least 20 Hz (once every 50 ms) if the features are to capture the input signal well. We cannot resample to a lower frequency than that. Ultimately, here are the computations we do:

df.resample('50ms').mean().rolling(window=32).aggregate(power_spectrum_coeff)

Here we choose a window size that is a multiple of 8; choosing 32 makes the window 1.6 seconds long. The aggregate function returns the single-sided frequency-domain coefficients without the first (mean) component (the fft output is symmetric, with the mean value at the 0th element). Following is the sample aggregate function:

def power_spectrum_coeff():
    def power_spectrum_coeff_(x):
        return np.fft.fft(x)[1 : int(len(x) / 2 + 1)]

    power_spectrum_coeff_.__name__ = 'power_spectrum_coeff'
    return power_spectrum_coeff_

Now, we would like to repeat this in a sliding window of, say, every 0.4 seconds or every 0.8 seconds. There is no point in wasting computations and calculating FFT every 50 ms instead and then slicing later. Further, resampling down to 400 ms is not an option, because 400 ms is just 2.5 Hz, which is much lower than Nyquist frequency and doing so will result in all information being lost from the features.

These were frequency-domain features, which have applications in many time-series-related scientific experiments. However, even simpler time-domain aggregate functions such as standard deviation cannot be supported effectively by resampling.

Though I don't think this is actually hard to implement (though would involve a number of changes in the implementation). We use marginal windows (IOW, compute the window and as you advance, drop off the points that are leaving and add points that you are gaining). So still would have to compute everything, but you just wouldn't output it.

Having the 'step' parameter and being able to reduce actual computations by using it has to be the future goal of Pandas. If the step parameter only returns fewer points, then it's not worth doing, because we can slice the output anyhow. Perhaps given the work involved in doing this, we might just recommend all projects with these needs to use Numpy.
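As an aside, a minimal numpy sketch of evaluating such windowed FFT features only at the stepped positions (sliding_window_view requires numpy >= 1.20; the signal and numbers are illustrative):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.random.randn(10000)                        # signal already resampled to 20 Hz
window, step = 32, 8                              # 1.6 s windows, evaluated every 0.4 s
windows = sliding_window_view(x, window)[::step]  # shape (n_windows, 32)
coeffs = np.fft.rfft(windows, axis=1)[:, 1:]      # single-sided spectrum without the mean term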

@jreback
Contributor

jreback commented Mar 4, 2017

@Murmuria you are welcome to submit a pull request to do this. It's actually not that difficult.

@mrullmi

mrullmi commented Aug 30, 2017

While I second the request for a step parameter in rolling(), I'd like to point out that it is possible to get the desired result with the base parameter in resample(), if the step size is an integer fraction of the window size. Using @alexlouden 's example:

pandas.concat([
    s.resample('30s', label='left', loffset=pandas.Timedelta(15, unit='s'), base=i).agg(p2p) 
    for i in range(30)
]).sort_index().plot(ax=ax, label='Solution with resample()', legend=True, style='k:')

We get the same result (note that the line extends by 30 sec. on both sides):

This is still somewhat wasteful, depending on the type of aggregation. For the particular case of peak-to-peak calculation as in @alexlouden 's example, p2p_arr() is almost 200x faster because it rearranges the series to a 2-D matrix and then uses a single call to max() and min().

@AlexS12

AlexS12 commented Oct 30, 2017

The step parameter in rolling would also allow using this feature without a datetime index. Is there anyone already working on it?

@tsando

tsando commented Mar 20, 2018

@alexlouden above said this:

I can obviously fall back to numpy, but it'd be nice if there was a higher level API for doing this.

Can @alexlouden or anyone else who knows please share some insight as to how to do this with numpy? From my research so far, it seems it is not trivial to do this either in numpy. In fact, there's an open issue about it here numpy/numpy#7753

Thanks

@alexlouden
Author

Hi @tsando - did the function rolling_with_step I used above not work for you?

@tsando

tsando commented Mar 21, 2018

@alexlouden thanks, just checked that function and it seems to still depend on pandas (it takes a series as an input and also uses the series index). I was wondering if there's a purely numpy approach to this. In the thread I mentioned, numpy/numpy#7753, they propose a function which uses numpy strides, but it is hard to understand and translate to window and step inputs.

@alexlouden
Author

@tsando Here's a PDF of the blog post I linked to above - looks like the author has changed his Github username and hasn't put his site up again. (I just ran it locally to convert it to PDF).

My function above was me just converting his last example to work with Pandas - if you wanted to use numpy directly you could do something like this: https://gist.github.com/alexlouden/e42f1d96982f7f005e62ebb737dcd987
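For a purely numpy version, a minimal sketch of the stride-based idea might look like this (not the gist's exact code; the helper name is made up and it assumes a 1-D contiguous array):

import numpy as np

def rolling_window_with_step(a, window, step):
    # one row per window, advancing `step` elements between windows
    n_windows = (len(a) - window) // step + 1
    stride = a.strides[0]
    return np.lib.stride_tricks.as_strided(
        a, shape=(n_windows, window), strides=(step * stride, stride), writeable=False)

x = np.arange(20)
print(rolling_window_with_step(x, window=5, step=2).max(axis=1))
# [ 4  6  8 10 12 14 16 18]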

Hope this helps!

@tsando

tsando commented Mar 21, 2018

@alexlouden thanks! I just tried it on an array of shape (13, 1313) but it gave me this error:

[screenshot of the resulting error]

@Pierre-Bartet

"this could be done, but i would like to see a usecase where this matters."

Whatever project I worked on using pandas, I almost always missed this feature. It is useful whenever you need to compute the apply only once in a while but still need good resolution inside each window.

@tsando

tsando commented Dec 12, 2018

I agree and support this feature too

@ellsaking

I need it almost every time I deal with time series; the feature would give much better control for generating time-series features for both visualization and analysis. Strongly support this idea!

@wangweichao0403

agree and support this feature too

@adrienrenaud

This would be very helpful to reduce computing time while still keeping a good window resolution.

@BruceBinBoxing

Here is a solution, which could be further adjusted according to your particular target.

import numpy as np

def average_smoothing(signal, kernel_size, stride):
    # mean over each window of length kernel_size, advancing by stride
    sample = []
    start = 0
    end = kernel_size
    while end <= len(signal):
        sample.append(np.mean(signal[start:end]))
        start = start + stride
        end = end + stride
    return np.array(sample)
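Used like this (made-up data, assuming the version above):

signal = np.arange(10)
print(average_smoothing(signal, kernel_size=4, stride=2))
# [1.5 3.5 5.5 7.5]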

@masip85

masip85 commented Sep 19, 2019

I agree and support this feature. I see it is at a standstill right now.

@franchesoni

Calculating and then downsampling is not an option when you have TBs of data.

@pmdaly

pmdaly commented Nov 15, 2019

It would be very helpful in what I do as well. I have TBs of data where I need various statistics of non-overlapping windows to understand local conditions. My current "fix" is to just create a generator that slices the data frames and yields statistics. Would be very helpful to have this feature.

@magratheaner

magratheaner commented Apr 18, 2020

To contribute to 'further discussion':
My use case is to compute one min/max/median value per hour for a month of data with a resolution of 1 second. It's energy usage data and there are peaks for 1-2 seconds that I would lose with resampling. Other than that, resampling to e.g. 5 seconds/1 minute wouldn't change the fact that I still have to compute 4k/1k windows per day that need to be thrown away, rather than just being able to compute the needed 24 windows per day.

It would be possible to work around this by using groupby and so on, but that seems to be neither intuitive nor as fast as the rolling implementation (2 seconds for 2.5 million hour-long windows with sorting). It's impressively fast and useful, but we really need a stride argument to fully utilize its power.

anthonytw added a commit to anthonytw/pandas that referenced this issue May 13, 2020
@anthonytw

I took a look at the problem. This is relatively trivial; however, given the way the code is implemented, from a cursory look I think it'll require someone to slog through manually editing all the rolling routines. None of them respect the window boundaries given by the indexer classes. If they did, both this request and #11704 would be very easily solvable. In any case, I think it is manageable for anyone who wants to spend some time sprucing things up. I initiated a half-baked PR (expected to be rejected, just for an MVP) to demonstrate how I would tackle the problem.

Running:

import numpy as np
import pandas as pd

data = pd.Series(
    np.arange(100),
    index=pd.date_range('2020/05/12 12:00:00', '2020/05/12 12:00:10', periods=100))

print('1s rolling window every 2s')
print(data.rolling('1s', step='2s').apply(np.mean))

data.sort_index(ascending=False, inplace=True)

print('1s rolling window every 500ms (and reversed)')
print(data.rolling('1s', step='500ms').apply(np.mean))

yields

1s rolling window every 2s
2020-05-12 12:00:00.000000000     4.5
2020-05-12 12:00:02.020202020    24.5
2020-05-12 12:00:04.040404040    44.5
2020-05-12 12:00:06.060606060    64.5
2020-05-12 12:00:08.080808080    84.5
dtype: float64
1s rolling window every 500ms (and reversed)
2020-05-12 12:00:10.000000000    94.5
2020-05-12 12:00:09.494949494    89.5
2020-05-12 12:00:08.989898989    84.5
2020-05-12 12:00:08.484848484    79.5
2020-05-12 12:00:07.979797979    74.5
2020-05-12 12:00:07.474747474    69.5
2020-05-12 12:00:06.969696969    64.5
2020-05-12 12:00:06.464646464    59.5
2020-05-12 12:00:05.959595959    54.5
2020-05-12 12:00:05.454545454    49.5
2020-05-12 12:00:04.949494949    44.5
2020-05-12 12:00:04.444444444    39.5
2020-05-12 12:00:03.939393939    34.5
2020-05-12 12:00:03.434343434    29.5
2020-05-12 12:00:02.929292929    24.5
2020-05-12 12:00:02.424242424    19.5
2020-05-12 12:00:01.919191919    14.5
2020-05-12 12:00:01.414141414     9.5
2020-05-12 12:00:00.909090909     4.5
dtype: float64

For implementation details take a look at the PR (or here: https://github.com/anthonytw/pandas/tree/rolling-window-step)

While I would have liked to spend more time to finish it up I unfortunately have none left to tackle the grunt work of reworking all the rolling functions. My recommendation for anyone who wants to tackle this would be to enforce the window boundaries generated by the indexer classes and unify the rolling_*_fixed/variable functions. With start and end boundaries I don't see any reason they should be different, unless you have a function which does something special with non-uniformly sampled data (in which case that specific function would be better able to handle the nuance, so maybe set a flag or something).

@Lama09

Lama09 commented Aug 25, 2020

Will this also work for a custom window using the get_window_bounds() approach?

@juan-carlos-calvo

Hi there, I second also the suggestion please. This would be a really useful feature.

@LosaMatova

LosaMatova commented Sep 3, 2020

If you're using 'standard' functions, these are vectorized, and so v fast (ts.rolling(5).max().dropna()[::2]).

IIUC the saving here would come from only applying the function a fraction of the time (e.g. every nth value). But is there a case where that makes a practical difference?

I have just such an example here: https://stackoverflow.com/questions/63729190/pandas-resample-daily-data-to-annual-data-with-overlap-and-offset

Every Nth would be every 365th. The window size is variable over the lifetime of the program and the step is not guaranteed to be an integer fraction of the window size.

I basically need a set window size that steps by "# of days in the year it's looking at" which is impossible with every solution I've found for this issue so far.

@lucsorel

lucsorel commented Oct 1, 2020

I also have a similar need with the following context (adapted from a real and professional need):

  • I have a chronological dataframe with a timestamp column and a value column, which represents irregular events. Like the timestamp of when a dog passed below my window and how many seconds it took her to pass along. I can have 6 events for a given day and then no event at all for the next 2 days
  • I would like to compute a metric (let's say the mean time spent by dogs in front of my window) with a rolling window of 365 days, which would roll every 30 days

As far as I understand, the dataframe.rolling() API allows me to specify the 365 days duration, but not the need to skip 30 days of values (which is a non-constant number of rows) to compute the next mean over another selection of 365 days of values.

Obviously, the resulting dataframe I expect will have a (much) smaller number of rows than the initial 'dog events' dataframe.

@mroeschke
Member

Just to gain more clarity about this request with a simple example.

If we have this Series:

In [1]: s = pd.Series(range(5))

In [2]: s
Out[2]:
0    0
1    1
2    2
3    3
4    4
dtype: int64

and we have a window size of 2 and step size of 1. This first window at index 0 would be evaluated, step over the window at index 1, evaluate the window at index 2, etc?

In [3]: s.rolling(2, step=1, min_periods=0).max()

Out[3]:
0    0.0
1    NaN # step over this observation
2    2.0
3    NaN # step over this observation
4    4.0
dtype: float64

Likewise if we have this time based Series

In [1]: s = pd.Series(range(5), index=pd.DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-09']))

In [2]: s
Out[2]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-06    3
2020-01-09    4
dtype: int64

and we have a window size of '3D' and step size of '3D'. Would this be the correct result?

In [3]: s.rolling('3D', step='3D', min_periods=0).max()

Out[3]:
2020-01-01    0.0       # evaluate this window
2020-01-02    NaN    # step over this observation (2020-01-01 + 3 days > 2020-01-02)
2020-01-03    NaN    # step over this observation (2020-01-01 + 3 days > 2020-01-03)
2020-01-06    3.0      # evaluate this window ("snap back" to this observation)
2020-01-09    4.0      # evaluate this window (2020-01-06 + 3 days = 2020-01-09)
dtype: float64

@anthonytw

anthonytw commented Nov 17, 2020

@mroeschke wrt the first example ([3]), the results are not what I would expect. I assume this is a trailing window (e.g., at index=0 it would be the max of elements at -1 and 0, so just max([0])); then it should step forward "1" index, to index=0+step=1, and the next computation would be max([0,1]), then max([1,2]), etc. What it looks like you meant to have was a step size of two, so you would move from index=0 to index=0+2=2 (skipping index 1), and continue like that. In this case it's almost correct, but there should be no NaNs. While it may be "only" double the size in this case, in other cases it is substantial. For example, I have about an hour's worth of 500Hz ECG data for a patient, that's 1.8 million samples. If I wanted a 5-minute moving average every two minutes, that would be an array of 1.8 million elements with 30 valid computations and slightly less than 1.8 million NaNs. :-)

For indexing, step size = 1 is the current behavior, i.e., compute the feature of interest using data in the window, shift the window by one, then repeat. In this example, I want to compute the feature of interest using the data in the window, then shift by 60,000 indices, then repeat.

Similar remarks for the time. In this case, there might be some disagreement as to the correct way to implement this type of window, but in my opinion the "best"(TM) way is to start from time t0, find all elements in the range (t0-window, t0], compute the feature, then move by the step size. Throw away any windows that have fewer than the minimum number of elements (can be configurable, default to 1). That example is for a trailing window, but you can modify to fit any window configuration. This has the disadvantage of wasting time in large gaps, but gaps can be handled intelligently and even if you compute the naive way (because you're lazy like me) I've yet to see this matter in practice, since the gaps are usually not large enough to matter in real data. YMMV.
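A minimal sketch of that trailing-window scheme (the helper name and defaults are made up; it assumes a Series with a sorted DatetimeIndex):

import pandas as pd

def trailing_window_with_step(s, window, step, func, min_periods=1):
    window, step = pd.Timedelta(window), pd.Timedelta(step)
    out_index, out_values = [], []
    t = s.index[0]
    while t <= s.index[-1]:
        chunk = s[(s.index > t - window) & (s.index <= t)]   # elements in (t - window, t]
        if len(chunk) >= min_periods:
            out_index.append(t)
            out_values.append(func(chunk))
        t += step
    return pd.Series(out_values, index=out_index)

# e.g. trailing_window_with_step(s, window='5min', step='2min', func=lambda w: w.mean())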

Maybe that's clearer? Take a look at my example + code above, that might explain it better.

@mroeschke
Member

Thanks for the clarification @anthonytw. Indeed, looks like I needed to interpret step as "step to point".

As for the NaNs, I understand the sentiment to drop the NaNs in the output result automatically, but as mentioned in #15354 (comment) by @jreback, there is an API consistency consideration to have the output be the same length as the input. There may be users who would like to keep the NaNs as well (maybe?), and dropna would still be available after the rolling(..., step=...).func() operation.

@anthonytw

@mroeschke I think exceptions should be made. So long as you put an explicit note in the documentation, and the behavior is not the default, no one will be adversely affected by not returning a vector full of junk. Keeping NaNs defeats half the purpose. One objective is to limit the number of times we perform an expensive computation. The other objective is to minimize the feature set to something manageable. That example I gave you is a real one, and not nearly as much data as one really has to process in a patient monitoring application. Is it really necessary to allocate 60000x the necessary space, then search through the array to delete NaNs? For each feature we want to compute?

Note that one computation might produce an array of values. What do I want to do with an ECG waveform? Well, compute the power spectrum, of course! So I would need to allocate enough space for one full PSD vector (150,000 elements) 1.8 million times (2TB of data), then filter through to get the pieces I care about (34MB). For all the series. For all the patients. I guess I need to buy more RAM!

It's also worth mentioning that NaN, for some features, might be a meaningful output. In which case, I no longer can tell the difference between a meaningful NaN and the junk NaNs padding the data.

While I understand the desire to maintain the API, this is not a feature that will break any existing code (because it's a new feature that didn't exist before), and given the functionality there is no reason anyone would expect it to yield an output of the same size. And even if they did, a note in the documentation for the step size would be sufficient. The disadvantages far outweigh any benefit of having a "consistent" API (for a feature that didn't previously exist, mind you). Not proceeding this way will cripple the feature, it's almost not even worth implementing in that case (in my experience the space cost is almost always the bigger factor).

@minesh1291

minesh1291 commented Dec 31, 2020

Here is how I approached it:
First I group the dataframe by window size, then use an offset to roll those groups and calculate the aggregation.
It works well when window_size/stride are ints. Not optimized, but I hope it's a good start.
Welcome to try it out in a notebook.

import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

# for dataframe with datetime index

def rolling_apply_with_strides(
    df, window_size=15, strides=5, unit="s", functions=[np.mean]
):
    def resample_apply(i):
        resampled = df.resample(
                f"{window_size}{unit}",
                label="left",
                offset=f"{i}{unit}",
            ).agg(functions)
        resampled.index = resampled.index + to_offset(f"{window_size-1}{unit}")
        return resampled
    res = pd.concat(
        [
            resample_apply(i)
            for i in range(0, window_size, strides)
        ]
    ).sort_index()
    return res

Example:

res = rolling_apply_with_strides(df, window_size=4, strides=2, unit="s", functions=[np.std, np.mean])


# for dataframe without datetime index
def rolling_apply_with_strides(
    df, window_size=15, strides=5, functions=[np.mean]
):
    def group_apply(i):
        tmp_df = df.groupby(tmp_index.shift(i)).agg(functions)
        new_index = df.index[window_size+i-1::window_size]
        tmp_df = tmp_df.iloc[:new_index.shape[0],:]
        tmp_df.index = new_index
        return tmp_df
    
    tmp_index = pd.Series(np.arange(df.shape[0]))
    tmp_index = tmp_index//window_size
    res = pd.concat(
        [
            group_apply(i) for i in range(0, window_size, strides)
        ]
    ).sort_index()
    return res

Example:

res = rolling_apply_with_strides(df, window_size=4, strides=2, functions=[np.std, np.mean])


Hi @AlexS12, here is code for a dataframe without a datetime index.

@mroeschke mroeschke removed the Numeric Operations Arithmetic, Comparison, and Logical operations label May 8, 2021
@rosagold rosagold mentioned this issue Oct 11, 2021
@rosagold
Contributor

rosagold commented Oct 14, 2021

I'm working on a PR for this, but I have some questions about the further steps :D (hehe, rolling on the floor with steps)...

  1. I'm not quite sure if every subclass of BaseWindow [1] needs this feature. IMO it's not necessary/needed, but maybe the public API demands it? Especially since Window and Rolling share the same documentation. Currently I implemented this for Rolling, and therefore also implicitly for RollingGroupby. Is this sufficient?
    (edited: Window is supported now - no documentation conflict anymore)

  2. Can anyone give me some hints on which tests are needed or neat? I thought of:

    • constructor
    • invalid constructor
    • basic functionality

[1] [screenshot of the BaseWindow subclasses]

@bjfar

bjfar commented Mar 19, 2024

Why is this issue closed? Yes, a step parameter was added, but it doesn't seem to work in the use cases that were used here to argue for the feature? E.g. if I do

times = pd.date_range(start=pd.Timestamp.now(), end=pd.Timestamp.now() + pd.Timedelta(minutes=1),
                      periods=61)
data = np.arange(61)
df = pd.DataFrame({'times': times, 'data': data})
df_windows = df.rolling(on='times', window=pd.Timedelta(seconds=2), step=2)

Then pandas 2.2.1 throws NotImplementedError: step is not supported with frequency windows

But wasn't the whole point to use this with frequency windows? I also note that step has to be an integer, which does not generally make sense for timeseries data. A pd.Timedelta step should be allowed.
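For reference, a minimal sketch of what the released step parameter does support today (an integer window with an integer step, per the pandas 1.5+ rolling API):

import numpy as np
import pandas as pd

s = pd.Series(np.arange(61))
print(s.rolling(window=2, step=2).max())
# evaluated only at indices 0, 2, 4, ...; the result has 31 rows instead of 61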
