Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Incorrect behavior of window aggregation functions on disjoint windows skipping overflowing elements #45647

Closed
3 tasks done
rtpsw opened this issue Jan 26, 2022 · 1 comment · Fixed by #45655
Closed
3 tasks done
Assignees
Labels
Bug Window rolling, ewma, expanding
Milestone

Comments

@rtpsw
Copy link
Contributor

rtpsw commented Jan 26, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
from pandas.core.indexers.objects import BaseIndexer
f64max = 1.7976931348623158E+308
values = np.array([1, f64max, f64max, 3], dtype=np.float64)
start = np.array([0, 0, 3, 3], dtype=np.int64)
end = start + 1
class WinInd(BaseIndexer):
    def get_window_bounds(self, num_values: 'int' = 0, min_periods: 'int | None' = None, center: 'bool | None' = None, closed: 'str | None' = None) -> 'tuple[np.ndarray, np.ndarray, np.ndarray]':
        return start, end

pd.Series(values).rolling(WinInd()).sum()
pd.Series(values).rolling(WinInd()).mean()

Issue Description

The example outputs

0    1.0
1    1.0
2    NaN
3    NaN
dtype: float64

for both sum and mean. The correct behavior would output 4.0 instead of NaN in the last two rows. This shows that at least these rolling window aggregations produce incorrect output on disjoint windows that skip elements whose sum overflows. This incorrect behavior originates in the window aggregation functions in pandas._libs.window.aggregations that process, rather than skip over, elements outside the disjoint windows; many of these functions have this problem. Here is code that shows this for sum and mean:

>>> import numpy as np
>>> import pandas as pd
>>> import pandas._libs.window.aggregations as wa
>>> values = np.array([1, np.inf, 3], dtype=np.float64)
>>> start = np.array([0, 2], dtype=np.int64)
>>> end = start + 1
>>> wa.roll_sum(values, start, end, 0)
array([ 1., nan, nan])
>>> wa.roll_mean(values, start, end, 0)
array([ 1., nan, nan])

Because these window aggregation functions are not exposed to the user and are wrapped by defensive code within Series.rolling that handles np.inf values, the above example is more involved and induces an overflow to expose the behavior.

One issue where the need for handling disjoint windows occurs is in GH-15354 when the step size is larger than the window size, which is the use case described there. The current issue is a precursor for handling GH-15354.

Expected Behavior

The expected output is for both sum and mean is:

0    1.0
1    1.0
2    4.0
3    4.0
dtype: float64

Installed Versions

This issue is confirmed on the main branch.

@rtpsw rtpsw added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 26, 2022
@rtpsw
Copy link
Contributor Author

rtpsw commented Jan 26, 2022

take

@jreback jreback added this to the 1.5 milestone Jan 27, 2022
@jreback jreback added Window rolling, ewma, expanding and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 27, 2022
@jreback jreback modified the milestones: 1.5, 1.4.1 Jan 27, 2022
rtpsw added a commit to rtpsw/pandas that referenced this issue Jan 27, 2022
rtpsw added a commit to rtpsw/pandas that referenced this issue Jan 27, 2022
rtpsw added a commit to rtpsw/pandas that referenced this issue Jan 28, 2022
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this issue Jan 28, 2022
jreback pushed a commit that referenced this issue Jan 28, 2022
…elements (GH-45647) (#45683)

Co-authored-by: rtpsw <rtpsw@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants