Rolling after sort returns weird and different outputs #16145
Comments
Can reproduce.
from datetime import datetime
import polars as pl

df = pl.DataFrame({
    "id": [1, 2],
    "time": [
        datetime(year=1989, month=12, day=1, hour=12, minute=3),
        datetime(year=1989, month=12, day=1, hour=13, minute=14),
    ]
})

(df.sort("id").rolling(
    index_column="time",
    group_by="id",
    period="1d",
    offset="0d",
    closed='right',
).agg().select("id")
)
shape: (2, 1)
┌─────────────────────┐
│ id                  │
│ ---                 │
│ i64                 │
╞═════════════════════╡
│ 2                   │
│ 3539883390149865530 │
└─────────────────────┘
It seems to be related to
So it appears to be an issue when offset is "positive". |
interesting - @cmdlineluser when I use your example I get:
In [1]: from datetime import datetime
   ...: import polars as pl
   ...:
   ...: df = pl.DataFrame({
   ...:     "id": [1, 2],
   ...:     "time": [
   ...:         datetime(year=1989, month=12, day=1, hour=12, minute=3),
   ...:         datetime(year=1989, month=12, day=1, hour=13, minute=14),
   ...:     ]
   ...: })
   ...:
   ...: (df.sort("id").rolling(
   ...:     index_column="time",
   ...:     group_by="id",
   ...:     period="1d",
   ...:     offset="0d",
   ...:     closed='right',
   ...: ).agg().select("id")
   ...: )
Out[1]:
shape: (2, 1)
┌─────┐
│ id  │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 0   │
└─────┘
Which versions are you using? Which OS? |
I'm using I should have included that it is non-deterministic, sometimes I get |
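Since the result changes from run to run, a single execution is not enough to confirm or rule out the bug. A minimal sketch of a repeat-and-count check follows; `distinct_outcomes` is a hypothetical helper name (not part of polars), and `run` stands for any zero-argument callable that executes the query and returns something printable, for example a lambda returning the `id` column as a list.

```python
from collections import Counter


def distinct_outcomes(run, attempts=20):
    """Call `run` repeatedly and count each distinct result by its repr.

    `run` is a hypothetical zero-argument callable wrapping the query under test.
    A deterministic query yields exactly one key in the returned Counter.
    """
    return Counter(repr(run()) for _ in range(attempts))


# Stand-in callables so the sketch runs without polars installed.
stable = distinct_outcomes(lambda: [2, 0], attempts=5)
values = iter(range(5))
unstable = distinct_outcomes(lambda: next(values), attempts=5)
print(len(stable), len(unstable))
```

More than one key in the returned `Counter` means the query gave different answers across runs, which is the symptom described in this thread.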
The call stack goes like this:
polars/crates/polars-time/src/group_by/dynamic.rs Lines 555 to 558 in d11da5e
polars/crates/polars-core/src/frame/group_by/mod.rs Lines 55 to 57 in f992a7a
polars/crates/polars-core/src/series/implementations/mod.rs Lines 187 to 190 in 23791bd
polars/crates/polars-core/src/frame/group_by/into_groups.rs Lines 140 to 148 in 19f0939
polars/crates/polars-core/src/frame/group_by/into_groups.rs Lines 59 to 62 in 19f0939
|
pinging @ritchie46 for visibility |
It's probably unrelated to the underlying issue @MarcoGorelli - but I noticed this while trying to figure out why
code.py
df = pl.DataFrame({"foo": [1] * 3}).sort("foo").with_row_index()
df.rolling(index_column="foo", period="1i", offset="0i").agg("index")
# shape: (3, 2)
# ┌─────┬───────────┐
# │ foo ┆ index │
# │ --- ┆ --- │
# │ i64 ┆ list[u32] │
# ╞═════╪═══════════╡
# │ 1 ┆ [] │
# │ 1 ┆ [] │
# │ 1 ┆ [] │
# └─────┴───────────┘
df.rolling(index_column="foo", period="1i", offset="-0i").agg("index")
# shape: (3, 2)
# ┌─────┬───────────┐
# │ foo ┆ index │
# │ --- ┆ --- │
# │ i64 ┆ list[u32] │
# ╞═════╪═══════════╡
# │ 1 ┆ [0, 1, 2] │
# │ 1 ┆ [1, 2] │
# │ 1 ┆ [2] │
# └─────┴───────────┘
Edit: Separate issue created - apologies for the noise. |
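For reference, the empty lists in the `offset="0i"` case are what a plain reading of the window bounds would give. The sketch below is a simplified pure-Python model, not polars' implementation; it assumes that each row at index value `t` gets the window `(t + offset, t + offset + period]` when `closed='right'`. The function name `rolling_window_members` is hypothetical. It does not attempt to explain the `"-0i"` output.

```python
def rolling_window_members(index_values, period, offset, closed="right"):
    """Return, for each row, the positions of rows whose index value is in its window.

    Assumed model: the window for a row at value t starts at t + offset
    and has length `period`; `closed` picks which bounds are inclusive.
    """
    result = []
    for t in index_values:
        lo, hi = t + offset, t + offset + period
        members = []
        for pos, value in enumerate(index_values):
            if closed == "right":
                inside = lo < value <= hi
            elif closed == "left":
                inside = lo <= value < hi
            elif closed == "both":
                inside = lo <= value <= hi
            else:  # "none": both bounds exclusive
                inside = lo < value < hi
            if inside:
                members.append(pos)
        result.append(members)
    return result


# Three identical index values, period 1, offset 0: each window is (1, 2],
# which excludes the value 1 itself, so every window is empty.
print(rolling_window_members([1, 1, 1], period=1, offset=0))
```

Under this model a zero offset with a right-closed window looks strictly forward from each row, so a row never falls into its own window.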
thanks - could you open a separate issue about that one please? |
Taking a look. |
Checks
Reproducible example
Log output
group_by keys are sorted; running sorted key fast path
Issue description
Running the above snippet returns a strange output, and the output differs each time the same snippet is run. I noticed that calling sort, regardless of whether the dataframe is already sorted, causes this issue when running rolling. In the above snippet, I have verified that df is the same before and after sort (via from polars.testing import assert_frame_equal).
Should return:
Instead, here are a few examples of returns (almost seems random):
Expected behavior
Should return:
Removing df = df.sort(by=["id", "time"]) provides the correct return.
Installed versions