Polars .get expression doesn't work when used within agg in group_by_dynamic #16116

Closed
2 tasks done
philurame opened this issue May 8, 2024 · 1 comment · Fixed by #16189
Labels
accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@philurame

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({
    'time': pl.date_range(pl.date(2021, 1, 1), pl.date(2021, 1, 8), eager=True),
    'data': pl.arange(8, eager=True),
})

# Overlapping windows: a new 3-day window starts every 2 days.
df_grouped = df.group_by_dynamic(
    index_column="time",
    every="2d",
    period="3d",
    start_by='datapoint',
).agg(
    data=pl.col('data'),
    get_1_bad=pl.col('data').get(1),  # returns the wrong element
    get_1_good=pl.col('data').shift(-1).get(0),  # workaround, returns the expected element
)

print(df, df_grouped)

Log output

┌────────────┬──────┐
│ time       ┆ data │
│ ---        ┆ ---  │
│ date       ┆ i64  │
╞════════════╪══════╡
│ 2021-01-01 ┆ 0    │
│ 2021-01-02 ┆ 1    │
│ 2021-01-03 ┆ 2    │
│ 2021-01-04 ┆ 3    │
│ 2021-01-05 ┆ 4    │
│ 2021-01-06 ┆ 5    │
│ 2021-01-07 ┆ 6    │
│ 2021-01-08 ┆ 7    │
└────────────┴──────┘
┌────────────┬───────────┬───────────┬────────────┐
│ time       ┆ data      ┆ get_1_bad ┆ get_1_good │
│ ---        ┆ ---       ┆ ---       ┆ ---        │
│ date       ┆ list[i64] ┆ i64       ┆ i64        │
╞════════════╪═══════════╪═══════════╪════════════╡
│ 2021-01-01 ┆ [0, 1, 2] ┆ 1         ┆ 1          │
│ 2021-01-03 ┆ [2, 3, 4] ┆ 2         ┆ 3          │
│ 2021-01-05 ┆ [4, 5, 6] ┆ 4         ┆ 5          │
│ 2021-01-07 ┆ [6, 7]    ┆ 5         ┆ 7          │
└────────────┴───────────┴───────────┴────────────┘

Issue description

It looks like the .get expression returns wrong values in group_by_dynamic when every != period (in the example above, every="2d" and period="3d", so the windows overlap).

shift(x), get(0) and get(-1) work correctly, so I can work around the issue with .shift(-1).get(0), but .get(x) in general does not work for some reason.

P.S. I originally posted this problem on Stack Overflow.

Expected behavior

The get_1_bad and get_1_good columns in my example should be identical.
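To make the expected values explicit, here is a minimal pure-Python sketch that rebuilds the windows by hand and takes the element at index 1 from each. The helper name dynamic_windows is hypothetical, and the sketch assumes left-closed windows [start, start + period) that begin at the first datapoint, which matches the data column in the log output above. It is not how Polars computes the groups internally.

```python
from datetime import date, timedelta

# Same data as the reproducible example: 8 consecutive days, values 0..7.
times = [date(2021, 1, 1) + timedelta(days=i) for i in range(8)]
data = list(range(8))


def dynamic_windows(times, values, every, period):
    """Hypothetical reference helper: left-closed windows [start, start + period),
    with the first window starting at the first datapoint."""
    windows = []
    start = times[0]
    while start <= times[-1]:
        end = start + period
        windows.append([v for t, v in zip(times, values) if start <= t < end])
        start += every
    return windows


windows = dynamic_windows(times, data, timedelta(days=2), timedelta(days=3))
# Element at index 1 of each window (None if a window has fewer than 2 values).
expected_get_1 = [w[1] if len(w) > 1 else None for w in windows]

print(windows)         # [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7]]
print(expected_get_1)  # [1, 3, 5, 7]
```

These values match the get_1_good column, not get_1_bad.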

Installed versions

--------Version info---------
Polars:               0.20.25
Index type:           UInt32
Platform:             Linux-5.15.0-70-generic-x86_64-with-glibc2.35
Python:               3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.10.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.7.2
nest_asyncio:         1.5.6
numpy:                1.26.1
openpyxl:             <not installed>
pandas:               2.1.4
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.23
torch:                2.1.0+cu121
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@philurame added the bug, needs triage, and python labels on May 8, 2024
@cmdlineluser
Contributor

Can reproduce.

Slicing appears to be unaffected.

df.group_by_dynamic(
    index_column="time", 
    every="2d", 
    period="3d",
    start_by="datapoint",
).agg(
    data=pl.col("data"),
    get_1_gather=pl.col("data").gather(1),
    get_1_slice=pl.col("data").slice(1, 1),
)
shape: (4, 4)
┌────────────┬───────────┬──────────────┬─────────────┐
│ time       ┆ data      ┆ get_1_gather ┆ get_1_slice │
│ ---        ┆ ---       ┆ ---          ┆ ---         │
│ date       ┆ list[i64] ┆ list[i64]    ┆ list[i64]   │
╞════════════╪═══════════╪══════════════╪═════════════╡
│ 2021-01-01 ┆ [0, 1, 2] ┆ [1]          ┆ [1]         │
│ 2021-01-03 ┆ [2, 3, 4] ┆ [2]          ┆ [3]         │
│ 2021-01-05 ┆ [4, 5, 6] ┆ [4]          ┆ [5]         │
│ 2021-01-07 ┆ [6, 7]    ┆ [5]          ┆ [7]         │
└────────────┴───────────┴──────────────┴─────────────┘

I will just tag @MarcoGorelli in case they can take a look.

@ritchie46 ritchie46 self-assigned this May 13, 2024
@c-peters c-peters added the accepted Ready for implementation label May 21, 2024