Panic when doing an inner join on two empty DFs #16140

Closed
2 tasks done
max-muoto opened this issue May 9, 2024 · 0 comments · Fixed by #16181
Labels
accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

max-muoto commented May 9, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df1 = pl.DataFrame({"col1": [], "col2": [], "col3": []})
df2 = pl.DataFrame({"col2": [], "col4": [], "col5": []})

df1.join(df2, on="col2", how="inner")

Issue description

An inner join on two empty DataFrames panics. I would expect it to return an empty DataFrame containing the columns from both sides instead. Here's a quick example in Colab: https://colab.research.google.com/drive/1q8HS7oX7f-LZzgbvnuGDeG8ZbPq0wj43?usp=sharing. I also reproduced this locally on 0.20.25.

Expected behavior

import polars as pl
df1 = pl.DataFrame({"col1": [], "col2": [], "col3": []})
df2 = pl.DataFrame({"col2": [], "col4": [], "col5": []})
resulting_df = df1.join(df2, on="col2", how="inner")

In this case, resulting_df should be an empty DataFrame with the columns col1, col2, col3, col4, and col5.
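
To spell out which columns are meant, here is a minimal pure-Python sketch of the expected column set. It does not call Polars at all: expected_inner_join_columns is a hypothetical helper written only for illustration, and the "_right" suffix for clashing non-key columns is assumed to match the default of join's suffix parameter.

```python
def expected_inner_join_columns(left, right, on, suffix="_right"):
    """Return the column names an inner join is expected to produce.

    Hypothetical helper for illustration only; not part of the Polars API.
    """
    result = list(left)  # all left-hand columns come first, in order
    for name in right:
        if name == on:
            continue  # the join key appears once, taken from the left side
        # A non-key name present on both sides gets a suffix to stay unique.
        result.append(name + suffix if name in left else name)
    return result


left_cols = ["col1", "col2", "col3"]
right_cols = ["col2", "col4", "col5"]
columns = expected_inner_join_columns(left_cols, right_cols, on="col2")
print(columns)  # ['col1', 'col2', 'col3', 'col4', 'col5']
```

The row count is irrelevant to this logic, which is why an empty input should still yield these five columns rather than a panic.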

Installed versions

--------Version info---------
Polars:               0.20.25
Index type:           UInt32
Platform:             macOS-14.4.1-arm64-arm-64bit
Python:               3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:38:07) [Clang 16.0.6 ]

----Optional dependencies----
adbc_driver_manager:  0.11.0
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         1.6.0
numpy:                1.26.1
openpyxl:             3.1.2
pandas:               2.2.0
pyarrow:              14.0.2
pydantic:             2.7.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             0.8.2
xlsxwriter:           3.1.9
max-muoto added the bug, needs triage, and python labels on May 9, 2024
c-peters added the accepted label on May 21, 2024
Projects
Status: Done