
BUG: failure if manually specifying engine="pyarrow" in to_parquet #214

jorisvandenbossche opened this issue Aug 12, 2022 · 1 comment


@jorisvandenbossche
Member

I just noticed that when the argument engine="pyarrow" is explicitly passed to to_parquet(), the write still fails with the same error:

import pandas as pd
import geopandas as gpd
import dask_geopandas as dgpd

dft = pd.util.testing.makeDataFrame()
dft["geometry"] = gpd.points_from_xy(dft.A, dft.B)
df = gpd.GeoDataFrame(dft)
df = dgpd.from_geopandas(df, npartitions=1)
df.to_parquet("mydf.parquet", engine="pyarrow")

Originally posted by @FlorisCalkoen in #198 (comment)

@jorisvandenbossche
Member Author

Ah, that is "expected": passing engine="pyarrow" selects dask's built-in "pyarrow" engine, whereas dask-geopandas extends that engine so it handles the geometry dtype properly.

But of course, we should prevent people from accidentally passing engine="pyarrow" and thus silently overriding our own engine. Seems we need something more elaborate than the simple partial to do that:

from functools import partial
import dask.dataframe as dd

# GeoArrowEngine is dask-geopandas' geometry-aware parquet engine
to_parquet = partial(dd.to_parquet, engine=GeoArrowEngine)
to_parquet.__doc__ = dd.to_parquet.__doc__
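One possible shape for that "something more elaborate" is a wrapper that validates the engine keyword instead of letting a caller-supplied value shadow the default. This is only a sketch of the idea, not the actual dask-geopandas fix: make_to_parquet and fake_dd_to_parquet are hypothetical names, and the stub merely stands in for dask.dataframe.to_parquet so the dispatch behavior can be shown without a dask installation.

```python
import functools


def make_to_parquet(base_to_parquet, forced_engine):
    """Wrap a to_parquet function so that a caller-supplied engine=
    argument cannot silently replace the geometry-aware engine."""

    @functools.wraps(base_to_parquet)
    def to_parquet(df, path, *args, engine="auto", **kwargs):
        # Reject anything other than the default or the forced engine,
        # instead of silently writing without geometry support.
        if engine not in ("auto", forced_engine):
            raise ValueError(
                f"engine={engine!r} is not supported; the geometry-aware "
                f"engine is always used when writing GeoDataFrames"
            )
        return base_to_parquet(df, path, *args, engine=forced_engine, **kwargs)

    return to_parquet


# Stub standing in for dask.dataframe.to_parquet, used only to
# demonstrate the dispatch behavior; it returns the engine it received.
def fake_dd_to_parquet(df, path, engine=None):
    return engine


to_parquet = make_to_parquet(fake_dd_to_parquet, "GeoArrowEngine")
print(to_parquet(None, "mydf.parquet"))  # -> GeoArrowEngine
```

With this shape, to_parquet(df, path, engine="pyarrow") raises a ValueError rather than quietly dropping the geometry handling; raising loudly is arguably better than silently substituting the correct engine, since it tells the user their keyword had no effect.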
