
'dask_geopandas.from_dask_dataframe' produces error: 'DataFrame' object has no attribute 'map_partitions' #221

Open
komzy opened this issue Oct 12, 2022 · 1 comment


komzy commented Oct 12, 2022

I'm writing some simple code to read a large GeoJSON file (>3 GB) with dask and convert it to a dask-geopandas dataframe. However, I run into the above error.

Here's my code:

import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
import dask_geopandas
import dask.dataframe as dd

dask_df = dd.read_json('madagascar_gen.txt',orient='list').compute()
dgpd = dask_geopandas.from_dask_dataframe(dask_df, geometry="geometry")

Error log:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 dgpd = dask_geopandas.from_dask_dataframe(dask_df, geometry="geometry")

File ~/opt/anaconda3/lib/python3.9/site-packages/dask_geopandas/core.py:790, in from_dask_dataframe(df, geometry)
    786     name = geometry.name if geometry.name is not None else "geometry"
    787     return df.assign(**{name: geometry}).map_partitions(
    788         geopandas.GeoDataFrame, geometry=name
    789     )
--> 790 return df.map_partitions(geopandas.GeoDataFrame, geometry=geometry)

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name)
   5568 if (
   5569     name not in self._internal_names_set
   5570     and name not in self._metadata
   5571     and name not in self._accessors
   5572     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5573 ):
   5574     return self[name]
-> 5575 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'map_partitions'

madagascar_gen.json:

[
{"geometry":{"coordinates":[44.3207501,-20.290752],"type":"Point"},"type":"Feature","properties":{"oID":"1","timestamp":"2022-09-02 11:05:44"}},
{"geometry":{"coordinates":[44.32089653504225,-20.290709591647275],"type":"Point"},"type":"Feature","properties":{"oID":"1","timestamp":"2022-09-02 11:05:44"}},
{"geometry":{"coordinates":[44.32104297004467,-20.290667183294346],"type":"Point"},"type":"Feature","properties":{"oID":"1","timestamp":"2022-09-02 11:05:44"}},
...
]

Anyone know why this is happening?

@martinfleis
Member

You are not passing a dask.dataframe to dask_geopandas.from_dask_dataframe. When you call compute(), dask executes the task graph and returns a pandas DataFrame, which has no map_partitions method. If you want to read with dask.dataframe, drop the compute() call:

dask_df = dd.read_json('madagascar_gen.txt',orient='list')
dgpd = dask_geopandas.from_dask_dataframe(dask_df, geometry="geometry")

But given that the file is GeoJSON, you will need to create the geometry array yourself. The better option is to read the file directly with dask-geopandas:

dgpd = dask_geopandas.read_file("madagascar_gen.json", npartitions=4)
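
If you do want to go through dask.dataframe anyway, creating the geometry yourself could look roughly like the sketch below. It works on a single pandas partition and assumes each row's "geometry" value is a GeoJSON-style dict, as in the sample records above. The helper name to_geodataframe is hypothetical, and the CRS is an assumption based on GeoJSON using WGS 84 coordinates.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import shape

# Two records copied from the sample file in the question.
records = [
    {"geometry": {"coordinates": [44.3207501, -20.290752], "type": "Point"},
     "type": "Feature",
     "properties": {"oID": "1", "timestamp": "2022-09-02 11:05:44"}},
    {"geometry": {"coordinates": [44.32089653504225, -20.290709591647275], "type": "Point"},
     "type": "Feature",
     "properties": {"oID": "1", "timestamp": "2022-09-02 11:05:44"}},
]


def to_geodataframe(pdf):
    """Hypothetical helper: turn GeoJSON-like dicts into shapely geometries."""
    # shape() builds a shapely geometry from a GeoJSON-style mapping.
    geometry = gpd.GeoSeries(pdf["geometry"].map(shape), crs="EPSG:4326")
    return gpd.GeoDataFrame(pdf.drop(columns="geometry"), geometry=geometry)


pdf = pd.DataFrame(records)
gdf = to_geodataframe(pdf)
print(gdf.geometry.iloc[0])
```

With a dask dataframe, a function like this would be applied per partition rather than to one in-memory frame, which is more work than letting dask_geopandas.read_file handle the parsing.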
