Cannot write geodataframe with non-sequential index to shapefile due to KeyError #338

Open
codeananda opened this issue Jan 22, 2024 · 6 comments


codeananda commented Jan 22, 2024

Writing a GeoDataFrame with a non-sequential index to a shapefile using pyogrio can raise a KeyError. It only happens when the frame has a datetime column; if I remove that column, the error goes away.

Resetting the index before writing also avoids the error, but it seems strange that this should be necessary.
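
A minimal sketch of that workaround, assuming only pandas so it runs without geopandas installed. The DataFrame `df` is a stand-in for the attribute columns of the GeoDataFrame in this report, and the commented-out last line shows how the same call would be used on it before writing.

```python
import pandas as pd

# Stand-in for the attribute columns of the GeoDataFrame in the report
# (assumption: geometry is left out so that plain pandas is enough).
df = pd.DataFrame(
    {
        "NAME": ["NEW FOREST", "SOUTH DOWNS"],
        "DESIG_DATE": ["2006-04-01 00:00:00+00:00", "2010-03-31 00:00:00+00:00"],
    },
    index=[0, 7],  # non-sequential labels, as in the report
)
df["DESIG_DATE"] = pd.to_datetime(df["DESIG_DATE"])

# drop=True discards the old labels instead of keeping them as a new column,
# so the result has a fresh 0..n-1 index and the same data.
clean = df.reset_index(drop=True)

print(list(df.index))
print(list(clean.index))

# For the GeoDataFrame `a` from the report, the equivalent call would be:
# a.reset_index(drop=True).to_file("aaa.shp", engine="pyogrio")
```

Note that `reset_index` returns a new object, so the original frame keeps its labels.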

Reproducible example

from shapely import wkt
import pandas as pd
import geopandas as gpd

data = [
    {"OBJECTID": 1, "CODE": 5, "NAME": "NEW FOREST", "MEASURE": 567.0, "DESIG_DATE": "2006-04-01 00:00:00+00:00", "geometry": wkt.loads("POINT (0 0)")},
    {"OBJECTID": 8, "CODE": 10, "NAME": "SOUTH DOWNS", "MEASURE": 1653.0, "DESIG_DATE": "2010-03-31 00:00:00+00:00", "geometry": wkt.loads("POINT (1 1)")}
]
a = gpd.GeoDataFrame(data, geometry='geometry', index=[0,7])
a['DESIG_DATE'] = pd.to_datetime(a['DESIG_DATE'])  # comment out this line and it works
a.to_file('aaa.shp', engine='pyogrio')

It also warns that DESIG_DATE is created as a date field even though DateTime was requested.

Warning

C:\Users\User\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pyogrio\raw.py:530: RuntimeWarning: Field DESIG_DATE create as date field, though DateTime requested.
  ogr_write(

Stacktrace

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pandas\core\indexes\base.py:3791, in Index.get_loc(self, key)
   3790 try:
-> 3791     return self._engine.get_loc(casted_key)
   3792 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:2606, in pandas._libs.hashtable.Int64HashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:2630, in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[67], line 11
      9 a['DESIG_DATE'] = pd.to_datetime(a['DESIG_DATE'])
     10 # a.info()
---> 11 a.to_file('aaa.shp', engine='pyogrio')

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\geopandas\geodataframe.py:1264, in GeoDataFrame.to_file(self, filename, driver, schema, index, **kwargs)
   1173 """Write the ``GeoDataFrame`` to a file.
   1174 
   1175 By default, an ESRI shapefile is written, but any OGR data source
   (...)
   1260 
   1261 """
   1262 from geopandas.io.file import _to_file
-> 1264 _to_file(self, filename, driver, schema, index, **kwargs)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\geopandas\io\file.py:614, in _to_file(df, filename, driver, schema, index, mode, crs, engine, **kwargs)
    612     _to_file_fiona(df, filename, driver, schema, crs, mode, **kwargs)
    613 elif engine == "pyogrio":
--> 614     _to_file_pyogrio(df, filename, driver, schema, crs, mode, **kwargs)
    615 else:
    616     raise ValueError(f"unknown engine '{engine}'")

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\geopandas\io\file.py:662, in _to_file_pyogrio(df, filename, driver, schema, crs, mode, **kwargs)
    659 if not df.columns.is_unique:
    660     raise ValueError("GeoDataFrame cannot contain duplicated column names.")
--> 662 pyogrio.write_dataframe(df, filename, driver=driver, **kwargs)

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pyogrio\geopandas.py:548, in write_dataframe(df, path, layer, driver, encoding, geometry_type, promote_to_multi, nan_as_null, append, dataset_metadata, layer_metadata, metadata, dataset_options, layer_options, **kwargs)
    545 if geometry_column is not None:
    546     geometry = to_wkb(geometry.values)
--> 548 write(
    549     path,
    550     layer=layer,
    551     driver=driver,
    552     geometry=geometry,
    553     field_data=field_data,
    554     field_mask=field_mask,
    555     fields=fields,
    556     crs=crs,
    557     geometry_type=geometry_type,
    558     encoding=encoding,
    559     promote_to_multi=promote_to_multi,
    560     nan_as_null=nan_as_null,
    561     append=append,
    562     dataset_metadata=dataset_metadata,
    563     layer_metadata=layer_metadata,
    564     metadata=metadata,
    565     dataset_options=dataset_options,
    566     layer_options=layer_options,
    567     gdal_tz_offsets=gdal_tz_offsets,
    568     **kwargs,
    569 )

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pyogrio\raw.py:530, in write(path, geometry, field_data, fields, field_mask, layer, driver, geometry_type, crs, encoding, promote_to_multi, nan_as_null, append, dataset_metadata, layer_metadata, metadata, dataset_options, layer_options, gdal_tz_offsets, **kwargs)
    527         else:
    528             raise ValueError(f"unrecognized option '{k}' for driver '{driver}'")
--> 530 ogr_write(
    531     path,
    532     layer=layer,
    533     driver=driver,
    534     geometry=geometry,
    535     geometry_type=geometry_type,
    536     field_data=field_data,
    537     field_mask=field_mask,
    538     fields=fields,
    539     crs=crs,
    540     encoding=encoding,
    541     promote_to_multi=promote_to_multi,
    542     nan_as_null=nan_as_null,
    543     append=append,
    544     dataset_metadata=dataset_metadata,
    545     layer_metadata=layer_metadata,
    546     dataset_kwargs=dataset_kwargs,
    547     layer_kwargs=layer_kwargs,
    548     gdal_tz_offsets=gdal_tz_offsets,
    549 )

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pyogrio\_io.pyx:2039, in pyogrio._io.ogr_write()
   2037     gdal_tz = 0
   2038 else:
-> 2039     gdal_tz = tz_array[i]
   2040 OGR_F_SetFieldDateTimeEx(
   2041     ogr_feature,

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pandas\core\series.py:1040, in Series.__getitem__(self, key)
   1037     return self._values[key]
   1039 elif key_is_scalar:
-> 1040     return self._get_value(key)
   1042 # Convert generator to list before going through hashable part
   1043 # (We will iterate through the generator there to check for slices)
   1044 if is_iterator(key):

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pandas\core\series.py:1156, in Series._get_value(self, label, takeable)
   1153     return self._values[label]
   1155 # Similar to Index.get_value, but we do not fall back to positional
-> 1156 loc = self.index.get_loc(label)
   1158 if is_integer(loc):
   1159     return self._values[loc]

File ~\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pandas\core\indexes\base.py:3798, in Index.get_loc(self, key)
   3793     if isinstance(casted_key, slice) or (
   3794         isinstance(casted_key, abc.Iterable)
   3795         and any(isinstance(x, slice) for x in casted_key)
   3796     ):
   3797         raise InvalidIndexError(key)
-> 3798     raise KeyError(key) from err
   3799 except TypeError:
   3800     # If we have a listlike key, _check_indexing_error will raise
   3801     #  InvalidIndexError. Otherwise we fall through and re-raise
   3802     #  the TypeError.
   3803     self._check_indexing_error(key)

KeyError: 1
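
Reading the trace, the failing line is `gdal_tz = tz_array[i]`, and the lookup goes through `Series.__getitem__` and `Index.get_loc`. That suggests `tz_array` is a pandas Series that still carries the original index `[0, 7]`, while `i` is a row position, so `i = 1` is looked up as a label that does not exist. The sketch below reproduces that behaviour with plain pandas. The values in `tz_array` are invented for illustration, and only the index matters.

```python
import pandas as pd

# Hypothetical stand-in for the object the trace calls `tz_array`.
# The values are made up; the non-sequential index [0, 7] is what matters.
tz_array = pd.Series([100, 100], index=[0, 7])

for i in range(len(tz_array)):
    try:
        # With an integer index, Series[i] is a label lookup, not positional
        value = tz_array[i]
        print(i, "->", value)
    except KeyError as err:
        print(i, "-> KeyError:", err)

# Positional access does not depend on the index labels
print(tz_array.iloc[1])
print(tz_array.to_numpy()[1])
```

Position 0 happens to match label 0, which is why the error only appears at the second row. Indexing by position, for example through `.iloc` or a NumPy array, would avoid this kind of error; this is an illustration of the failure mode, not a description of the actual pyogrio code.
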
@jorisvandenbossche
Member

Can you show the output of geopandas.show_versions()?

@codeananda
Author

SYSTEM INFO
-----------
python     : 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
executable : C:\Users\User\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\Scripts\python.exe
machine    : Windows-10-10.0.22621-SP0

GEOS, GDAL, PROJ INFO
---------------------
GEOS       : 3.11.2
GEOS lib   : None
GDAL       : 3.6.4
GDAL data dir: C:\Users\User\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\fiona\gdal_data
PROJ       : 9.3.0
PROJ data dir: C:\Users\User\AppData\Local\pypoetry\Cache\virtualenvs\big-bertha-O8kHtzvf-py3.10\lib\site-packages\pyproj\proj_dir\share\proj

PYTHON DEPENDENCIES
-------------------
geopandas  : 0.14.1
numpy      : 1.26.2
pandas     : 2.1.4
pyproj     : 3.6.1
shapely    : 2.0.2
fiona      : 1.9.5
geoalchemy2: None
geopy      : None
matplotlib : 3.8.2
mapclassify: None
pygeos     : None
pyogrio    : 0.7.2
psycopg2   : 2.9.9 (dt dec pq3 ext lo64)
pyarrow    : None
rtree      : 1.1.0

@theroggy
Member

I noticed this as well. It has already been fixed, but the fix has not been released yet: #324

@jorisvandenbossche
Member

Ah, that's the reason I didn't see it (I was testing with main). I was just going to look at our recent commits. Thanks for the link.

@codeananda
Author

Wonderful :) Any idea when this will be released?

@brendan-ward
Member

We haven't set a specific date for the next release, but it would be ideal to have it out in the next couple of weeks.
