use_arrow=True vs False: different handling of date columns from shapefiles #262
Comments
Script to reproduce both aspects + how fiona handles this case: fiona formats dates as string.
(relevant) output:
In geofileops I'm thinking about dealing with it as follows, so convert these columns to datetime64:
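The original snippet is not preserved here, so the following is only a minimal sketch of what such a conversion could look like in plain pandas. The helper name `date_columns_to_datetime64` is hypothetical and is not part of geofileops or pyogrio; it assumes the date columns arrive as "object" columns holding only `datetime.date` values (plus missing values).

```python
import datetime

import pandas as pd


def date_columns_to_datetime64(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every object column
    that holds only datetime.date values (ignoring missing values) is
    converted to a datetime64 column."""
    result = df.copy()
    for name in result.columns:
        column = result[name]
        if column.dtype != object:
            continue
        values = column.dropna()
        if values.empty:
            # Nothing to inspect, so leave the column untouched.
            continue
        # Strict type check: datetime.datetime values are not plain dates.
        if all(type(value) is datetime.date for value in values):
            result[name] = pd.to_datetime(column)
    return result


sample = pd.DataFrame(
    {
        "date_col": [datetime.date(2023, 1, 2), None],
        "text_col": ["a", "b"],
    }
)
converted = date_columns_to_datetime64(sample)
print(converted.dtypes)
```

Missing values become `NaT` after the conversion. Checking every value is slow on large tables, so a real implementation would probably rely on the field types reported by the data source rather than on inspecting values.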
My personal view in all of these data type issues is that in the long term, it makes sense to adopt
What I suppose is happening here is that we get an Arrow table with an actual date column (Arrow has date32 and date64 data types). And then in
I don't think I have a strong opinion on which option is best (datetime.date vs datetime64[ns]). Ideally, in the near future, pandas will natively support a "date" data type, which would also settle this question. (I do agree that longer term, we hopefully can just start relying on the ArrowStream-based interface of GDAL.)
Ah right. That reminded me of #241 (comment), so in theory you could use
You could indeed use
If use_arrow=False, the default, a date column is returned as dtype "datetime64".
If use_arrow=True, a date column is returned as dtype "object" with datetime.date objects as data values.
I'm not sure which is the way to go, but it would probably be better if the behaviour were the same in both cases.
Additional complication: columns of type "datetime64" are written by pyogrio to a column of Date type (tested with shapefile), while "object" columns with datetime.date values seem to be written as a "String" column.