
Unwanted removal of ISO8601 datetime string timezone information when reading from GeoJSON #914

Open
EigenJT opened this issue Jun 16, 2020 · 5 comments



EigenJT commented Jun 16, 2020

Bringing this over from geopandas. It seems like reading an ISO8601 datetime string from a GeoJSON erases the timezone information at some point.
geopandas/geopandas#1472

Expected behavior and actual behavior.

Expected timezone-aware ISO8601 datetime strings in a GeoJSON file to be read in as strings. Instead, got a string that has been stripped of its timezone information.

For example, the GeoJSON contains 2010-10-20T12:00:00+07:00, but reading it with Fiona returns 2010-10-20T12:00:00.
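A minimal test file for reproducing this can be generated with the standard library alone. The file name `test_file.geojson` and the property name `datetime` are assumptions chosen for illustration; the snippet only writes the file and shows what the stripped string loses. It does not call Fiona.

```python
import datetime
import json

# Hypothetical test data: one point feature whose "datetime" property
# carries an explicit +07:00 UTC offset.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"datetime": "2010-10-20T12:00:00+07:00"},
            "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        }
    ],
}

with open("test_file.geojson", "w") as f:
    json.dump(feature_collection, f)

expected = datetime.datetime.fromisoformat("2010-10-20T12:00:00+07:00")
reported = datetime.datetime.fromisoformat("2010-10-20T12:00:00")

# The expected value is timezone-aware; the reported value is naive,
# so the +07:00 offset (and with it the absolute instant) is lost.
print(expected.utcoffset())  # 7:00:00
print(reported.tzinfo)       # None
```

Opening this file with `fiona.open` on the versions listed below should then show the naive `2010-10-20T12:00:00` value described above.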

Operating system

Mac OS X 10.15.5.

Fiona and GDAL version and provenance

Fiona: 1.8.13.post1 installed via PyPI
GDAL: 2.4.4 installed via Homebrew

rbuffat (Contributor) commented Jun 16, 2020

Currently, Fiona supports RFC 3339 datetimes but is not timezone-aware (i.e., it ignores any timezone information provided by GDAL).

Read support could probably be implemented quite easily; writing datetimes with timezones could potentially be trickier.

GDAL's documentation describes the timezone flag as follows:

OGR_F_GetFieldAsDateTimeEx: pnTZFlag: (0=unknown, 1=localtime, 100=GMT, see data model for details)
CPLParseRFC822DateTime: pnTZFlag: (0=unknown, 100=GMT, 101=GMT+15minute, 99=GMT-15minute), or NULL
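Read this way, the flag is "100 plus the UTC offset in 15-minute steps", with 0 and 1 reserved for unknown and local time. A sketch of that decoding, assuming only the values listed in the GDAL documentation quoted above (the helper name `ogr_tzflag_to_offset` is hypothetical and not part of Fiona or GDAL):

```python
import datetime
from typing import Optional


def ogr_tzflag_to_offset(tzflag: int) -> Optional[datetime.timedelta]:
    """Hypothetical helper: translate an OGR pnTZFlag into a UTC offset.

    0 = unknown and 1 = local time carry no fixed offset, so None is returned.
    Any other value is interpreted as 100 + (offset in 15-minute steps).
    """
    if tzflag in (0, 1):
        return None
    return datetime.timedelta(minutes=(tzflag - 100) * 15)


print(ogr_tzflag_to_offset(128))  # 7:00:00
```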

EigenJT (Author) commented Jun 16, 2020

Thanks for the response, @rbuffat. I'm not too familiar with configuring GDAL. Is there a way to disable the parsing and just treat datetimes as regular strings?

rbuffat (Contributor) commented Jun 16, 2020

For the GeoJSON driver, you can use the DATE_AS_STRING open option or the OGR_GEOJSON_DATE_AS_STRING configuration option:

import fiona
with fiona.open("test_file.geojson", DATE_AS_STRING="YES") as c:
    print(c.schema)

In this case, the date fields are treated as string:

{'properties': OrderedDict([('datetime', 'str')]), 'geometry': 'Point'}

https://gdal.org/drivers/vector/geojson.html
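With the string kept verbatim, the timezone can be recovered on the Python side. A minimal sketch, assuming the GDAL build in use honours DATE_AS_STRING and that the property holds an ISO 8601 string; `parse_datetime_property` is a hypothetical helper, not a Fiona API:

```python
import datetime
from typing import Any, Dict


def parse_datetime_property(properties: Dict[str, Any], key: str) -> datetime.datetime:
    """Hypothetical helper: parse an ISO 8601 string property into a datetime.

    Offsets such as +07:00 are preserved, giving a timezone-aware result.
    """
    value = properties[key]
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit "+00:00" offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.datetime.fromisoformat(value)


props = {"datetime": "2010-10-20T12:00:00+07:00"}
dt = parse_datetime_property(props, "datetime")
print(dt.isoformat())  # 2010-10-20T12:00:00+07:00
```

In Fiona 1.8 each record read from a collection is a dict, so `feat["properties"]` from the loop over `c` can be passed in directly.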

jorisvandenbossche (Member) commented:

The datetime is created and formatted here:

Fiona/fiona/ogrext.pyx, lines 242 to 251 in e46abf2:

retval = OGR_F_GetFieldAsDateTime(
    feature, i, &y, &m, &d, &hh, &mm, &ss, &tz)
try:
    if fieldtype is FionaDateType:
        props[key] = datetime.date(y, m, d).isoformat()
    elif fieldtype is FionaTimeType:
        props[key] = datetime.time(hh, mm, ss).isoformat()
    else:
        props[key] = datetime.datetime(
            y, m, d, hh, mm, ss).isoformat()

So I think it should indeed be relatively straightforward to calculate a GMT offset from the pnTZFlag.

For the original example (2010-10-20T12:00:00+07:00), you get back those values:

In [18]: feature.GetFieldAsDateTime(0) 
Out[18]: [2010, 10, 20, 12, 0, 0.0, 128]

So the offset is (128 - 100) * 15 = 420 minutes, and then:

In [33]: tzinfo = datetime.timezone(datetime.timedelta(minutes=420)) 

In [35]: datetime.datetime(2010, 10, 20, 12, 0, tzinfo=tzinfo).isoformat() 
Out[35]: '2010-10-20T12:00:00+07:00'

which gives back the correct string representation.
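The same steps can be combined into one function that takes the values returned by `GetFieldAsDateTime` in the session above. This is a sketch under assumptions: `format_ogr_datetime` is a hypothetical name, flags 0 and 1 are mapped to a naive datetime, and fractional seconds are rounded to milliseconds.

```python
import datetime


def format_ogr_datetime(y, m, d, hh, mm, ss, tzflag):
    """Hypothetical helper: rebuild an ISO 8601 string including the offset.

    Flags 0 (unknown) and 1 (local time) yield a naive datetime; any other
    flag is read as 100 + (UTC offset in 15-minute steps).
    """
    tzinfo = None
    if tzflag not in (0, 1):
        offset = datetime.timedelta(minutes=(tzflag - 100) * 15)
        tzinfo = datetime.timezone(offset)
    # Seconds arrive as a float; split them into whole seconds and
    # microseconds, rounding to milliseconds (an assumed precision).
    millis = round(ss * 1000)
    sec, millis = divmod(millis, 1000)
    return datetime.datetime(
        y, m, d, hh, mm, sec, millis * 1000, tzinfo=tzinfo
    ).isoformat()


print(format_ogr_datetime(2010, 10, 20, 12, 0, 0.0, 128))  # 2010-10-20T12:00:00+07:00
print(format_ogr_datetime(2010, 10, 20, 12, 0, 0.0, 0))    # 2010-10-20T12:00:00
```

Inside Fiona the equivalent change would sit in the Cython excerpt above, passing `tz` into the `datetime.datetime(...)` call.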

rbuffat (Contributor) commented Jun 19, 2020

The situation is a bit complicated, as setting the pnTZFlag gives a different result depending on the driver: OSGeo/gdal#2696

The timezone flag pnTZFlag of OGR_F_SetFieldDateTimeEx can be either 0 (unknown) or one of 100 (GMT), 101 (GMT+15 minutes), 99 (GMT-15 minutes), and so on.

The problem is that not all drivers understand timezones. For example, the MapInfo File driver does not support timezones, and before GDAL 3.1 the GPKG driver supported only GMT (i.e., all datetimes were stored as GMT; 0 = unknown was not supported). Also, setting pnTZFlag to an unsupported value typically does not raise an error.

I think we should apply the following logic:

  • If a driver supports timezones, datetime values should be saved with the timezone they were read with. This should preserve values when round-tripping (opening a dataset and writing it back).
  • When a timezone-aware datetime is written, it should keep its timezone if the driver supports timezones. If the driver does not support timezones, or supports only UTC, we should convert the datetime to UTC.
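The two rules above can be sketched as a pure-Python helper that decides which datetime and which pnTZFlag to hand to OGR_F_SetFieldDateTimeEx. Everything here is an assumption for illustration: `prepare_for_write` and its `driver_supports_tz` parameter are hypothetical (per this thread, driver support can only be determined by testing), naive values are mapped to flag 0, and UTC-converted values to flag 100.

```python
import datetime
from typing import Tuple


def prepare_for_write(
    value: datetime.datetime, driver_supports_tz: bool
) -> Tuple[datetime.datetime, int]:
    """Hypothetical helper: return the datetime to write and its OGR pnTZFlag."""
    if value.tzinfo is None:
        return value, 0  # naive datetime: timezone unknown
    if not driver_supports_tz:
        # Driver has no (or UTC-only) timezone support: normalise to UTC.
        value = value.astimezone(datetime.timezone.utc)
    minutes = int(value.utcoffset().total_seconds() // 60)
    if minutes % 15:
        raise ValueError("pnTZFlag can only encode offsets in 15-minute steps")
    return value, 100 + minutes // 15


aware = datetime.datetime(
    2010, 10, 20, 12, 0, tzinfo=datetime.timezone(datetime.timedelta(hours=7))
)
print(prepare_for_write(aware, driver_supports_tz=True)[1])  # 128
utc_value, flag = prepare_for_write(aware, driver_supports_tz=False)
print(utc_value.isoformat(), flag)  # 2010-10-20T05:00:00+00:00 100
```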

Implementing this logic should not be that hard. However, as far as I know, it is not possible to know what a driver supports without testing it. The difficult part is creating tests that cover all cases. Some drivers silently convert datetimes to strings, which can then have arbitrary formatting; this makes writing tests even messier.

Also, timezone support seems to be somewhat lacking in Python without external libraries such as pytz. For example, datetime.timezone was only introduced in Python 3.2.

sgillies added this to the 1.8.14 milestone Jul 4, 2020
sgillies added the bug label Jul 4, 2020

4 participants