Promote Python datetime and timedelta objects to relevant types in broadcasting #3109
Comments
The ufunc logic first passes through
>>> from datetime import datetime
>>> ak.behavior["__cast__", datetime] = lambda x: ak.to_layout(x)
>>> array - datetime.now()
<Array [-137324468250000 microseconds, ...] type='5 * timedelta64[us]'>
It occurs to me that we might want But, I digress. Casting is a "solution" here, but the question really remains about expectations. My instinct is that this should require coercion because
By "casting," I meant a reinterpret-cast. And yeah, no timezones in NumPy. That's one of the things that make Python datetimes != NumPy datetimes != Arrow datetimes != Pandas datetimes. Some timezone would have to be assumed, and it shouldn't be locale-dependent, otherwise code that works on a computer in the U.S. wouldn't work in Europe. (Or wouldn't work on the same laptop, after a transatlantic flight!) Right now, what the code does is it takes the
awkward/awkward-cpp/src/python/content.cpp
Lines 38 to 44 in ecb41df
So if the
>>> from datetime import datetime
>>> from pytz import timezone
>>> obj = datetime(2011, 8, 15, 8, 15, 12, 0, tzinfo=timezone("US/Central"))
the above code would try to subtract a timezone-naive
>>> obj - datetime(1970, 1, 1, 0, 0, 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't subtract offset-naive and offset-aware datetimes
which is a Python error, uncaught in pybind11 code, and so I think that would cause a segfault. Nope! It's fine, no segfault:
>>> ak.from_iter([{"a": [{"b": [{"c": [obj, obj, obj]}]}]}])
Traceback (most recent call last):
File "/home/jpivarski/irishep/awkward/src/awkward/_dispatch.py", line 39, in dispatch
gen_or_result = func(*args, **kwargs)
File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_from_iter.py", line 70, in from_iter
return _impl(iterable, highlevel, behavior, allow_record, initial, resize, attrs)
File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_from_iter.py", line 100, in _impl
builder.fromiter(iterable)
TypeError: can't subtract offset-naive and offset-aware datetimes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jpivarski/irishep/awkward/src/awkward/_dispatch.py", line 38, in dispatch
with OperationErrorContext(name, args, kwargs):
File "/home/jpivarski/irishep/awkward/src/awkward/_errors.py", line 85, in __exit__
self.handle_exception(exception_type, exception_value)
File "/home/jpivarski/irishep/awkward/src/awkward/_errors.py", line 95, in handle_exception
raise self.decorate_exception(cls, exception)
TypeError: can't subtract offset-naive and offset-aware datetimes
This error occurred while calling
ak.from_iter(
[{'a': [{'b': [{'c': [datetime.datetime(2011, 8, 15, 8, 15, 12, tzinf...
)
I don't see where the Python exception gets caught and propagated correctly, but it gets handled somewhere. I think this isn't a bad error to get. NumPy datetimes, our internal format, are timezone-naive. If you try to use timezone-aware datetimes, this error is the best thing that can happen. Anything else, such as assuming a locale or assuming UTC, would be a subtle error. We can just say that we don't support timezone-aware datetimes.
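To make the epoch subtraction concrete, here is a minimal standard-library sketch of the same idea. It is not the content.cpp code: `to_epoch_microseconds` and `EPOCH` are hypothetical names, the explicit awareness check is my assumption about how one might word the error, and it uses the stdlib `timezone.utc` instead of pytz just to stay self-contained.

```python
from datetime import datetime, timedelta, timezone

# Timezone-naive epoch, matching NumPy's timezone-naive datetime64.
EPOCH = datetime(1970, 1, 1)


def to_epoch_microseconds(obj: datetime) -> int:
    """Hypothetical helper: microseconds from the naive epoch to a naive datetime."""
    # A datetime is "aware" only if utcoffset() returns something other than None.
    if obj.utcoffset() is not None:
        # Raise explicitly instead of guessing a timezone (no locale, no UTC assumption).
        raise TypeError("timezone-aware datetimes are not supported")
    # Dividing a timedelta by a timedelta with // yields an exact integer count.
    return (obj - EPOCH) // timedelta(microseconds=1)


naive = datetime(2011, 8, 15, 8, 15, 12)
aware = naive.replace(tzinfo=timezone.utc)

print(to_epoch_microseconds(naive))  # an integer number of microseconds

try:
    to_epoch_microseconds(aware)
except TypeError as err:
    print("rejected:", err)
```

Without the explicit check, the subtraction itself raises the same "can't subtract offset-naive and offset-aware datetimes" TypeError shown in the traceback above, so the check only changes the wording of the message, not the outcome.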
Description of new feature
Problem
Suppose you have an array of datetimes:
You can broadcast it with other NumPy datetimes:
but you can't broadcast it with Python datetimes:
Now, NumPy itself doesn't do this:
and we don't want to get into the many time formats of Arrow, Pandas, and the rest, but since datetime64 and timedelta64 are among Awkward's primitive types, you'd think (users are likely to think) that the corresponding Python objects should be promoted.
Solution
The traceback ended here:
awkward/src/awkward/_nplikes/array_module.py
Lines 212 to 221 in ecb41df
(a helper function that assumes all of the primitive types are booleans, integers, and (real & complex) floating point numbers, leaving out datetimes and timedeltas). But that's not the only issue. In order to do this right and promote Pythonic data containing temporal data, from_iter has to recognize temporal data (it does), to_layout, which it does:
Okay, I had thought that there would be a lot of work to do, but most of this looks like it's already implemented. Maybe it's just one naive helper function, and generalizing that one function could completely fix this. If that's true, then this isn't a big project, but a small project. Also, this wouldn't be a feature but a bug-fix. I'll change the label now.
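As a rough illustration of what "generalizing that one function" could mean, here is a standalone sketch of a scalar-to-dtype lookup that also covers temporal objects. This is not the code in array_module.py: the function name, the table, and the choice of 64-bit widths are all assumptions for illustration, and it returns dtype names as plain strings so that it runs without NumPy. The `[us]` unit follows the `timedelta64[us]` type shown in the REPL output in the comments.

```python
from datetime import datetime, timedelta

# Hypothetical lookup table. Order matters: bool is a subclass of int,
# so it has to be tested before int.
_PYTHON_SCALAR_TO_DTYPE_NAME = (
    (bool, "bool"),
    (int, "int64"),
    (float, "float64"),
    (complex, "complex128"),
    # The two entries a booleans/integers/floats-only helper would leave out:
    (datetime, "datetime64[us]"),
    (timedelta, "timedelta64[us]"),
)


def dtype_name_for_scalar(obj):
    """Hypothetical helper: name of the primitive dtype a Python scalar promotes to."""
    for python_type, dtype_name in _PYTHON_SCALAR_TO_DTYPE_NAME:
        if isinstance(obj, python_type):
            return dtype_name
    raise TypeError(f"no primitive dtype for {type(obj).__name__}")


print(dtype_name_for_scalar(datetime(2020, 1, 1)))   # datetime64[us]
print(dtype_name_for_scalar(timedelta(seconds=30)))  # timedelta64[us]
```

Whether the real helper should pick a fixed unit or infer one from the other operand is a separate question that this sketch doesn't try to answer.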
Cc: @alexander-held, who brought this up on Slack.