QST: Why is subtracting pandas.timedelta from pandas.date not vectorized? #58315

MinaMirz · 2024-04-18T17:14:15Z

Research

I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/78349257/why-is-subtracting-pandas-timedelta-from-pandas-date-not-vectorized

Question about pandas

No response

WillAyd · 2024-04-18T19:38:49Z

df["day"] = df["t1"].dt.date

^ this returns a dtype=object column containing Python datetime.date objects. dtype=object columns are loosely typed, so operations on them are not vectorized.

You would either have to live with df["day"] being a datetime instead of a date, or alternately use pyarrow types for a stricter differentiation between date / datetime.
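To illustrate the first option (keeping the column as a datetime), here is a minimal sketch. The sample timestamps and the names as_objects, as_datetimes, days_back and week_start are made up for illustration and are not from the SO question. It uses dt.normalize() as one way to drop the time of day while keeping a datetime64 dtype, and it assumes the intent is to step back by whole days to the Monday of each week.

```python
import datetime

import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({"t1": pd.to_datetime(["2015-02-04 10:30", "2016-03-05 23:15"])})

# .dt.date yields an object-dtype Series of Python datetime.date values.
as_objects = df["t1"].dt.date

# .dt.normalize() sets the time to midnight but keeps the datetime64 dtype.
as_datetimes = df["t1"].dt.normalize()

# Assumption: day_of_week is meant as a number of days (Monday == 0).
days_back = pd.to_timedelta(df["t1"].dt.day_of_week, unit="D")

# datetime64 minus timedelta64 is a native, vectorized operation.
week_start = as_datetimes - days_back
```

The trade-off is the one described above: week_start is a datetime column whose values happen to sit at midnight, not a true date column.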

I'm not 100% clear on what your SO question is trying to accomplish since the timedelta you are constructing measures nanosecond differences, but I think this is what you are after:

import pandas as pd
import pyarrow as pa

s1 = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
s1 = pd.to_datetime(s1)

df = pd.DataFrame(s1)
df = df.rename(columns={0: "t1", 1: "t2"})
df["day"] = df["t1"].astype(pd.ArrowDtype(pa.date32()))
df["n"] = pd.to_timedelta(df["t1"].dt.day_of_week)

df["week"] = df["day"] - df["n"]

@jbrockmendel for any other guidance

jbrockmendel · 2024-04-18T20:56:15Z

I'd suggest using a Period[D] dtype. I'd be open to making obj.dt.date do that.
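A rough sketch of what the Period[D] route could look like today, using dt.to_period("D") rather than any change to obj.dt.date. The sample data and the names day and week are illustrative only, and the sketch assumes the goal is the Monday of each week. Subtracting an integer column from a Period[D] column shifts each value by that many days.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({"t1": pd.to_datetime(["2015-02-04 10:30", "2016-03-05 23:15"])})

# Convert to a daily period; the time of day is dropped.
day = df["t1"].dt.to_period("D")

# Integer arithmetic on a Period[D] column moves each value by whole days.
week = day - df["t1"].dt.day_of_week
```

Both day and week keep the period[D] dtype, so nothing falls back to dtype=object.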

WillAyd · 2024-04-18T21:24:42Z

That makes sense from a purely pandas perspective. I think the downside shows up when you start talking about I/O (especially with databases, where DATE / TIMESTAMP are usually distinct types): I'm not sure how proper our Period support would be. With tools like ADBC the arrow types are already accounted for.

Not going to solve that issue in this issue per se - just food for larger thought

jbrockmendel · 2024-04-18T22:46:32Z

I'm pretty sure you've mentioned concerns like that before. How difficult would it be to make Period[D] work like you expect with a database? Is that concern a show-stopper for many users?

WillAyd · 2024-04-18T23:07:43Z

Not sure. To be honest I don't know a ton about the internals on that - I'm sure it's possible but I just question if it's worth the effort when it's already been done by pyarrow.

FWIW using dtype_backend="pyarrow" with read_csv will return dates as date32 already, so that would be something else we'd have to wire into the parsers. date32 is also exclusively a date type; I suppose a period could represent more things that we would have to handle when serializing outwards (ex: Period("D") may make sense for a DATE database type, but what about Period("Q")?)

jbrockmendel · 2024-04-18T23:32:28Z

> when it's already been done by pyarrow.

IIUC the suggestion you are implicitly making (here and in #58220) is to have obj.dt.date return with date32[pyarrow] dtype. The trouble with this is 1) pyarrow is not required and 2) it would give users mixed-and-matched null-propagation semantics, which we agreed we needed to avoid when implementing the hybrid string dtype. So for the foreseeable future I just don't see that as a viable option. Period[D] is our de facto date dtype (there has been discussion of making a DateDtype as a thin wrapper around this, but I'm not finding it on the tracker).

FWIW converting a Period[D] PeriodArray to date32[pyarrow] can be done with:

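import pyarrow as pa
# arr is a Period[D] PeriodArray; its int64 ordinals count days since 1970-01-01
i4vals = arr.view("i8").astype("int32")
dt32 = pa.array(i4vals, type="date32")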

(assuming the astype to int32 doesn't overflow)
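The overflow caveat can be checked explicitly before the cast. The following is only a sketch of one way to do that with NumPy; the sample range and the names i8vals, int32_info and fits are illustrative, and the pyarrow step is left out so the snippet runs without pyarrow installed.

```python
import numpy as np
import pandas as pd

# Hypothetical Period[D] array for illustration.
arr = pd.period_range("2015-02-04", periods=3, freq="D").array

# The int64 ordinals of a Period[D] array count days since 1970-01-01.
i8vals = arr.view("i8")

# Confirm every ordinal fits in int32 before narrowing.
int32_info = np.iinfo(np.int32)
fits = bool(((i8vals >= int32_info.min) & (i8vals <= int32_info.max)).all())
if not fits:
    raise OverflowError("Period ordinals do not fit in int32")

i4vals = i8vals.astype("int32")
```

Missing values would need separate handling before this check; that is outside the scope of this sketch.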

WillAyd · 2024-04-19T00:33:37Z

Right now I think ser.dt.date should only return a pa.date32 if the series is a pa.timestamp. I agree I don't want to mix those systems, so I see your point about that returning a period when the call is a datetime64.

I'm +/- 0 on that versus encouraging more arrow date / timestamp usage

MinaMirz added the Needs Triage and Usage Question labels Apr 18, 2024

WillAyd added the Timeseries label and removed the Needs Triage label Apr 19, 2024

WillAyd mentioned this issue Apr 28, 2024

PDEP-13: The pandas Logical Type System #58455

Open
