Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: .loc Indexing with pyarrow backed DatetimeIndex does not allow string comparison #58307

Closed
3 tasks done
WillAyd opened this issue Apr 18, 2024 · 4 comments
Closed
3 tasks done
Labels
Arrow pyarrow functionality Bug Timeseries

Comments

@WillAyd
Copy link
Member

WillAyd commented Apr 18, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [18]: df = pd.DataFrame({
    ...:     "data": range(3),
    ...:     "pa_dates": pd.Series(["2024-01-01", "2024-01-02", "2024-01-03"], dtype=pd.ArrowDtype(pa.timestamp("s"))),
    ...:     "pd_dates": pd.Series(["2024-01-01", "2024-01-02", "2024-01-03"], dtype="datetime64[s]"),
    ...: })
    ...: 

In [19]: df.set_index("pd_dates").loc[:"2024-01-01"]
    ...: 
Out[19]: 
            data             pa_dates
pd_dates                             
2024-01-01     0  2024-01-01 00:00:00

In [20]: df.set_index("pa_dates").loc[:"2024-01-01"]
TypeError: '<' not supported between instances of 'datetime.datetime' and 'str'


### Issue Description

TypeError: '<' not supported between instances of 'datetime.datetime' and 'str'

### Expected Behavior

I would expect the pyarrow backed datetimeindex to be supported just like the pandas one

### Installed Versions

In [21]: pd.__version__
Out[21]: '3.0.0.dev0+681.g434fda08cf'
@WillAyd WillAyd added Bug Timeseries Arrow pyarrow functionality labels Apr 18, 2024
@SarthakNikhal
Copy link

@WillAyd Can I work on this issue? I did not know arrow is supposed to work with pandas....

@WillAyd
Copy link
Member Author

WillAyd commented Apr 18, 2024

@jbrockmendel I see from the linked PR that you don't think this should be supported. Curious to hear your reasons why before anyone works on it.

pa.timestamp and pd.Timestamp share a lot of the same architecture (save missing value handling) so I think converting from one to another should be seamless. I'm definitely sympathetic to not trying to make everything they do the same, although indexing like this feels useful. I do think part of the confusion here could stem from the fact that we do not have a logical "timestamp" type, only physical

I came across this reading a CSV file with the pyarrow backend, which was great for strings but then didn't help me at all with timestamps.

@jbrockmendel
Copy link
Member

I see from the linked PR that you don't think this should be supported. Curious to hear your reasons why before anyone works on it.

I linked this there bc the expectation that this would already work bc the only difference is in how the DatetimeIndex is "backed" is incorrect (the Index in question is not a DatetimeIndex at all). I think part of that incorrect expectation is linked to the term "backend" and hope that using a more accurate term might avoid misconceptions.

As to the actual behavior, I'd be fine with it, though I expect the implementation would be very messy to the point where saying "convert to a DatetimeIndex" might be easier.

@WillAyd
Copy link
Member Author

WillAyd commented Apr 29, 2024

Looks like this is actually a duplicate of #53154 - closing to keep the discussion there

@WillAyd WillAyd closed this as completed Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Bug Timeseries
Projects
None yet
Development

No branches or pull requests

3 participants