requirements: use pandas >= 2 #128

clbarnes · 2023-11-03T15:50:10Z

If pandas 2 works without much effort, we should use it as it can be much more efficient and sane with the arrow backend. If it doesn't work, we should constrain the install requirements.

clbarnes · 2023-11-03T16:01:20Z

Looks like only 1 test failure, in fairly recent code, and, er... >2000 warnings, probably from hitting the same code path over and over.

clbarnes · 2023-11-14T14:48:28Z

At present, navis works with both pandas 1 and 2. Locking to 2 would enable some optimisations (e.g. using the arrow backend has some speed and memory advantages), but could break some downstream users' workflows. Of course, they are welcome to use an older navis version (the last pandas 1-compatible version should be clearly signposted).

Implementing the optimisations conditionally based on which pandas version was installed would be a headache.
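To make the headache concrete, here is a rough sketch of what version-gating a single optimisation might look like. `pandas_major` and `read_kwargs` are hypothetical helpers, not anything in navis; the only pandas detail assumed is that readers such as `read_csv` accept `dtype_backend="pyarrow"` from pandas 2.0 onwards.

```python
def pandas_major(version: str) -> int:
    """Return the major version from a string such as "2.1.4"."""
    return int(version.split(".")[0])


def read_kwargs(version: str) -> dict:
    """Hypothetical helper: extra keyword arguments for pandas readers.

    pandas >= 2 readers accept dtype_backend="pyarrow"; pandas 1 does not,
    so nothing extra is passed there.
    """
    if pandas_major(version) >= 2:
        return {"dtype_backend": "pyarrow"}
    return {}


# Intended use (not executed here, as it needs pandas installed):
#   import pandas as pd
#   df = pd.read_csv(path, **read_kwargs(pd.__version__))
```

Every call site that wants the arrow backend would need a branch like this, and both branches would need test coverage, which is the maintenance cost being weighed against simply requiring pandas >= 2.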

schlegelp · 2024-01-09T18:08:22Z

I don't have any strong opinions in this case. I've been using pandas >=2 for a while and haven't had any major issues - neither with navis nor any other adjacent packages. All things being equal, I'd lean towards being flexible but I guess the last 1.x.x version of pandas is now almost a year old.

Did you have anything specific in mind re speed and memory advantages?

clbarnes · 2024-01-15T11:21:23Z

I suppose there's no rush on this while everything is still working, but I think that if an incompatibility were to arise, pandas 1 should probably be put on the chopping block before too much effort is expended accounting for both.

Pandas' integration with Arrow is a work in progress but is already faster for a number of operations. I think at the moment, pandas would just use Arrow arrays instead of NumPy arrays for columns rather than directly wrapping an Arrow table; I also think it doesn't directly support Arrow map, struct, and list types for now (it just turns them into Python dicts, dicts, and lists respectively).

The advantage is probably more theoretical for now.

requirements: use pandas >= 2

e014611

clbarnes force-pushed the pandas2 branch from e014611 to 1a6bdc3 Compare November 14, 2023 14:34

clbarnes marked this pull request as ready for review November 14, 2023 14:43

clbarnes force-pushed the pandas2 branch from 1a6bdc3 to e014611 Compare January 5, 2024 17:01
