Drop pandas dependency #147

Open
bmschmidt opened this issue Sep 10, 2021 · 0 comments
Comments

@bmschmidt
Member

To be done once #145 is complete.

Things will be faster and cleaner if we use pyarrow end to end for intermediate representations.
Round-tripping Arrow data through pandas gives typecasting errors another chance to creep in, and can make it harder to correct errors involving timezones in date fields, among other problems.

Nothing especially urgent here, particularly because some necessary join operations can't be handled natively by pyarrow.compute. In nonconsumptive, those are usually handled by polars right now; in this project, it might make more sense to do that relational logic on Arrow tables with duckdb, since one nice feature of duckdb is that it lets you write SQL directly against local Arrow tables.
