Support for Keyword Arguments in from_csv Function with use_pyarrow=True for Enhanced CSV Parsing Flexibility #16137

Open
Max0u opened this issue May 9, 2024 · 1 comment
Labels
A-io-csv Area: reading/writing CSV files enhancement New feature or an improvement of an existing feature

Comments

@Max0u

Max0u commented May 9, 2024

Description

Summary
The current implementation of the from_csv function does not accept kwargs. This feature request proposes forwarding kwargs to the underlying parser when use_pyarrow=True, so that users can handle non-standard CSV files with custom delimiters or escape characters.

Motivation
In practice, CSV files do not always conform to a strict standard; many third-party CSV files use non-standard delimiters or escape characters. While using the Polars library, I ran into a limitation: I could not set the CSV delimiter or escape character in the from_csv function when using PyArrow for parsing. This forced me to read the file with pandas and then convert the result to Polars, which causes a spike in memory usage.

I would gladly open a PR if this is accepted.

@Max0u Max0u added the enhancement New feature or an improvement of an existing feature label May 9, 2024
@alexander-beedie
Collaborator

I think if you need to use pyarrow parsing with all of its possible kwargs, you will be better off calling it directly; we want to eventually drop this pass-through completely. (Being able to lean on pyarrow was meant as a temporary convenience for the early days when our CSV parser was significantly less developed than it is now).

But! Could you detail more specifically which features the Polars CSV reader wasn't able to handle in your file? If it's something we can address natively then you wouldn't need "use_pyarrow" at all 👍

@alexander-beedie alexander-beedie added the A-io-csv Area: reading/writing CSV files label May 12, 2024