
Add convenient support for stdin #117

Open
jbdesbas opened this issue Oct 10, 2023 · 2 comments

Comments

@jbdesbas
Contributor

Hi,

I just discovered this great project, thanks a lot for this amazing work 😃

Since CSV processing usually happens as one step in a data pipeline, it would be great to make reading CSV data from stdin more convenient.

Writing to stdout is already easy, because sys.stdout can be passed directly to the CSV writer, but reading is a bit trickier.

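```python
import io
import sys

import chardet
import clevercsv

# Read: consume stdin as raw bytes so the encoding can be guessed
input_data = sys.stdin.buffer.read()
# chardet may return None as the encoding (e.g. for empty input); fall back to UTF-8
detected_encoding = chardet.detect(input_data)["encoding"] or "utf-8"

csvfile = io.StringIO(input_data.decode(detected_encoding), newline="")

# Detect the dialect on the full text, then rewind before parsing
dialect = clevercsv.Sniffer().sniff(csvfile.read())
csvfile.seek(0)

rows = clevercsv.reader(csvfile, dialect)

# Write: the output encoding is a property of the stream, so set UTF-8 on
# stdout itself (Python 3.7+) instead of passing it to the writer
sys.stdout.reconfigure(encoding="utf-8", newline="")
writer = clevercsv.writer(sys.stdout)
writer.writerows(rows)
```
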
@GjjvdBurg
Collaborator

Hi @jbdesbas, thanks for the kind words and for opening this issue! What exactly do you have in mind for the functionality that we could add to CleverCSV to make this easier? Perhaps a wrapper function that returns dicts or rows of the CSV file, similar to stream_table and stream_dicts (or a modification of these to accept sys.stdin)?

Note that the example you shared is very similar to the standardize command in the CLI. If that command is what you're looking for, issue #107 could capture your request too (please let me know).

@jbdesbas
Contributor Author

Hi @GjjvdBurg
Yes, I think having read_table/stream_table accept sys.stdin instead of just a filename would be a great improvement. 👍

My need is slightly different from what the standardize command does: standardize keeps the original encoding for the output file, but I need a UTF-8 output file regardless of the original encoding. Additionally, my original script does other things between reading and writing (it adds suffixes to deduplicate column names).
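The column-renaming step mentioned above could be sketched as follows. dedupe_header and its _1, _2 suffix scheme are hypothetical, since the comment does not say which suffixes the original script adds. The loop also avoids collisions with names that already end in a suffix.

```python
from typing import Dict, List, Set


def dedupe_header(header: List[str]) -> List[str]:
    """Hypothetical helper: make column names unique by appending _1, _2, ..."""
    used: Set[str] = set()
    counts: Dict[str, int] = {}
    result: List[str] = []
    for name in header:
        candidate = name
        n = counts.get(name, 0)
        # Keep incrementing the suffix until the name is unused
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        counts[name] = n
        used.add(candidate)
        result.append(candidate)
    return result
```

It would be applied to the first row from the reader before the rows are written out.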
However, standardize should accept stdin as input too.
