Potential broken approach of reading csv files #286

Open
warwickmm opened this issue Apr 3, 2021 · 8 comments

@warwickmm
Member

warwickmm commented Apr 3, 2021

The following pattern is used often to read csv files:

with open(filename, 'rU') as csvfile:
    reader = unicodecsv.DictReader(csvfile)

I think this worked in Python 2 because str and bytes were the same type. In Python 3 it breaks: unicodecsv expects the file to be opened in binary mode, but this pattern opens it in text mode.

For example, the following fails in Python 3 with the error AttributeError: 'str' object has no attribute 'decode':

import unicodecsv

filename = 'openelex/us/md/mappings/md.csv'
with open(filename, "r") as data:
    reader = unicodecsv.DictReader(data)
    for row in reader:
        print(row)
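
To illustrate where that AttributeError comes from, here is a minimal sketch using only the standard library. It assumes a small throwaway sample file (the contents are made up, not the real md.csv) and just compares what text mode and binary mode hand back in Python 3. It does not call unicodecsv itself.

```python
import os
import tempfile

# Write a small sample file (hypothetical contents, not the real md.csv).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("name,county\nJosé,Howard\n".encode("utf-8"))
    sample_path = tmp.name

try:
    # Text mode yields str lines, which have no .decode() in Python 3.
    with open(sample_path, "r", encoding="utf-8") as text_file:
        text_line = text_file.readline()

    # Binary mode yields bytes lines, which can be decoded.
    with open(sample_path, "rb") as binary_file:
        binary_line = binary_file.readline()
finally:
    os.remove(sample_path)

print(type(text_line).__name__, hasattr(text_line, "decode"))      # str False
print(type(binary_line).__name__, hasattr(binary_line, "decode"))  # bytes True
```

A reader that calls .decode() on each line would therefore only work on the binary-mode file.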

Using csv instead of unicodecsv fixes the issue.

import csv

filename = 'openelex/us/md/mappings/md.csv'
with open(filename, "r") as data:
    reader = csv.DictReader(data)
    for row in reader:
        print(row)
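
If the files contain non-ASCII characters, it may be safer to pass the encoding explicitly rather than rely on the platform default. The sketch below assumes the files are UTF-8 (I have not checked this for every file in the repo), and read_csv_rows is a hypothetical helper name, not something that exists in the codebase. The sample data is made up as well.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as a list of dicts (hypothetical helper)."""
    # newline="" lets the csv module handle line endings itself,
    # as the csv documentation recommends.
    with open(path, "r", encoding=encoding, newline="") as csvfile:
        return list(csv.DictReader(csvfile))


# Create a throwaway sample file with a non-ASCII value (assumed UTF-8).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".csv", delete=False
) as tmp:
    tmp.write("name,county\r\nJosé,Howard\r\n")
    sample_path = tmp.name

try:
    rows = read_csv_rows(sample_path)
finally:
    os.remove(sample_path)

print(rows)
```

Collecting the rows into a list keeps the example short; for large files, iterating over the reader inside the with block would avoid loading everything into memory.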

Is there something wrong with my setup, or is this broken for other people as well?

@warwickmm
Member Author

warwickmm commented Apr 3, 2021

FYI, by using csv instead of unicodecsv together with one other small fix, I can get most of the failing tests in test_md_datasource.py to pass. However, I'm not sure if anything else breaks as a result. But given my understanding of how unicodecsv works with python2 vs. python3, it's a bit unclear to me how things are currently working.

@warwickmm
Member Author

This seems related to jdunck/python-unicodecsv#65.

@dwillis
Contributor

dwillis commented Apr 4, 2021

@warwickmm yeah, this is an artifact of using Python 2, but we should be using Python 3, so we can remove unicodecsv and just replace it with the csv module.

@warwickmm
Member Author

warwickmm commented Apr 4, 2021

Ok. Do you mind my asking how any of this is working currently? It would seem to me that none of the csv files can be read properly as-is.

@dwillis
Contributor

dwillis commented Apr 4, 2021

@warwickmm it's a fair question, and the basic answer is that we've mostly not used the core repo in recent times, instead prioritizing the data conversion work that results in the openelections-data-{state} repos. But we do use it for some of the states and use Python 3 for that.

@warwickmm
Member Author

Thanks. If the core repo isn't used very much anymore, is there a different repo that I can look at for possible ways to contribute?
Or, is the core repo still deserving of attention?

@dwillis
Contributor

dwillis commented Apr 4, 2021

@warwickmm most of our work now is done in various state-specific repos, where we put converted precinct results. For example, we're working on converting official precinct results for Texas here.

@warwickmm
Member Author

Thank you. I'll take a look at the state-specific repos.
