Require all tax_microdata_<year>.csv.gz files to have exactly the same format #43

Open
donboyd5 opened this issue Apr 22, 2024 · 0 comments

Comments

@donboyd5
Collaborator

@nikhilwoodruff

The tax_microdata_<year>.csv.gz files do not all have the same format. I have found two issues:

First, it appears that not all files have been run through Tax-Calculator yet. The screenshot below of the v7 files, taken on a Windows machine, shows that the 2019 and 2020 files are much smaller than those for the other years. Presumably this is because they have not been run through Tax-Calculator and so lack the Tax-Calculator output variables. Also, oddly, the 2015-2020 files are dated 12/31/1979. This is not a Windows artifact: the same dates appear on the files in the online Google Drive folder.

image

For our immediate needs, I can work around this by excluding the 2019 and 2020 files from my analysis, given that 2015 and 2021 are our focus.

Second, the Tax-Calculator output variables are not in the same positions in the 2015 and 2021 files. When I read a list of filenames en masse with the R package vroom, it errors because the 92nd column (counting the file-name column that vroom adds) differs between 2015 and 2021, as the screenshot below shows. Further checking shows that the two files contain the same variables, but in different positions. This is easy enough to work around by reading the files one at a time, but it would be better for all files to have exactly the same format, including column order.

image
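To make both problems easy to check for, here is a rough sketch in Python (standard library only) that reads just the header row of each gzipped CSV and compares it with the first file's header. The function names `read_header` and `compare_headers` are made up for illustration and are not part of this repository; the sketch also assumes each file has a single header row of column names.

```python
import csv
import gzip
from pathlib import Path


def read_header(path):
    """Return the column names from the first row of a gzipped CSV file."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return next(csv.reader(handle))


def compare_headers(paths):
    """Compare each file's header with the header of the first file.

    Hypothetical helper: returns a dict keyed by file name with a status
    string, the columns missing from or extra in that file relative to the
    first file, and the 1-based position of the first column that differs
    (None if no position differs within the shorter of the two headers).
    """
    paths = [Path(p) for p in paths]
    reference = read_header(paths[0])
    report = {}
    for path in paths[1:]:
        header = read_header(path)
        if header == reference:
            status = "identical"
        elif set(header) == set(reference):
            status = "same columns, different order"
        else:
            status = "different columns"
        first_mismatch = next(
            (i + 1 for i, (a, b) in enumerate(zip(reference, header)) if a != b),
            None,
        )
        report[path.name] = {
            "status": status,
            "missing": sorted(set(reference) - set(header)),
            "extra": sorted(set(header) - set(reference)),
            "first_mismatch": first_mismatch,
        }
    return report
```

Calling `compare_headers` on the list of tax_microdata_<year>.csv.gz paths should flag files that lack the Tax-Calculator output variables as "different columns" and files like 2015 vs. 2021 as "same columns, different order". Note that the reported position counts only the columns in the file, so it would be one less than a position reported by vroom after it adds its file-name column.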
