The tax_microdata_.csv.gz files do not all have the same format. I have found two issues:
First, it appears that not all of the files have been run through Tax-Calculator yet. The screenshot below of the v7 files, taken on a Windows machine, shows that the 2019 and 2020 files are much smaller than those for the other years. Presumably this is because they have not been run through Tax-Calculator and therefore lack the appended Tax-Calculator output variables. Oddly, the 2015-2020 files are also dated 12/31/1979. This is not a Windows artifact: the same dates appear when viewing the files in the online Google Drive folder.
For our immediate needs, I can work around this by excluding the 2019 and 2020 files from my analysis, given that 2015 and 2021 are our focus.
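Rather than inferring from file size which years were skipped, one could compare each file's header against a file known to have been run through Tax-Calculator. The sketch below assumes the files are gzip-compressed CSVs with a header row. `read_header` and `missing_columns` are hypothetical helpers, not part of Tax-Calculator or any existing tool. A file that was not run through Tax-Calculator would be expected to list the Tax-Calculator output variables among its missing columns.

```python
import csv
import gzip
from pathlib import Path


def read_header(path):
    """Return the column names from the first row of a gzip-compressed CSV.

    Assumes (hypothetically) that every file starts with a header row.
    """
    with gzip.open(path, "rt", newline="") as f:
        return next(csv.reader(f))


def missing_columns(paths, reference):
    """Map each file name to the columns in `reference` that it lacks.

    Files with no missing columns are left out of the result. A file that
    was not run through Tax-Calculator would be expected to be missing the
    Tax-Calculator output variables present in the reference file.
    """
    ref_cols = set(read_header(reference))
    report = {}
    for path in paths:
        missing = sorted(ref_cols - set(read_header(path)))
        if missing:
            report[Path(path).name] = missing
    return report
```

Typical use would pass the list of yearly files as `paths` and a file believed to be complete (for example, the 2021 file) as `reference`; the exact file names are not given here and would need to be filled in.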
Second, the Tax-Calculator output variables are not in the same column positions in the 2015 and 2021 files. When I use the R package vroom to read a list of files en masse, it throws an error because the 92nd column (counting the column it adds for the file name) differs between 2015 and 2021, as the screenshot below shows. Further checking shows that the two files contain the same variables, just in different positions. This is easy enough to work around by reading the files one at a time, but ideally all files would have exactly the same format, including the column order.
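Until the files are regenerated with a common column order, the reading step can match columns by name rather than by position. The sketch below uses only Python's standard library to stack several gzip CSVs into one. It takes the first file's header as the canonical order and rejects any file whose set of columns differs. `stack_aligned` and the appended `source_file` column are hypothetical, loosely analogous to the file-name column vroom adds; this is a reader-side workaround, not how the files are produced upstream.

```python
import csv
import gzip


def stack_aligned(paths, out_path):
    """Concatenate gzip CSVs whose columns match by name but not by position.

    The first file's header defines the output column order. Every other
    file must contain exactly the same set of columns, else ValueError.
    A hypothetical `source_file` column records where each row came from.
    """
    with gzip.open(paths[0], "rt", newline="") as f:
        columns = next(csv.reader(f))

    with gzip.open(out_path, "wt", newline="") as out:
        # DictWriter emits values in `fieldnames` order, regardless of the
        # column order in the file each row was read from.
        writer = csv.DictWriter(out, fieldnames=columns + ["source_file"])
        writer.writeheader()
        for path in paths:
            with gzip.open(path, "rt", newline="") as f:
                reader = csv.DictReader(f)
                if set(reader.fieldnames) != set(columns):
                    raise ValueError(f"{path}: column set differs from {paths[0]}")
                for row in reader:
                    row["source_file"] = str(path)
                    writer.writerow(row)
```

Checking the column *set* before writing means a file that genuinely lacks variables (such as one not yet run through Tax-Calculator) still fails loudly, while files that differ only in ordering are combined.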
@nikhilwoodruff