New data #217

Open
wants to merge 10 commits into
base: master

Conversation

@rkasher rkasher commented Oct 31, 2018

This pull request updates B-Tax to use 2015 data.

@jdebacker jdebacker mentioned this pull request Nov 11, 2018
@jdebacker
Member

A summary of working with @rkasher on this today:

  1. The 2015 SOI data for partnerships and sole proprietorships loses lots of industry detail relative to 2014 and earlier.
  2. The 2014 sole proprietorship files (14br01.xls and 14is02.xls) can be read in without issue.
  3. The 2014 partnership file 14pa01.xls can be read in without issue.
  4. The 2014 partnership files 14pa03.xls and 14pa05.xls cannot yet be read in cleanly. In some cases numbers are being interpreted as strings.

The likely way to proceed with (4) is to look at the dataframe that is created on line 79 of pull_soi_partner.py using the 14pa03.xls data and see where numbers are being read as strings (e.g., because of stray characters in fields that should hold only numeric values). Once the pattern of what is not being stripped from those fields is found, one can update pull_soi_partner.format_excel() to strip or reassign those characters so that the Excel file is read in properly.

A similar process will then be done for the 14pa05.xls data read in on line 216 of pull_soi_partner.py.

This will take some back and forth to look at patterns in the non-numeric characters in the Excel files and the formatted dataframes.
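A minimal sketch of that inspection step, assuming the cells of one column have been pulled out of the dataframe as a plain list. The helper names (`is_number`, `stray_characters`, `clean_cell`) are hypothetical and are not part of pull_soi_partner.py; the sample values are made up for illustration, not taken from the SOI files.

```python
from collections import Counter

# Characters that are allowed to appear in a plain numeric cell.
NUMERIC_CHARS = set("0123456789.-")


def is_number(value):
    """True if the cell already is, or cleanly parses as, a number."""
    if isinstance(value, (int, float)):
        return True
    try:
        float(str(value).strip())
        return True
    except ValueError:
        return False


def stray_characters(cells):
    """Count the non-numeric characters in cells that fail to parse."""
    counts = Counter()
    for cell in cells:
        if not is_number(cell):
            counts.update(ch for ch in str(cell) if ch not in NUMERIC_CHARS)
    return counts


def clean_cell(cell, strip_chars):
    """Drop strip_chars and convert to float; None if still not numeric."""
    text = "".join(ch for ch in str(cell) if ch not in strip_chars)
    try:
        return float(text)
    except ValueError:
        return None


if __name__ == "__main__":
    # Made-up cells: a thousands separator, a marker, a number, padding, a letter.
    cells = ["1,234", "*567", 89, " 12 ", "d"]
    print(stray_characters(cells))
    print([clean_cell(c, ",*") for c in cells])
```

The character counts show which symbols would need to be stripped, and any cell that still comes back as `None` after cleaning needs a different rule (for example, reassignment to zero or to missing) rather than stripping.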

I will contact SOI and see if there is any chance they will carry more industry detail in the future. I will also inquire about the status of the 2014 and 2015 corporate data files.

@jdebacker
Member

jdebacker commented Dec 14, 2018

Checklist of data to update (given that we can only go to 2014 due to (1) significant changes in SOI detail in 2015 and (2) the delay in corporate data):

  • Replace 13pa01.xls with 14pa01.xls
  • Replace 13pa03.xls with 14pa03.xls
  • Replace 13pa05.xls with 14pa05.xls
  • Replace 2013sb1.csv with 2013sb1.csv
  • Replace 2013sb3.csv with 2013sb3.csv
  • Make sure to pull the 2014 column from NIPA_5.8.5B.xls (referenced by _BEA_INV in the code)
  • Make sure to pull the 2014 column from detailnonres_stk1.xlsx (referenced by _BEA_ASSET_PATH in the code)
  • Make sure to pull the 2014 column from BEA_StdFixedAsset_Table5.1.xls (referenced by _BEA_RES in the code)
  • Make sure to pull the 2014 column from b101.csv (referenced by _B101_PATH in the code)
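One way to check the "2014 column" items is to locate the column by its year label rather than by a fixed position. The sketch below assumes the header row of a file is available as a list of labels; `find_year_column` is a hypothetical helper, not a function in B-Tax, and the labels in the usage note are invented.

```python
def find_year_column(labels, year=2014):
    """Return the index of the single header label that equals `year`.

    Labels read from spreadsheets may arrive as int (2014), float (2014.0),
    or str ("2014"), so each one is compared after conversion to float.
    """
    matches = []
    for index, label in enumerate(labels):
        try:
            if float(str(label).strip()) == year:
                matches.append(index)
        except ValueError:
            # Non-numeric labels such as "Line" are simply skipped.
            continue
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one column for %s, found %d" % (year, len(matches))
        )
    return matches[0]
```

For example, `find_year_column(["Line", 2013.0, "2014", 2015])` picks out the third label, and a header with no 2014 label raises an error instead of silently returning the wrong year.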

There will be a couple other files to update whenever SOI releases the 2014 corporate data...

@rkasher You may have gotten to the last four too. Can you search the code for places where these files are referenced (you can search the *.py files in \btax for the keys I give for each above) and make sure they are all updated to pull the right data?
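A rough sketch of that search, assuming it is run from a checkout of the repository. The function name `find_references` is hypothetical; the keys are the ones listed in the checklist above.

```python
from pathlib import Path

# Keys named in the checklist; each is expected to point at one data file.
KEYS = ["_BEA_INV", "_BEA_ASSET_PATH", "_BEA_RES", "_B101_PATH"]


def find_references(root, keys=KEYS):
    """List (path, line number, line text) for every line in a *.py file
    under `root` that mentions one of `keys`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(key in line for key in keys):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # "btax" is assumed to be the package directory in the working tree.
    for path, lineno, line in find_references("btax"):
        print("%s:%d: %s" % (path, lineno, line))
```

Each hit can then be checked by hand to confirm that the referenced file and year are the updated ones.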

After that, I think it's just (1) wait on 2014 corporate data and (2) update tests to pass with new data.
