Refactoring TaxData #340

Open
andersonfrailey opened this issue Jul 9, 2020 · 0 comments

I've been giving some thought to how we refactor TaxData and wanted to share what I've come up with here. The general goal is to remove redundant code and set TaxData up so it can be spun off into its own package or become part of a tax data generator app on Compute Studio. Here's my proposed structure, with each main directory named first and its subdirectories and files listed below it:

cps_data: everything needed to create tax units from CPS files will be here. Rather than puf_data having its own set of scripts to make tax units for statistical matching, it will import the needed functions from here.

  • cps_data/data: will contain all of the CPS and C-TAM files used in the
  • all of the files currently in cps_data/pycps will be moved up

statmatch: this is a new one. It'll have all of the code used to run a statistical match, generalized to work with more than just the PUF and CPS. I've actually already written most of this. Code can be found here. (This could also be a single file rather than a whole directory.)
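To make "generalized to work with more than just the PUF and CPS" concrete, here is a minimal sketch of a dataset-agnostic matching function. It assumes a rank-based constrained match: both files are sorted on a score and paired in order, splitting weights where needed. The name `rank_match`, the `(record_id, score, weight)` layout, and the choice of algorithm are all assumptions for illustration, not a description of the code linked above.

```python
def rank_match(donors, recipients):
    """Pair records from two files by rank, splitting weights as needed.

    Each input is a list of (record_id, score, weight) tuples. Nothing here
    is specific to the PUF or CPS. Total weights are assumed to be equal
    across the two files (rescale beforehand if they are not).
    Returns a list of (donor_id, recipient_id, matched_weight).
    """
    d = sorted(donors, key=lambda rec: rec[1])
    r = sorted(recipients, key=lambda rec: rec[1])
    matches = []
    i = j = 0
    d_left = d[0][2] if d else 0
    r_left = r[0][2] if r else 0
    while i < len(d) and j < len(r):
        # Match as much weight as both current records still have available.
        w = min(d_left, r_left)
        if w > 0:
            matches.append((d[i][0], r[j][0], w))
        d_left -= w
        r_left -= w
        # Advance whichever side has used up its weight.
        if d_left <= 0:
            i += 1
            if i < len(d):
                d_left = d[i][2]
        if r_left <= 0:
            j += 1
            if j < len(r):
                r_left = r[j][2]
    return matches
```

With integer weights, `rank_match([("p1", 0.2, 3), ("p2", 0.9, 2)], [("c1", 0.1, 1), ("c2", 0.5, 4)])` returns `[("p1", "c1", 1), ("p1", "c2", 2), ("p2", "c2", 2)]`. Real weights are floats, so a tolerance around zero would likely be needed.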

puf_data:

  • All of the scripts to prepare the PUF for matching, scripts that call the functions in cps_data to create CPS tax units, run the statistical match, and do all the final prep work.

stage1:

  • stage1/data: contains all of the population projections, SOI estimates, CBO projections, etc. used in stage 1 of the extrapolation process.
  • cps_stage1.py, puf_stage1.py. Since there's some overlap in what these files do, it should be possible to boil these down into something more generalized where it's possible to provide alternative inputs for things like the CBO projections.
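A rough sketch of what "provide alternative inputs" could look like: the shared stage 1 code takes its projections as an argument instead of reading a hard-coded file. The function names, the CSV layout (a `year` column plus one column per series), and the base-year scaling are hypothetical stand-ins, not the actual stage 1 calculations.

```python
import csv
import io


def read_projections(source):
    """Parse CSV rows with a 'year' column into {series: {year: value}}.

    `source` is any iterable of text lines (an open file, io.StringIO, ...),
    so callers can swap in their own projections.
    """
    table = {}
    for row in csv.DictReader(source):
        year = int(row["year"])
        for name, value in row.items():
            if name != "year":
                table.setdefault(name, {})[year] = float(value)
    return table


def growth_factors(projections, base_year):
    """Express every series relative to its base-year value."""
    return {
        name: {year: value / by_year[base_year] for year, value in by_year.items()}
        for name, by_year in projections.items()
    }


# Alternative inputs are just a different source; these numbers are made up.
alt_cbo = io.StringIO("year,gdp\n2020,100\n2021,110\n")
factors = growth_factors(read_projections(alt_cbo), base_year=2020)
```

Under this design, cps_stage1.py and puf_stage1.py would only differ in which sources they pass in and what they do with the result.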

stage2:

  • cps_stage2.py, puf_stage2.py, solve_lp_for_year.py. The last one will be rewritten so that both the PUF and CPS files can use the same functions. This would mean moving the PUF to the LP model that the CPS uses. All of the specialized code currently in each individual solve_lp_for_year.py file will be moved to the specific stage 2 files.

stage3:

  • PUF stage 3 script. Parameterize to take different distributional targets.

All of this is just a rough sketch, and I'm down to change any of it. The general steps to get there are:

  1. Just move all the files to the new directory structure, but avoid any major changes. After this, all the files we produce should still be exactly the same.
  2. Remove redundant code. Swap to the new LP model and use the code in cps_data to make all CPS tax units.
  3. Generalize as many of the pieces as possible. Move statistical matching to a standalone module, parameterize as many of the inputs as possible.

Parts of the tasks in point 3 could probably be done in conjunction with steps 1 and 2.
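One way to check the step 1 requirement that the produced files stay exactly the same is to hash the outputs before and after the move and compare. This is a minimal sketch: the function names and the `*.csv` pattern are assumptions, and files are keyed by name (not path) because step 1 changes the directory layout.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()


def snapshot(root, patterns=("*.csv",)):
    """Map each output file name under `root` to its digest.

    Assumes output file names are unique across subdirectories.
    """
    digests = {}
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            digests[path.name] = file_digest(path)
    return digests


def changed_files(before, after):
    """Names whose digest differs, or that exist in only one snapshot."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Take a `snapshot` of the outputs on the current layout, rerun everything after the move, and expect `changed_files` to return an empty list. Gzipped outputs can embed a timestamp in their header, so for those it is safer to compare the decompressed contents.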
